Content

  • R packages are organized below according to the categories of the CRAN Task Views.
  • Source: CRAN Task Views https://cran.r-project.org/web/views/ , https://cran.ism.ac.jp/web/views/
  • “Category”, “Package”, “Title” and “Description” are quoted verbatim from the sources above.
  • To search with multiple keywords, separate them with half-width (ASCII) spaces.
  • Regular expressions can be used in searches.
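The two search notes above can be sketched in base R. This is a minimal illustration only — not the app's actual code; the `pkgs` data frame and the `search_pkgs` helper are hypothetical — showing how a query split on half-width spaces, with each keyword treated as a regular expression, could filter the package table:

```r
# Hypothetical sketch of the table search: split the query on half-width
# spaces and keep rows whose Package/Title match every keyword as a regex.
pkgs <- data.frame(
  Package = c("abc", "brms", "bnlearn"),
  Title   = c("Tools for Approximate Bayesian Computation (ABC)",
              "Bayesian Regression Models using 'Stan'",
              "Bayesian Network Structure Learning, Parameter Learning and Inference"),
  stringsAsFactors = FALSE
)

search_pkgs <- function(query, data) {
  keywords <- strsplit(query, " ", fixed = TRUE)[[1]]
  hay <- paste(data$Package, data$Title)
  hits <- Reduce(`&`, lapply(keywords, function(k) grepl(k, hay, ignore.case = TRUE)))
  data[hits, , drop = FALSE]
}

search_pkgs("Bayesian Net", pkgs)$Package  # "bnlearn"
search_pkgs("^b", pkgs)$Package            # regex match: "brms" "bnlearn"
```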
Category Package Title Description
1 Bayesian Inference abc Tools for Approximate Bayesian Computation (ABC) Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates, and to calculate the misclassification probabilities of different models.
2 Bayesian Inference abn Modelling Multivariate Data with Additive Bayesian Networks Bayesian network analysis is a form of probabilistic graphical modelling that derives, from empirical data, a directed acyclic graph (DAG) describing the dependency structure between random variables. An additive Bayesian network model consists of a DAG in which each node comprises a generalized linear model (GLM). Additive Bayesian network models are equivalent to Bayesian multivariate regression using graphical modelling; they generalise the usual multivariable regression, GLM, to multiple dependent variables. ‘abn’ provides routines to help determine optimal Bayesian network models for a given data set, where these models are used to identify statistical dependencies in messy, complex data. The additive formulation of these models is equivalent to multivariate generalised linear modelling (including mixed models with iid random effects). The usual term to describe this model selection process is structure discovery. The core functionality is concerned with model selection - determining the most robust empirical model of data from interdependent variables. Laplace approximations are used to estimate goodness-of-fit metrics and model parameters, and wrappers are also included to the INLA package, which can be obtained from <http://www.r-inla.org>. A comprehensive set of documented case studies, numerical accuracy/quality assurance exercises, and additional documentation are available from the ‘abn’ website <http://r-bayesian-networks.org>.
3 Bayesian Inference AdMit Adaptive Mixture of Student-t Distributions Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.
4 Bayesian Inference arm (core) Data Analysis Using Regression and Multilevel/Hierarchical Models Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.
5 Bayesian Inference AtelieR A GTK GUI for teaching basic concepts in statistical inference, and doing elementary Bayesian tests A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central Limit Theorem and the Normal Distribution, distribution of a sample mean, distribution of a sample variance, probability calculator for common distributions), and II. Elementary Bayesian Statistics (Bayesian inference on proportions, contingency tables, means and variances, with informative and noninformative priors).
6 Bayesian Inference BaBooN Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data Included are two variants of Bayesian Bootstrap Predictive Mean Matching to multiply impute missing data. The first variant is a variable-by-variable imputation combining sequential regression and Predictive Mean Matching (PMM) that has been extended for unordered categorical data. The Bayesian Bootstrap allows for generating approximately proper multiple imputations. The second variant is also based on PMM, but the focus is on imputing several variables at the same time. The suggestion is to use this variant, if the missing-data pattern resembles a data fusion situation, or any other missing-by-design pattern, where several variables have identical missing-data patterns. Both variants can be run as ‘single imputation’ versions, in case the analysis objective is of a purely descriptive nature.
7 Bayesian Inference BACCO (core) Bayesian Analysis of Computer Code Output (BACCO) The BACCO bundle of packages is replaced by the BACCO package, which provides a vignette that illustrates the constituent packages (emulator, approximator, calibrator) in use.
8 Bayesian Inference BaM Functions and Datasets for Books by Jeff Gill Functions and datasets for Jeff Gill: “Bayesian Methods: A Social and Behavioral Sciences Approach”. First, Second, and Third Edition. Published by Chapman and Hall/CRC (2002, 2007, 2014).
9 Bayesian Inference bamlss Bayesian Additive Models for Location, Scale, and Shape (and Beyond) Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc. and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2019) <doi:10.1080/10618600.2017.1407325> and the R package in Umlauf, Klein, Simon, Zeileis (2019) <arXiv:1909.11784>.
10 Bayesian Inference BART Bayesian Additive Regression Trees Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.
11 Bayesian Inference BAS Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy Priors or the mixture of g-priors from Liang et al (2008) <doi:10.1198/016214507000001337> for linear models or mixtures of g-priors in GLMs of Li and Clyde (2018) <arXiv:1503.06913>. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using Sampling w/out Replacement or an efficient MCMC algorithm samples models using the BAS tree structure as an efficient hash table. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used. The user may force variables to always be included. Details behind the sampling algorithm are provided in Clyde, Ghosh and Littman (2010) <doi:10.1198/jcgs.2010.09049>. This material is based upon work supported by the National Science Foundation under Grant DMS-1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
12 Bayesian Inference BayesDA Functions and Datasets for the book “Bayesian Data Analysis” Functions for Bayesian data analysis, with datasets from the book “Bayesian Data Analysis (second edition)” by Gelman, Carlin, Stern and Rubin. Not all datasets are included yet; the collection will hopefully be completed soon.
13 Bayesian Inference BayesFactor Computation of Bayes Factors for Common Designs A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.
14 Bayesian Inference bayesGARCH Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations Provides the bayesGARCH() function which performs the Bayesian estimation of the GARCH(1,1) model with Student’s t innovations as described in Ardia (2008) <doi:10.1007/978-3-540-78657-3>.
15 Bayesian Inference bayesImageS Bayesian Methods for Image Segmentation using a Potts Model Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior of Moores et al. (2015) <doi:10.1016/j.csda.2014.12.001>. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, approximate Bayesian computation (ABC-MCMC and ABC-SMC), and the parametric functional approximate Bayesian (PFAB) algorithm. Refer to <doi:10.1007/s11222-014-9525-6> and <doi:10.1214/18-BA1130> for further details.
16 Bayesian Inference bayesm (core) Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009) For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
17 Bayesian Inference bayesmeta Bayesian Random-Effects Meta-Analysis A collection of functions to derive the posterior distribution of the two parameters in a random-effects meta-analysis, and to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc.
18 Bayesian Inference bayesmix Bayesian Mixture Models with JAGS The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.
19 Bayesian Inference bayesQR Bayesian Quantile Regression Bayesian quantile regression using the asymmetric Laplace distribution, both continuous as well as binary dependent variables are supported. The package consists of implementations of the methods of Yu & Moyeed (2001) <doi:10.1016/S0167-7152(01)00124-9>, Benoit & Van den Poel (2012) <doi:10.1002/jae.1216> and Al-Hamzawi, Yu & Benoit (2012) <doi:10.1177/1471082X1101200304>. To speed up the calculations, the Markov Chain Monte Carlo core of all algorithms is programmed in Fortran and called from R.
20 Bayesian Inference BayesSummaryStatLM MCMC Sampling of Bayesian Linear Models via Summary Statistics Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function read.regress.data.ff utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.
21 Bayesian Inference bayesSurv (core) Bayesian Survival Regression with Flexible Error and Random Effects Distributions Contains Bayesian implementations of Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. Those can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored.
22 Bayesian Inference BayesTree Bayesian Additive Regression Trees This is an implementation of BART: Bayesian Additive Regression Trees, by Chipman, George and McCulloch (2010).
23 Bayesian Inference BayesValidate BayesValidate Package BayesValidate implements the software validation method described in the paper “Validation of Software for Bayesian Models using Posterior Quantiles” (Cook, Gelman, and Rubin, 2005). It inputs a function to perform Bayesian inference as well as functions to generate data from the Bayesian model being fit, and repeatedly generates and analyzes data to check that the Bayesian inference program works properly.
24 Bayesian Inference BayesVarSel Bayes Factors, Model Choice and Variable Selection in Linear Models Conceived to calculate Bayes factors in linear models and then to provide a formal Bayesian answer to testing and variable selection problems. From a theoretical side, the emphasis in this package is placed on the prior distributions and it allows a wide range of them: Jeffreys (1961); Zellner and Siow(1980)<doi:10.1007/bf02888369>; Zellner and Siow(1984); Zellner (1986)<doi:10.2307/2233941>; Fernandez et al. (2001)<doi:10.1016/s0304-4076(00)00076-2>; Liang et al. (2008)<doi:10.1198/016214507000001337> and Bayarri et al. (2012)<doi:10.1214/12-aos1013>. The interaction with the package is through a friendly interface that syntactically mimics the well-known lm() command of R. The resulting objects can be easily explored providing the user very valuable information (like marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; the median probability model, MPM) about the structure of the true -data generating- model. Additionally, this package incorporates abilities to handle problems with a large number of potential explanatory variables through parallel and heuristic versions of the main commands, Garcia-Donato and Martinez-Beneito (2013)<doi:10.1080/01621459.2012.742443>.
25 Bayesian Inference BayesX R Utilities Accompanying the Software Package BayesX Functions for exploring and visualising estimation results obtained with BayesX, free software for estimating structured additive regression models (<http://www.BayesX.org>). In addition, functions to read, write and manipulate the map objects required in spatial analyses performed with BayesX.
26 Bayesian Inference BayHaz R Functions for Bayesian Hazard Rate Estimation A suite of R functions for Bayesian estimation of smooth hazard rates via Compound Poisson Process (CPP) and Bayesian Penalized Spline (BPS) priors.
27 Bayesian Inference BAYSTAR On Bayesian analysis of Threshold autoregressive model (BAYSTAR) Provides functionality for Bayesian estimation of threshold autoregressive models.
28 Bayesian Inference bbemkr Bayesian bandwidth estimation for multivariate kernel regression with Gaussian error Bayesian bandwidth estimation for Nadaraya-Watson type multivariate kernel regression with Gaussian error density.
29 Bayesian Inference BCBCSF Bias-Corrected Bayesian Classification with Selected Features Fully Bayesian classification with a subset of high-dimensional features, such as expression levels of genes. The data are modeled with hierarchical Bayesian models using heavy-tailed t distributions as priors. When a large number of features are available, one may like to select only a subset of features to use, typically those features strongly correlated with the response in training cases. Such a feature selection procedure is, however, invalid, since the relationship between the response and the features will have been exaggerated by the feature selection. This package provides a way to avoid this bias and yield better-calibrated predictions for future cases when one uses the F-statistic to select features.
30 Bayesian Inference BCE Bayesian composition estimator: estimating sample (taxonomic) composition from biomarker data Function to estimate taxonomic compositions from biomarker data, using a Bayesian approach.
31 Bayesian Inference bclust Bayesian Hierarchical Clustering Using Spike and Slab Models Builds a dendrogram using the log posterior as a natural distance defined by the model, while weighting the clustering variables. It can also compute equivalent Bayesian discrimination probabilities. The adopted method suits small-sample, large-dimension settings. Model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.
32 Bayesian Inference bcp Bayesian Analysis of Change Point Problems Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.
33 Bayesian Inference BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889> and Mohammadi and Wit (2019) <doi:10.18637/jss.v089.i03>.
34 Bayesian Inference BLR Bayesian Linear Regression Bayesian Linear Regression.
35 Bayesian Inference BMA Bayesian Model Averaging Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).
36 Bayesian Inference Bmix Bayesian Sampling for Stick-Breaking Mixtures This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.
37 Bayesian Inference bmixture Bayesian Estimation for Finite Mixture of Distributions Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements in the Bayesian literature on finite mixtures of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.
38 Bayesian Inference BMS Bayesian Model Averaging Library Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors), 5 kinds of model priors, moreover model sampling by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, coefficient and predictive densities. Plotting functions available for posterior model size, MCMC convergence, predictive and coefficient densities, best models representation, BMA comparison.
39 Bayesian Inference bnlearn Bayesian Network Structure Learning, Parameter Learning and Inference Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <http://www.bnlearn.com>.
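As a sketch of the bnlearn workflow described above — score-based structure learning followed by parameter estimation — the following uses the `learning.test` example data shipped with the package; `hc()`, `bn.fit()` and `arcs()` are bnlearn functions, but exact arguments should be checked against the package manual:

```r
library(bnlearn)

data(learning.test)                   # discrete example data shipped with bnlearn
dag <- hc(learning.test)              # score-based hill-climbing structure learning
fitted <- bn.fit(dag, learning.test)  # maximum-likelihood parameter estimation
arcs(dag)                             # inspect the learned arc set
```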
40 Bayesian Inference BNSP Bayesian Non- And Semi-Parametric Model Fitting MCMC algorithms & processing functions for non- and semi-parametric models: 1. Dirichlet process mixtures & 2. spike-slab for multivariate (and univariate) response analysis, with nonparametric models for the means, the variances and the correlation matrix.
41 Bayesian Inference boa (core) Bayesian Output Analysis Program (BOA) for MCMC A menu-driven program and library of functions for carrying out convergence diagnostics and statistical and graphical analysis of Markov chain Monte Carlo sampling output.
42 Bayesian Inference Bolstad Functions for Elementary Bayesian Inference A set of R functions and data sets for the book Introduction to Bayesian Statistics, Bolstad, W.M. (2017), John Wiley & Sons ISBN 978-1-118-09156-2.
43 Bayesian Inference Boom Bayesian Object Oriented Modeling A C++ library for Bayesian modeling, with an emphasis on Markov chain Monte Carlo. Although boom contains a few R utilities (mainly plotting functions), its primary purpose is to install the BOOM C++ library on your system so that other packages can link against it.
44 Bayesian Inference BoomSpikeSlab MCMC for Spike and Slab Regression Spike and slab regression with a variety of residual error distributions corresponding to Gaussian, Student T, probit, logit, SVM, and a few others. Spike and slab regression is Bayesian regression with prior distributions containing a point mass at zero. The posterior updates the amount of mass on this point, leading to a posterior distribution that is actually sparse, in the sense that if you sample from it many coefficients are actually zeros. Sampling from this posterior distribution is an elegant way to handle Bayesian variable selection and model averaging. See <doi:10.1504/IJMMNO.2014.059942> for an explanation of the Gaussian case.
45 Bayesian Inference bqtl Bayesian QTL Mapping Toolkit QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.
46 Bayesian Inference bridgesampling Bridge Sampling for Marginal Likelihoods and Bayes Factors Provides functions for estimating marginal likelihoods, Bayes factors, posterior model probabilities, and normalizing constants in general, via different versions of bridge sampling (Meng & Wong, 1996, <http://www3.stat.sinica.edu.tw/statistica/j6n4/j6n43/j6n43.htm>).
47 Bayesian Inference brms Bayesian Regression Models using ‘Stan’ Fit Bayesian generalized (non-)linear multivariate multilevel models using ‘Stan’ for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit among others linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation. References: Burkner (2017) <doi:10.18637/jss.v080.i01>; Burkner (2018) <doi:10.32614/RJ-2018-017>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.
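A minimal sketch of the brms interface described above — the formula, the data frame `d` with response `y`, predictor `x` and grouping factor `g`, and the prior are all illustrative assumptions, not part of brms itself:

```r
library(brms)

# Illustrative model: 'd', 'y', 'x' and 'g' are assumed to exist.
fit <- brm(
  y ~ x + (1 | g),                     # multilevel formula, lme4-like syntax
  data = d, family = gaussian(),
  prior = set_prior("normal(0, 5)", class = "b")
)
summary(fit)
pp_check(fit)   # posterior predictive check
loo(fit)        # leave-one-out cross-validation
```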
48 Bayesian Inference bsamGP Bayesian Spectral Analysis Models using Gaussian Process Priors Contains functions to perform Bayesian inference using a spectral analysis of Gaussian process priors. Gaussian processes are represented with a Fourier series based on cosine basis functions. Currently the package includes parametric linear models, partial linear additive models with/without shape restrictions, generalized linear additive models with/without shape restrictions, and density estimation model. To maximize computational efficiency, the actual Markov chain Monte Carlo sampling for each model is done using codes written in FORTRAN 90. This software has been developed using funding supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B03932178 and no. NRF-2017R1D1A3B03035235).
49 Bayesian Inference bspec Bayesian Spectral Inference Bayesian inference on the (discrete) power spectrum of time series.
50 Bayesian Inference bspmma Bayesian Semiparametric Models for Meta-Analysis The main functions carry out Gibbs’ sampler routines for nonparametric and semiparametric Bayesian models for random effects meta-analysis.
51 Bayesian Inference bsts Bayesian Structural Time Series Time series regression using dynamic linear models fit using MCMC. See Scott and Varian (2014) <doi:10.1504/IJMMNO.2014.059942>, among many other sources.
52 Bayesian Inference BVS Bayesian Variant Selection: Bayesian Model Uncertainty Techniques for Genetic Association Studies The functions in this package focus on analyzing case-control association studies involving a group of genetic variants. In particular, we are interested in modeling the outcome variable as a function of a multivariate genetic profile using Bayesian model uncertainty and variable selection techniques. The package incorporates functions to analyze data sets involving common variants as well as extensions to model rare variants via the Bayesian Risk Index (BRI) as well as haplotypes. Finally, the package also allows the incorporation of external biological information to inform the marginal inclusion probabilities via the iBMU.
53 Bayesian Inference catnet Categorical Bayesian Network Inference Structure learning and parameter estimation of discrete Bayesian networks using likelihood-based criteria. Exhaustive search for fixed node orders and stochastic search of optimal orders via simulated annealing algorithm are implemented.
54 Bayesian Inference coalescentMCMC MCMC Algorithms for the Coalescent Flexible framework for coalescent analyses in R. It includes a main function running the MCMC algorithm, auxiliary functions for tree rearrangement, and some functions to compute population genetic parameters.
55 Bayesian Inference coda (core) Output Analysis and Diagnostics for MCMC Provides functions for summarizing and plotting the output from Markov Chain Monte Carlo (MCMC) simulations, as well as diagnostic tests of convergence to the equilibrium distribution of the Markov chain.
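The coda workflow above can be sketched as follows; the random matrix stands in for real sampler output, and `mcmc()`, `summary()`, `effectiveSize()` and `geweke.diag()` are standard coda functions:

```r
library(coda)

# Wrap raw sampler output (an iterations x parameters matrix) as an mcmc object.
set.seed(1)
draws <- mcmc(matrix(rnorm(2000), ncol = 2,
                     dimnames = list(NULL, c("alpha", "beta"))))
summary(draws)        # posterior means, SDs and quantiles
effectiveSize(draws)  # effective sample size per parameter
geweke.diag(draws)    # convergence diagnostic
```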
56 Bayesian Inference dclone Data Cloning and MCMC Tools for Maximum Likelihood Methods Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.
57 Bayesian Inference deBInfer Bayesian Inference for Differential Equations A Bayesian framework for parameter inference in differential equations. This approach offers a rigorous methodology for parameter inference as well as modeling the link between unobservable model states and parameters, and observable quantities. Provides templates for the DE model, the observation model and data likelihood, and the model parameters and their prior distributions. A Markov chain Monte Carlo (MCMC) procedure processes these inputs to estimate the posterior distributions of the parameters and any derived quantities, including the model trajectories. Further functionality is provided to facilitate MCMC diagnostics and the visualisation of the posterior distributions of model parameters and trajectories.
58 Bayesian Inference dlm Bayesian and Likelihood Analysis of Dynamic Linear Models Provides routines for Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
59 Bayesian Inference EbayesThresh Empirical Bayes Thresholding and Related Methods Empirical Bayes thresholding using the methods developed by I. M. Johnstone and B. W. Silverman. The basic problem is to estimate a mean vector given a vector of observations of the mean vector plus white noise, taking advantage of possible sparsity in the mean vector. Within a Bayesian formulation, the elements of the mean vector are modelled as having, independently, a distribution that is a mixture of an atom of probability at zero and a suitable heavy-tailed distribution. The mixing parameter can be estimated by a marginal maximum likelihood approach. This leads to an adaptive thresholding approach on the original data. Extensions of the basic method, in particular to wavelet thresholding, are also implemented within the package.
60 Bayesian Inference ebdbNet Empirical Bayes Estimation of Dynamic Bayesian Networks Infer the adjacency matrix of a network from time course data using an empirical Bayes estimation procedure based on Dynamic Bayesian Networks.
61 Bayesian Inference eco Ecological Inference in 2x2 Tables Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.
62 Bayesian Inference eigenmodel Semiparametric Factor and Regression Models for Symmetric Relational Data Estimation of the parameters in a model for symmetric relational data (e.g., the above-diagonal part of a square matrix), using a model-based eigenvalue decomposition and regression. Missing data is accommodated, and a posterior mean for missing data is calculated under the assumption that the data are missing at random. The marginal distribution of the relational data can be arbitrary, and is fit with an ordered probit specification. See Hoff (2007) <arXiv:0711.1146> for details on the model.
63 Bayesian Inference ensembleBMA Probabilistic Forecasting using Ensembles and Bayesian Model Averaging Bayesian Model Averaging to create probabilistic forecasts from ensemble forecasts and weather observations.
64 Bayesian Inference EntropyMCMC MCMC Simulation and Convergence Evaluation using Entropy and Kullback-Leibler Divergence Estimation Tools for Markov Chain Monte Carlo (MCMC) simulation and performance analysis. Simulate MCMC algorithms including adaptive MCMC, evaluate their convergence rate, and compare candidate MCMC algorithms for the same target density, based on entropy and Kullback-Leibler divergence criteria. MCMC algorithms can be simulated using provided functions, or imported from external code. This package is based upon work starting with Chauveau, D. and Vandekerkhove, P. (2013) <doi:10.1051/ps/2012004> and subsequent articles.
65 Bayesian Inference evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.
66 Bayesian Inference exactLoglinTest Monte Carlo Exact Tests for Log-linear Models Monte Carlo and MCMC goodness-of-fit tests for log-linear models.
67 Bayesian Inference factorQR Bayesian quantile regression factor models Package to fit Bayesian quantile regression models that assume a factor structure for at least part of the design matrix.
68 Bayesian Inference FME A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis Provides functions to help in fitting models to data, and to perform Monte Carlo, sensitivity and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’, or a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.
69 Bayesian Inference geoR Analysis of Geostatistical Data Geostatistical analysis including traditional, likelihood-based and Bayesian methods.
70 Bayesian Inference geoRglm A Package for Generalised Linear Spatial Models Functions for inference in generalised linear spatial models. The posterior and predictive inference is based on Markov chain Monte Carlo methods. Package geoRglm is an extension to the package geoR, which must be installed first.
71 Bayesian Inference ggmcmc Tools for Analyzing MCMC Simulations from Bayesian Inference Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying results from a full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables.
72 Bayesian Inference gRain Graphical Independence Networks Probability propagation in graphical independence networks, also known as Bayesian networks or probabilistic expert systems.
73 Bayesian Inference hbsae Hierarchical Bayesian Small Area Estimation Functions to compute small area estimates based on a basic area or unit-level model. The model is fit using restricted maximum likelihood, or in a hierarchical Bayesian way. In the latter case numerical integration is used to average over the posterior density for the between-area variance. The output includes the model fit, small area estimates and corresponding MSEs, as well as some model selection measures. Additional functions provide means to compute aggregate estimates and MSEs, to minimally adjust the small area estimates to benchmarks at a higher aggregation level, and to graphically compare different sets of small area estimates.
74 Bayesian Inference HI Simulation from distributions supported by nested hyperplanes Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to transdimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v.31, n.4, (2003). Also random direction multivariate Adaptive Rejection Metropolis Sampling.
75 Bayesian Inference Hmisc Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
76 Bayesian Inference iterLap Approximate Probability Densities by Iterated Laplace Approximations The iterLap (iterated Laplace approximation) algorithm approximates a general (possibly non-normalized) probability density on R^p, by repeated Laplace approximations to the difference between current approximation and true density (on log scale). The final approximation is a mixture of multivariate normal distributions and might be used for example as a proposal distribution for importance sampling (e.g., in Bayesian applications). The algorithm can be seen as a computational generalization of the Laplace approximation suitable for skew or multimodal densities.
77 Bayesian Inference LaplacesDemon Complete Environment for Bayesian Inference Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.
78 Bayesian Inference LearnBayes Functions for Learning Bayesian Inference A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
79 Bayesian Inference lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
80 Bayesian Inference lmm Linear Mixed Models Implements the Expectation/Conditional Maximization Either (ECME) algorithm and other rapidly converging algorithms, as well as Bayesian inference, for linear mixed models, as described in Schafer, J.L. (1998) “Some improved procedures for linear mixed models”. Dept. of Statistics, The Pennsylvania State University.
81 Bayesian Inference MasterBayes ML and MCMC Methods for Pedigree Reconstruction and Analysis The primary aim of MasterBayes is to use MCMC techniques to integrate over uncertainty in pedigree configurations estimated from molecular markers and phenotypic data. Emphasis is put on the marginal distribution of parameters that relate the phenotypic data to the pedigree. All simulation is done in compiled C++ for efficiency.
82 Bayesian Inference matchingMarkets Analysis of Stable Matchings Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
83 Bayesian Inference mcmc (core) Markov Chain Monte Carlo Simulates continuous distributions of random vectors using Markov chain Monte Carlo (MCMC). Users specify the distribution by an R function that evaluates the log unnormalized density. Algorithms are random walk Metropolis algorithm (function metrop), simulated tempering (function temper), and morphometric random walk Metropolis (Johnson and Geyer, 2012, <doi:10.1214/12-AOS1048>, function morph.metrop), which achieves geometric ergodicity by change of variable.
84 Bayesian Inference MCMCglmm MCMC Generalised Linear Mixed Models MCMC Generalised Linear Mixed Models.
85 Bayesian Inference MCMCpack (core) Markov Chain Monte Carlo (MCMC) Package Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
86 Bayesian Inference MCMCvis Tools to Visualize, Manipulate, and Summarize MCMC Output Performs key functions for MCMC analysis using minimal code - visualizes, manipulates, and summarizes MCMC output. Functions support simple and straightforward subsetting of model parameters within the calls, and produce presentable and ‘publication-ready’ output. MCMC output may be derived from Bayesian model output fit with JAGS, Stan, or other MCMC samplers.
87 Bayesian Inference mgcv Mixed GAM Computation Vehicle with Automatic Smoothness Estimation Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
88 Bayesian Inference mlogitBMA Bayesian Model Averaging for Multinomial Logit Models Provides a modified function bic.glm of the BMA package that can be applied to multinomial logit (MNL) data. The data is converted to binary logit using the Begg & Gray approximation. The package also contains functions for maximum likelihood estimation of MNL.
89 Bayesian Inference MNP R Package for Fitting the Multinomial Probit Model Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.
90 Bayesian Inference mombf Bayesian Model Selection and Averaging for Non-Local and Local Priors Bayesian model selection and averaging for regression and mixtures for non-local and selected local priors.
91 Bayesian Inference monomvn Estimation for Multivariate Normal and Student-t Data with Monotone Missingness Estimation of multivariate normal and student-t data of arbitrary dimension where the pattern of missing data is monotone. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and student-t errors (from Geweke) is also provided.
92 Bayesian Inference NetworkChange Bayesian Package for Network Changepoint Analysis Network changepoint analysis for undirected network data. The package implements a hidden Markov network change point model (Park and Sohn 2019). Functions for break number detection using the approximate marginal likelihood and WAIC are also provided.
93 Bayesian Inference nimble (core) MCMC, Particle Filtering, and Programmable Hierarchical Modeling A system for writing hierarchical statistical models largely compatible with ‘BUGS’ and ‘JAGS’, writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. ‘NIMBLE’ includes default methods for MCMC, particle filtering, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers ‘NIMBLE’ provides. ‘NIMBLE’ extends the ‘BUGS’/‘JAGS’ language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the ‘BUGS’/‘JAGS’ language for writing models, one can use ‘NIMBLE’ for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.
94 Bayesian Inference openEBGM EBGM Disproportionality Scores for Adverse Event Data Mining An implementation of DuMouchel’s (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and quantile scores from the posterior distribution using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the ‘PhViD’ package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Now includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the ‘mederrRank’ package.
95 Bayesian Inference pacbpred PAC-Bayesian Estimation and Prediction in Sparse Additive Models Performs estimation and prediction in high-dimensional additive models, using a sparse PAC-Bayesian point of view and an MCMC algorithm. The method is fully described in Guedj and Alquier (2013), ‘PAC-Bayesian Estimation and Prediction in Sparse Additive Models’, Electronic Journal of Statistics, 7, 264-291.
96 Bayesian Inference PAWL Implementation of the PAWL Algorithm Implementation of the Parallel Adaptive Wang-Landau algorithm. Also implemented for comparison: parallel adaptive Metropolis-Hastings and an SMC sampler.
97 Bayesian Inference pcFactorStan Stan Models for the Paired Comparison Factor Model Provides convenience functions and pre-programmed Stan models related to the paired comparison factor model. Its purpose is to make fitting paired comparison data using Stan easy.
98 Bayesian Inference plotMCMC MCMC Diagnostic Plots Markov chain Monte Carlo diagnostic plots. The purpose of the package is to combine existing tools from the ‘coda’ and ‘lattice’ packages, and make it easy to adjust graphical details.
99 Bayesian Inference predmixcor Classification Rule Based on Bayesian Mixture Models with Feature Selection Bias Corrected The function “train_predict_mix” predicts the binary response with binary features.
100 Bayesian Inference PReMiuM Dirichlet Process Bayesian Clustering, Profile Regression Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.
101 Bayesian Inference prevalence Tools for Prevalence Assessment Studies The prevalence package provides Frequentist and Bayesian methods for prevalence assessment studies. IMPORTANT: the truePrev functions in the prevalence package call on JAGS (Just Another Gibbs Sampler), which therefore has to be available on the user’s system. JAGS can be downloaded from http://mcmc-jags.sourceforge.net/.
102 Bayesian Inference profdpm Profile Dirichlet Process Mixtures This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate for the data partition in a PPM.
103 Bayesian Inference pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
104 Bayesian Inference R2BayesX Estimate Structured Additive Regression Models with ‘BayesX’ An R interface to estimate structured additive regression (STAR) models with ‘BayesX’.
105 Bayesian Inference R2jags Using R to Run ‘JAGS’ Provides wrapper functions to implement Bayesian analysis in JAGS. Major features include monitoring convergence of an MCMC model using the Gelman-Rubin Rhat statistic, automatically running an MCMC model until it converges, and parallel processing of an MCMC model for multiple chains.
106 Bayesian Inference R2WinBUGS Running ‘WinBUGS’ and ‘OpenBUGS’ from ‘R’ / ‘S-PLUS’ Invoke a ‘BUGS’ model in ‘OpenBUGS’ or ‘WinBUGS’, a class “bugs” for ‘BUGS’ results and functions to work with that class. Function write.model() allows a ‘BUGS’ model file to be written. The class and auxiliary functions could be used with other MCMC programs, including ‘JAGS’.
107 Bayesian Inference ramps Bayesian Geostatistical Modeling with RAMPS Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm designed to lower autocorrelation in MCMC samples. Package performance is tuned for large spatial datasets.
108 Bayesian Inference revdbayes Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. Also provided are d,p,q,r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.
109 Bayesian Inference RJaCGH Reversible Jump MCMC for the Analysis of CGH Arrays Bayesian analysis of CGH microarrays fitting Hidden Markov Chain models. The selection of the number of states is made via their posterior probability computed by Reversible Jump Markov Chain Monte Carlo Methods. Also returns probabilistic common regions for gains/losses.
110 Bayesian Inference rjags Bayesian Graphical Models using MCMC Interface to the JAGS MCMC library.
111 Bayesian Inference RSGHB Functions for Hierarchical Bayesian Estimation: A Flexible Approach Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train’s Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train’s original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html> See Train’s chapter on HB in Discrete Choice with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>; and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.
113 Bayesian Inference rstan R Interface to Stan User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the ‘StanHeaders’ package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via ‘variational’ approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.
114 Bayesian Inference rstiefel Random Orthonormal Matrix Generation and Optimization on the Stiefel Manifold Simulation of random orthonormal matrices from linear and quadratic exponential family distributions on the Stiefel manifold. The most general type of distribution covered is the matrix-variate Bingham-von Mises-Fisher distribution. Most of the simulation methods are presented in Hoff(2009) “Simulation of the Matrix Bingham-von Mises-Fisher Distribution, With Applications to Multivariate and Relational Data” <doi:10.1198/jcgs.2009.07177>. The package also includes functions for optimization on the Stiefel manifold based on algorithms described in Wen and Yin (2013) “A feasible method for optimization with orthogonality constraints” <doi:10.1007/s10107-012-0584-1>.
115 Bayesian Inference runjags Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS User-friendly interface utilities for MCMC models via Just Another Gibbs Sampler (JAGS), facilitating the use of parallel (or distributed) processors for multiple chains, automated control of convergence and sample length diagnostics, and evaluation of the performance of a model using drop-k validation or against simulated data. Template model specifications can be generated using a standard lme4-style formula interface to assist users less familiar with the BUGS syntax. A JAGS extension module provides additional distributions including the Pareto family of distributions, the DuMouchel prior and the half-Cauchy prior.
116 Bayesian Inference Runuran R Interface to the ‘UNU.RAN’ Random Variate Generators Interface to the ‘UNU.RAN’ library for Universal Non-Uniform RANdom variate generators. It allows one to build non-uniform random number generators for quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles for a number of distributions.
117 Bayesian Inference RxCEcolInf ‘R x C Ecological Inference With Optional Incorporation of Survey Information’ Fits the R x C inference model described in Greiner and Quinn (2009). Allows incorporation of survey results.
118 Bayesian Inference SamplerCompare A Framework for Comparing the Performance of MCMC Samplers A framework for running sets of MCMC samplers on sets of distributions with a variety of tuning parameters, along with plotting functions to visualize the results of those simulations.
119 Bayesian Inference SampleSizeMeans Sample size calculations for normal means A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate a normal mean or the difference between two normal means. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of normal means are provided. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.
120 Bayesian Inference SampleSizeProportions Calculating sample size requirements when estimating the difference between two binomial proportions A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate the difference between two binomial proportions. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of binomial observations are provided. In all cases, estimation of the difference between two binomial proportions is considered. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.
121 Bayesian Inference sbgcop Semiparametric Bayesian Gaussian Copula Estimation and Imputation Estimation and inference for parameters in a Gaussian copula model, treating the univariate marginal distributions as nuisance parameters as described in Hoff (2007) <doi:10.1214/07-AOAS107>. This package also provides a semiparametric imputation procedure for missing multivariate data.
122 Bayesian Inference SimpleTable Bayesian Inference and Sensitivity Analysis for Causal Effects from 2 x 2 and 2 x 2 x K Tables in the Presence of Unmeasured Confounding SimpleTable provides a series of methods to conduct Bayesian inference and sensitivity analysis for causal effects from 2 x 2 and 2 x 2 x K tables when unmeasured confounding is present or suspected.
123 Bayesian Inference sna Tools for Social Network Analysis A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.
124 Bayesian Inference spBayes Univariate and Multivariate Spatial-Temporal Modeling Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) <doi:10.18637/jss.v063.i13> and Finley, Banerjee, and Cook (2014) <doi:10.1111/2041-210X.12189>.
125 Bayesian Inference spikeslab Prediction and variable selection using spike and slab regression Spike and slab for prediction and variable selection in linear regression models. Uses a generalized elastic net for variable selection.
126 Bayesian Inference spikeSlabGAM Bayesian Variable Selection and Model Choice for Generalized Additive Mixed Models Bayesian variable selection, model choice, and regularized estimation for (spatial) generalized additive mixed regression models via stochastic search variable selection with spike-and-slab priors.
127 Bayesian Inference spTimer Spatio-Temporal Bayesian Modelling Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) <doi:10.18637/jss.v063.i15>.
128 Bayesian Inference ssgraph Bayesian Graphical Estimation using Spike-and-Slab Priors Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data.
129 Bayesian Inference ssMousetrack Bayesian State-Space Modeling of Mouse-Tracking Experiments via Stan Estimates previously compiled state-space modeling for mouse-tracking experiments using the ‘rstan’ package, which provides the R interface to the Stan C++ library for Bayesian estimation.
130 Bayesian Inference stochvol Efficient Bayesian Inference for Stochastic Volatility (SV) Models Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods. Methodological details are given in Kastner and Frühwirth-Schnatter (2014) <doi:10.1016/j.csda.2013.01.002>; the most common use cases are described in Kastner (2016) <doi:10.18637/jss.v069.i05>. Also incorporates SV with leverage.
131 Bayesian Inference tgp Bayesian Treed Gaussian Process Models Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.
132 Bayesian Inference zic Bayesian Inference for Zero-Inflated Count Models Provides MCMC algorithms for the analysis of zero-inflated count models. The case of stochastic search variable selection (SVS) is also considered. All MCMC samplers are coded in C++ for improved efficiency. A data set considering the demand for health care is provided.
133 Chemometrics and Computational Physics ALS (core) Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) Alternating least squares is often used to resolve components contributing to data with a bilinear structure; the basic technique may be extended to alternating constrained least squares. Commonly applied constraints include unimodality, non-negativity, and normalization of components. Several data matrices may be decomposed simultaneously by assuming that one of the two matrices in the bilinear decomposition is shared between datasets.
134 Chemometrics and Computational Physics AnalyzeFMRI Functions for Analysis of fMRI Datasets Stored in the ANALYZE or NIFTI Format Functions for I/O, visualisation and analysis of functional Magnetic Resonance Imaging (fMRI) datasets stored in the ANALYZE or NIFTI format. Note that the latest version of XQuartz seems to be necessary under MacOS.
135 Chemometrics and Computational Physics AquaEnv Integrated Development Toolbox for Aquatic Chemical Model Generation Toolbox for the experimental aquatic chemist, focused on acidification and CO2 air-water exchange. It contains all elements to model the pH, the related CO2 air-water exchange, and aquatic acid-base chemistry for an arbitrary marine, estuarine or freshwater system. It contains a suite of tools for sensitivity analysis, visualisation, modelling of chemical batches, and can be used to build dynamic models of aquatic systems. As of version 1.0-4, it also contains functions to calculate the buffer factors.
136 Chemometrics and Computational Physics astro Astronomy Functions, Tools and Routines The astro package provides a series of functions, tools and routines in everyday use within astronomy. Broadly speaking, one may group these functions into 7 main areas, namely: cosmology, FITS file manipulation, the Sersic function, plotting, data manipulation, statistics and general convenience functions and scripting tools.
137 Chemometrics and Computational Physics astrochron A Computational Tool for Astrochronology Routines for astrochronologic testing, astronomical time scale construction, and time series analysis. Also included are a range of statistical analysis and modeling routines that are relevant to time scale development and paleoclimate analysis.
138 Chemometrics and Computational Physics astrodatR Astronomical Data A collection of 19 datasets from contemporary astronomical research. They are described in the textbook ‘Modern Statistical Methods for Astronomy with R Applications’ by Eric D. Feigelson and G. Jogesh Babu (Cambridge University Press, 2012, Appendix C) or on the website of Penn State’s Center for Astrostatistics (http://astrostatistics.psu.edu/datasets). These datasets can be used to exercise methodology involving: density estimation; heteroscedastic measurement errors; contingency tables; two-sample hypothesis tests; spatial point processes; nonlinear regression; mixture models; censoring and truncation; multivariate analysis; classification and clustering; inhomogeneous Poisson processes; periodic and stochastic time series analysis.
139 Chemometrics and Computational Physics astroFns Astronomy: time and position functions, misc. utilities Miscellaneous astronomy functions, utilities, and data.
140 Chemometrics and Computational Physics astrolibR Astronomy Users Library Several dozen low-level utilities and codes from the Interactive Data Language (IDL) Astronomy Users Library (http://idlastro.gsfc.nasa.gov) are implemented in R. They treat: time, coordinate and proper motion transformations; terrestrial precession and nutation, atmospheric refraction and aberration, barycentric corrections, and related effects; utilities for astrometry, photometry, and spectroscopy; and utilities for planetary, stellar, Galactic, and extragalactic science.
141 Chemometrics and Computational Physics ATmet Advanced Tools for Metrology This package provides functions for smart sampling and sensitivity analysis for metrology applications, including computationally expensive problems.
142 Chemometrics and Computational Physics Bchron Radiocarbon Dating, Age-Depth Modelling, Relative Sea Level Rate Estimation, and Non-Parametric Phase Modelling Enables quick calibration of radiocarbon dates under various calibration curves (including user generated ones); age-depth modelling as per the algorithm of Haslett and Parnell (2008) <doi:10.1111/j.1467-9876.2008.00623.x>; Relative sea level rate estimation incorporating time uncertainty in polynomial regression models (Parnell and Gehrels 2015) <doi:10.1002/9781118452547.ch32>; non-parametric phase modelling via Gaussian mixtures as a means to determine the activity of a site (and as an alternative to the Oxcal function SUM; currently unpublished), and reverse calibration of dates from calibrated into un-calibrated years (also unpublished).
143 Chemometrics and Computational Physics BioMark Find Biomarkers in Two-Class Discrimination Problems Variable selection methods are provided for several classification methods: the lasso/elastic net, PCLDA, PLSDA, and several t-tests. Two approaches for selecting cutoffs can be used, one based on the stability of model coefficients under perturbation, and the other on higher criticism.
144 Chemometrics and Computational Physics bvls The Stark-Parker algorithm for bounded-variable least squares An R interface to the Stark-Parker implementation of an algorithm for bounded-variable least squares.
145 Chemometrics and Computational Physics celestial Collection of Common Astronomical Conversion Routines and Functions Contains a number of common astronomy conversion routines, particularly the HMS and degrees schemes, which can be fiddly to convert between en masse due to the textual nature of the former. It allows users to coordinate-match datasets quickly. It also contains functions for various cosmological calculations.
146 Chemometrics and Computational Physics chemCal (core) Calibration Functions for Analytical Chemistry Simple functions for plotting linear calibration functions and estimating standard errors for measurements according to the Handbook of Chemometrics and Qualimetrics: Part A by Massart et al. There are also functions estimating the limit of detection (LOD) and limit of quantification (LOQ). The functions work on model objects from - optionally weighted - linear regression (lm) or robust linear regression (‘rlm’ from the ‘MASS’ package).
147 Chemometrics and Computational Physics chemometrics Multivariate Statistical Analysis in Chemometrics R companion to the book “Introduction to Multivariate Statistical Analysis in Chemometrics” written by K. Varmuza and P. Filzmoser (2009).
148 Chemometrics and Computational Physics ChemometricsWithR Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences Functions and scripts used in the book “Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences” by Ron Wehrens, Springer (2011). Data used in the package are available from github.
149 Chemometrics and Computational Physics ChemoSpec Exploratory Chemometrics for Spectroscopy A collection of functions for top-down exploratory data analysis of spectral data including nuclear magnetic resonance (NMR), infrared (IR), Raman, X-ray fluorescence (XRF) and other similar types of spectroscopy. Includes functions for plotting and inspecting spectra, peak alignment, hierarchical cluster analysis (HCA), principal components analysis (PCA) and model-based clustering. Robust methods appropriate for this type of high-dimensional data are available. ChemoSpec is designed for structured experiments, such as metabolomics investigations, where the samples fall into treatment and control groups. Graphical output is formatted consistently for publication quality plots. ChemoSpec is intended to be very user friendly and to help you get usable results quickly. A vignette covering typical operations is available.
150 Chemometrics and Computational Physics ChemoSpec2D Exploratory Chemometrics for 2D Spectroscopy A collection of functions for exploratory chemometrics of 2D spectroscopic data sets such as COSY (correlated spectroscopy) and HSQC (heteronuclear single quantum coherence) 2D NMR (nuclear magnetic resonance) spectra. ‘ChemoSpec2D’ deploys methods aimed primarily at classification of samples and the identification of spectral features which are important in distinguishing samples from each other. Each 2D spectrum (a matrix) is treated as the unit of observation, and thus the physical sample in the spectrometer corresponds to the sample from a statistical perspective. In addition to chemometric tools, a few tools are provided for plotting 2D spectra, but these are not intended to replace the functionality typically available on the spectrometer. ‘ChemoSpec2D’ takes many of its cues from ‘ChemoSpec’ and tries to create consistent graphical output and to be very user friendly.
151 Chemometrics and Computational Physics CHNOSZ Thermodynamic Calculations and Diagrams for Geochemistry An integrated set of tools for thermodynamic calculations in aqueous geochemistry and geobiochemistry. Functions are provided for writing balanced reactions to form species from user-selected basis species and for calculating the standard molal properties of species and reactions, including the standard Gibbs energy and equilibrium constant. Calculations of the non-equilibrium chemical affinity and equilibrium chemical activity of species can be portrayed on diagrams as a function of temperature, pressure, or activity of basis species; in two dimensions, this gives a maximum affinity or predominance diagram. The diagrams have formatted chemical formulas and axis labels, and water stability limits can be added to Eh-pH, oxygen fugacity- temperature, and other diagrams with a redox variable. The package has been developed to handle common calculations in aqueous geochemistry, such as solubility due to complexation of metal ions, mineral buffers of redox or pH, and changing the basis species across a diagram (“mosaic diagrams”). CHNOSZ also has unique capabilities for comparing the compositional and thermodynamic properties of different proteins.
152 Chemometrics and Computational Physics clustvarsel Variable Selection for Gaussian Model-Based Clustering Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology makes it possible to find the (locally) optimal subset of variables in a data set that have group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without sub-sampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.
153 Chemometrics and Computational Physics compositions Compositional Data Analysis Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.
154 Chemometrics and Computational Physics constants Reference on Constants, Units and Uncertainty CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with errors and/or the values with units are also provided if the ‘errors’ and/or the ‘units’ packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This package contains the “2014 CODATA” version, published on 25 June 2015: Mohr, P. J., Newell, D. B. and Taylor, B. N. (2016) <doi:10.1103/RevModPhys.88.035009>, <doi:10.1063/1.4954402>.
155 Chemometrics and Computational Physics cosmoFns Functions for cosmological distances, times, luminosities, etc. Package encapsulates standard expressions for distances, times, luminosities, and other quantities useful in observational cosmology, including molecular line observations. Currently coded for a flat universe only.
156 Chemometrics and Computational Physics CRAC Cosmology R Analysis Code R functions for cosmological research. The main functions are similar to the Python library ‘cosmolopy’.
157 Chemometrics and Computational Physics dielectric Defines some physical constants and dielectric functions commonly used in optics, plasmonics Physical constants. Gold, silver and glass permittivities, together with spline interpolation functions.
158 Chemometrics and Computational Physics drc Analysis of Dose-Response Curves Analysis of dose-response data is made available through a suite of flexible and versatile model fitting and after-fitting functions.
159 Chemometrics and Computational Physics eChem Simulations for Electrochemistry Experiments Simulates cyclic voltammetry, linear-sweep voltammetry (both with and without stirring of the solution), and single-pulse and double-pulse chronoamperometry and chronocoulometry experiments using the implicit finite difference method outlined in Gosser (1993, ISBN: 9781560810261) and in Brown (2015) <doi:10.1021/acs.jchemed.5b00225>. Additional functions provide ways to display and to examine the results of these simulations. The primary purpose of this package is to provide tools for use in courses in analytical chemistry.
160 Chemometrics and Computational Physics EEM Read and Preprocess Fluorescence Excitation-Emission Matrix (EEM) Data Reads raw EEM data and prepares it for further analysis.
161 Chemometrics and Computational Physics elasticnet Elastic-Net for Sparse Estimation and Sparse PCA Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.
162 Chemometrics and Computational Physics enpls Ensemble Partial Least Squares Regression An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.
163 Chemometrics and Computational Physics errors Uncertainty Propagation for R Vectors Support for measurement errors in R vectors, matrices and arrays: automatic uncertainty propagation and reporting. Documentation about ‘errors’ is provided in the paper by Ucar, Pebesma & Azcorra (2018, <doi:10.32614/RJ-2018-075>), included in this package as a vignette; see ‘citation(“errors”)’ for details.
164 Chemometrics and Computational Physics fastICA FastICA Algorithms to Perform ICA and Projection Pursuit Implementation of FastICA algorithm to perform Independent Component Analysis (ICA) and Projection Pursuit.
165 Chemometrics and Computational Physics fingerprint Functions to Operate on Binary Fingerprint Data Functions to manipulate binary fingerprints of arbitrary length. A fingerprint is represented by an object of S4 class ‘fingerprint’ which is internally represented as a vector of integers, such that each element represents the position in the fingerprint that is set to 1. The bitwise logical functions in R are overridden so that they can be used directly with ‘fingerprint’ objects. A number of distance metrics are also available (many contributed by Michael Fadock). Fingerprints can be converted to Euclidean vectors (i.e., points on the unit hypersphere) and can also be folded using OR. Arbitrary fingerprint formats can be handled via line handlers. Currently handlers are provided for CDK, MOE and BCI fingerprint data.
166 Chemometrics and Computational Physics FITSio FITS (Flexible Image Transport System) Utilities Utilities to read and write files in the FITS (Flexible Image Transport System) format, a standard format in astronomy (see e.g. <https://en.wikipedia.org/wiki/FITS> for more information). Present low-level routines allow: reading, parsing, and modifying FITS headers; reading FITS images (multi-dimensional arrays); reading FITS binary and ASCII tables; and writing FITS images (multi-dimensional arrays). Higher-level functions allow: reading files composed of one or more headers and a single (perhaps multidimensional) image or single table; reading tables into data frames; generating vectors for image array axes; scaling and writing images as 16-bit integers. Known incompletenesses are reading random group extensions, as well as bit, complex, and array descriptor data types in binary tables.
167 Chemometrics and Computational Physics fmri Analysis of fMRI Experiments Contains R-functions to perform an fMRI analysis as described in Tabelow et al. (2006) <doi:10.1016/j.neuroimage.2006.06.029>, Polzehl et al. (2010) <doi:10.1016/j.neuroimage.2010.04.241>, Tabelow and Polzehl (2011) <doi:10.18637/jss.v044.i11>.
168 Chemometrics and Computational Physics fpca Restricted MLE for Functional Principal Components Analysis A geometric approach to MLE for functional principal components.
169 Chemometrics and Computational Physics FTICRMS Programs for Analyzing Fourier Transform-Ion Cyclotron Resonance Mass Spectrometry Data This package was developed partially with funding from the NIH Training Program in Biomolecular Technology (2-T32-GM08799).
170 Chemometrics and Computational Physics homals Gifi Methods for Optimal Scaling Performs a homogeneity analysis (multiple correspondence analysis) and various extensions. Rank restrictions on the category quantifications can be imposed (nonlinear PCA). The categories are transformed by means of optimal scaling with options for nominal, ordinal, and numerical scale levels (for rank-1 restrictions). Variables can be grouped into sets, in order to emulate regression analysis and canonical correlation analysis.
171 Chemometrics and Computational Physics hyperSpec Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, …) Comfortable ways to work with hyperspectral data sets. I.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each of the spectra. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data that is recorded over a discretized variable, e.g. absorbance = f (wavelength), stored as a vector of absorbance values for discrete wavelengths is suitable.
172 Chemometrics and Computational Physics investr Inverse Estimation/Calibration Functions Functions to facilitate inverse estimation (e.g., calibration) in linear, generalized linear, nonlinear, and (linear) mixed-effects models. A generic function is also provided for plotting fitted regression models with or without confidence/prediction bands that may be of use to the general user.
173 Chemometrics and Computational Physics Iso (core) Functions to Perform Isotonic Regression Linear order and unimodal order (univariate) isotonic regression; bivariate isotonic regression with linear order on both variables.
174 Chemometrics and Computational Physics kohonen (core) Supervised and Unsupervised Self-Organising Maps Functions to train self-organising maps (SOMs). Also interrogation of the maps and prediction using trained maps are supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.
175 Chemometrics and Computational Physics leaps Regression Subset Selection Regression subset selection, including exhaustive search.
176 Chemometrics and Computational Physics lira LInear Regression in Astronomy Performs Bayesian linear regression and forecasting in astronomy. The method accounts for heteroscedastic errors in both the independent and the dependent variables, intrinsic scatters (in both variables) and scatter correlation, time evolution of slopes, normalization, scatters, Malmquist and Eddington bias, upper limits and break of linearity. The posterior distribution of the regression parameters is sampled with a Gibbs method exploiting the JAGS library.
177 Chemometrics and Computational Physics lspls LS-PLS Models Implements the LS-PLS (least squares - partial least squares) method described in for instance Jorgensen, K., Segtnan, V. H., Thyholt, K., Nas, T. (2004) “A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables” Journal of Chemometrics, 18(10), 451-464, <doi:10.1002/cem.890>.
178 Chemometrics and Computational Physics MALDIquant Quantitative Analysis of Mass Spectrometry Data A complete analysis pipeline for matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) and other two-dimensional mass spectrometry data. In addition to commonly used plotting and processing methods it includes distinctive features, namely baseline subtraction methods such as morphological filters (TopHat) or the statistics-sensitive non-linear iterative peak-clipping algorithm (SNIP), peak alignment using warping functions, handling of replicated measurements as well as allowing spectra with different resolutions.
179 Chemometrics and Computational Physics MALDIrppa MALDI Mass Spectrometry Data Robust Pre-Processing and Analysis Provides methods for quality control and robust pre-processing and analysis of MALDI mass spectrometry data.
180 Chemometrics and Computational Physics measurements Tools for Units of Measurement Collection of tools to make working with physical measurements easier. Convert between metric and imperial units, or calculate a dimension’s unknown value from other dimensions’ measurements.
181 Chemometrics and Computational Physics metRology Support for Metrological Applications Provides classes and calculation and plotting functions for metrology applications, including measurement uncertainty estimation and inter-laboratory metrology comparison studies.
182 Chemometrics and Computational Physics minpack.lm R Interface to the Levenberg-Marquardt Nonlinear Least-Squares Algorithm Found in MINPACK, Plus Support for Bounds The nls.lm function provides an R interface to lmder and lmdif from the MINPACK library, for solving nonlinear least-squares problems by a modification of the Levenberg-Marquardt algorithm, with support for lower and upper parameter bounds. The implementation can be used via nls-like calls using the nlsLM function.
183 Chemometrics and Computational Physics NISTunits Fundamental Physical Constants and Unit Conversions from NIST Fundamental physical constants (Quantity, Value, Uncertainty, Unit) for SI (International System of Units) and non-SI units, plus unit conversions. Based on the data from NIST (National Institute of Standards and Technology, USA).
184 Chemometrics and Computational Physics nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
185 Chemometrics and Computational Physics nlreg Higher Order Inference for Nonlinear Heteroscedastic Models Likelihood inference based on higher order approximations for nonlinear models with possibly non-constant variance.
186 Chemometrics and Computational Physics nnls (core) The Lawson-Hanson algorithm for non-negative least squares (NNLS) An R interface to the Lawson-Hanson implementation of an algorithm for non-negative least squares (NNLS). Also allows the combination of non-negative and non-positive constraints.
187 Chemometrics and Computational Physics OrgMassSpecR Organic Mass Spectrometry Organic/biological mass spectrometry data analysis.
188 Chemometrics and Computational Physics pcaPP Robust PCA by Projection Pursuit Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) <doi:10.2139/ssrn.968376>, Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>.
189 Chemometrics and Computational Physics PET Simulation and Reconstruction of PET Images Implementation of different analytic/direct and iterative reconstruction methods for Radon-transformed data such as PET data. It also offers the possibility to simulate PET data.
190 Chemometrics and Computational Physics planar Multilayer Optics Solves the electromagnetic problem of reflection and transmission at a planar multilayer interface. Also computed are the decay rates and emission profile for a dipolar emitter.
191 Chemometrics and Computational Physics pls (core) Partial Least Squares and Principal Component Regression Multivariate regression methods: Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
192 Chemometrics and Computational Physics plspm Tools for Partial Least Squares Path Modeling (PLS-PM) Partial Least Squares Path Modeling (PLS-PM) analysis for both metric and non-metric data, as well as REBUS analysis.
193 Chemometrics and Computational Physics ppls Penalized Partial Least Squares Contains linear and nonlinear regression methods based on Partial Least Squares and Penalization Techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing.
194 Chemometrics and Computational Physics prospectr Miscellaneous functions for processing and sample selection of vis-NIR diffuse reflectance data The package provides functions for pretreatment and sample selection of visible and near infrared diffuse reflectance spectra.
195 Chemometrics and Computational Physics psy Various procedures used in psychometry Kappa, ICC, Cronbach alpha, screeplot, mtmm
196 Chemometrics and Computational Physics PTAk (core) Principal Tensor Analysis on k Modes A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package also includes some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.
197 Chemometrics and Computational Physics rcdk Interface to the ‘CDK’ Libraries Allows the user to access functionality in the ‘CDK’, a Java framework for chemoinformatics. This allows the user to load molecules, evaluate fingerprints, calculate molecular descriptors and so on. In addition, the ‘CDK’ API allows the user to view structures in 2D.
198 Chemometrics and Computational Physics rcdklibs The CDK Libraries Packaged for R An R interface to the Chemistry Development Kit, a Java library for chemoinformatics. Given the size of the library itself, this package is not expected to change very frequently. To make use of the CDK within R, it is suggested that you use the ‘rcdk’ package. Note that it is possible to directly interact with the CDK using ‘rJava’. However ‘rcdk’ exposes functionality in a more idiomatic way. The CDK library itself is released as LGPL and the sources can be obtained from <https://github.com/cdk/cdk>.
199 Chemometrics and Computational Physics represent Determine the representativity of two multidimensional data sets Contains workhorse function jrparams(), as well as two helper functions Mboxtest() and JRsMahaldist(), and four example data sets.
200 Chemometrics and Computational Physics resemble Regression and Similarity Evaluation for Memory-Based Learning in Spectral Chemometrics Implementation of functions for spectral similarity/dissimilarity analysis and memory-based learning (MBL) for non-linear modeling in complex spectral datasets. In chemometrics MBL is also known as local modeling.
201 Chemometrics and Computational Physics RobPer Robust Periodogram and Periodicity Detection Methods Calculates periodograms based on (robustly) fitting periodic functions to light curves (irregularly observed time series, possibly with measurement accuracies, occurring in astroparticle physics). Three main functions are included: RobPer() calculates the periodogram. Outlying periodogram bars (indicating a period) can be detected with betaCvMfit(). Artificial light curves can be generated using the function tsgen(). For more details see the corresponding article: Thieler, Fried and Rathjens (2016), Journal of Statistical Software 69(9), 1-36, <doi:10.18637/jss.v069.i09>.
202 Chemometrics and Computational Physics rpubchem An Interface to the PubChem Collection Access PubChem data (compounds, substances, assays) using R. Structural information is provided in the form of SMILES strings. It currently only provides access to a subset of the precalculated data stored by PubChem. Bio-assay data can be accessed to obtain descriptions as well as the actual data. It is also possible to search for assay IDs by keyword.
203 Chemometrics and Computational Physics sapa Spectral Analysis for Physical Applications Software for the book Spectral Analysis for Physical Applications, Donald B. Percival and Andrew T. Walden, Cambridge University Press, 1993.
204 Chemometrics and Computational Physics SCEPtER Stellar CharactEristics Pisa Estimation gRid SCEPtER pipeline for estimating the stellar age, mass, and radius given observational effective temperature, [Fe/H], and asteroseismic parameters. The results are obtained adopting a maximum likelihood technique over a grid of pre-computed stellar models.
205 Chemometrics and Computational Physics SCEPtERbinary Stellar CharactEristics Pisa Estimation gRid for Binary Systems SCEPtER pipeline for estimating the stellar age for double-lined detached binary systems. The observational constraints adopted in the recovery are the effective temperature, the metallicity [Fe/H], the mass, and the radius of the two stars. The results are obtained adopting a maximum likelihood technique over a grid of pre-computed stellar models.
206 Chemometrics and Computational Physics simecol Simulation of Ecological (and Other) Dynamic Systems An object-oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and re-usability of code.
207 Chemometrics and Computational Physics snapshot Gadget N-body cosmological simulation code snapshot I/O utilities Functions for reading and writing Gadget N-body snapshots. The Gadget code is popular in astronomy for running N-body / hydrodynamical cosmological and merger simulations. To find out more about Gadget see the main distribution page at www.mpa-garching.mpg.de/gadget/.
208 Chemometrics and Computational Physics solaR Radiation and Photovoltaic Systems Calculation methods of solar radiation and performance of photovoltaic systems from daily and intradaily irradiation data sources.
209 Chemometrics and Computational Physics som Self-Organizing Map Self-Organizing Map (with application in gene clustering).
210 Chemometrics and Computational Physics SPADAR Spherical Projections of Astronomical Data Provides easy to use functions to create all-sky grid plots of widely used astronomical coordinate systems (equatorial, ecliptic, galactic) and scatter plots of data on any of these systems including on-the-fly system conversion. It supports any type of spherical projection to the plane defined by the ‘mapproj’ package.
211 Chemometrics and Computational Physics speaq Tools for Nuclear Magnetic Resonance (NMR) Spectra Alignment, Peak Based Processing, Quantitative Analysis and Visualizations Makes Nuclear Magnetic Resonance spectroscopy (NMR spectroscopy) data analysis as easy as possible by only requiring a small set of functions to perform an entire analysis. ‘speaq’ offers the possibility of raw spectra alignment and quantitation but also an analysis based on features whereby the spectra are converted to peaks which are then grouped and turned into features. These features can be processed with any number of statistical tools either included in ‘speaq’ or available elsewhere on CRAN. More details can be found in Vu et al. (2011) <doi:10.1186/1471-2105-12-405> and Beirnaert et al. (2018) <doi:10.1371/journal.pcbi.1006018>.
212 Chemometrics and Computational Physics spectralAnalysis Pre-Process, Visualize and Analyse Process Analytical Data, by Spectral Data Measurements Made During a Chemical Process Infrared, near-infrared and Raman spectroscopic data measured during chemical reactions, provide structural fingerprints by which molecules can be identified and quantified. The application of these spectroscopic techniques as inline process analytical tools (PAT), provides the (pharma-)chemical industry with novel tools, allowing to monitor their chemical processes, resulting in a better process understanding through insight in reaction rates, mechanistics, stability, etc. Data can be read into R via the generic spc-format, which is generally supported by spectrometer vendor software. Versatile pre-processing functions are available to perform baseline correction by linking to the ‘baseline’ package; noise reduction via the ‘signal’ package; as well as time alignment, normalization, differentiation, integration and interpolation. Implementation based on the S4 object system allows storing a pre-processing pipeline as part of a spectral data object, and easily transferring it to other datasets. Interactive plotting tools are provided based on the ‘plotly’ package. Non-negative matrix factorization (NMF) has been implemented to perform multivariate analyses on individual spectral datasets or on multiple datasets at once. NMF provides a parts-based representation of the spectral data in terms of spectral signatures of the chemical compounds and their relative proportions. The functionality to read in spc-files was adapted from the ‘hyperSpec’ package.
213 Chemometrics and Computational Physics spls Sparse Partial Least Squares (SPLS) Regression and Classification Provides functions for fitting a sparse partial least squares (SPLS) regression and classification (Chun and Keles (2010) <doi:10.1111/j.1467-9868.2009.00723.x>).
214 Chemometrics and Computational Physics stellaR stellar evolution tracks and isochrones A package to manage and display stellar tracks and isochrones from Pisa low-mass database. Includes tools for isochrones construction and tracks interpolation.
215 Chemometrics and Computational Physics stepPlr L2 Penalized Logistic Regression with Stepwise Variable Selection L2 penalized logistic regression for both continuous and discrete predictors, with forward stagewise/forward stepwise variable selection procedure.
216 Chemometrics and Computational Physics subselect Selecting Variable Subsets A collection of functions which (i) assess the quality of variable subsets as surrogates for a full data set, in either an exploratory data analysis or in the context of a multivariate linear model, and (ii) search for subsets which are optimal under various criteria.
217 Chemometrics and Computational Physics TIMP Fitting Separable Nonlinear Models in Spectroscopy and Microscopy A problem-solving environment (PSE) for fitting separable nonlinear models to measurements arising in physics and chemistry experiments; has been extensively applied to time-resolved spectroscopy and FLIM-FRET data.
218 Chemometrics and Computational Physics titan Titration analysis for mass spectrometry data GUI to analyze mass spectrometric data on the relative abundance of two substances from a titration series.
219 Chemometrics and Computational Physics titrationCurves Acid/Base, Complexation, Redox, and Precipitation Titration Curves A collection of functions to plot acid/base titration curves (pH vs. volume of titrant), complexation titration curves (pMetal vs. volume of EDTA), redox titration curves (potential vs. volume of titrant), and precipitation titration curves (either pAnalyte or pTitrant vs. volume of titrant). Options include the titration of mixtures, the ability to overlay two or more titration curves, and the ability to show equivalence points.
220 Chemometrics and Computational Physics units Measurement Units for R Vectors Support for measurement units in R vectors, matrices and arrays: automatic propagation, conversion, derivation and simplification of units; raising errors in case of unit incompatibility. Compatible with the POSIXct, Date and difftime classes. Uses the UNIDATA udunits library and unit database for unit compatibility checking and conversion. Documentation about ‘units’ is provided in the paper by Pebesma, Mailund & Hiebert (2016, <doi:10.32614/RJ-2016-061>), included in this package as a vignette; see ‘citation(“units”)’ for details.
221 Chemometrics and Computational Physics UPMASK Unsupervised Photometric Membership Assignment in Stellar Clusters An implementation of the UPMASK method for performing membership assignment in stellar clusters in R. It is prepared to use photometry and spatial positions, but it can take into account other types of data. The method is able to take into account arbitrary error models, and it is unsupervised, data-driven, physical-model-free and relies on as few assumptions as possible. The approach followed for membership assessment is based on an iterative process, dimensionality reduction, a clustering algorithm and a kernel density estimation.
222 Chemometrics and Computational Physics varSelRF Variable Selection using Random Forests Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly-correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).
223 Chemometrics and Computational Physics webchem Chemical Information from the Web Chemical information from around the web. This package interacts with a suite of web APIs for chemical information.
224 Chemometrics and Computational Physics WilcoxCV Wilcoxon-based variable selection in cross-validation This package provides functions to perform fast variable selection based on the Wilcoxon rank sum test in the cross-validation or Monte-Carlo cross-validation settings, for use in microarray-based binary classification.
225 Clinical Trial Design, Monitoring, and Analysis adaptTest (core) Adaptive two-stage tests The functions defined in this program serve for implementing adaptive two-stage tests. Currently, four tests are included: Bauer and Koehne (1994), Lehmacher and Wassmer (1999), Vandemeulebroecke (2006), and the horizontal conditional error function. User-defined tests can also be implemented. Reference: Vandemeulebroecke, An investigation of two-stage tests, Statistica Sinica 2006.
226 Clinical Trial Design, Monitoring, and Analysis AGSDest Estimation in Adaptive Group Sequential Trials Calculation of repeated confidence intervals as well as confidence intervals based on the stage-wise ordering in group sequential designs and adaptive group sequential designs. For adaptive group sequential designs the confidence intervals are based on the conditional rejection probability principle. Currently the procedures do not support the use of futility boundaries or more than one adaptive interim analysis.
227 Clinical Trial Design, Monitoring, and Analysis asd (core) Simulations for Adaptive Seamless Designs Package runs simulations for adaptive seamless designs with and without early outcomes for treatment selection and subpopulation type designs.
228 Clinical Trial Design, Monitoring, and Analysis asypow Calculate Power Utilizing Asymptotic Likelihood Ratio Methods A set of routines written in the S language that calculate power and related quantities utilizing asymptotic likelihood ratio methods.
229 Clinical Trial Design, Monitoring, and Analysis bcrm (core) Bayesian Continual Reassessment Method for Phase I Dose-Escalation Trials Implements a wide variety of one- and two-parameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics. See Sweeting et al. (2013): <doi:10.18637/jss.v054.i13>.
230 Clinical Trial Design, Monitoring, and Analysis binomSamSize Confidence Intervals and Sample Size Determination for a Binomial Proportion under Simple Random Sampling and Pooled Sampling A suite of functions to compute confidence intervals and necessary sample sizes for the parameter p of the Bernoulli B(p) distribution under simple random sampling or under pooled sampling. Such computations are e.g. of interest when investigating the incidence or prevalence in populations. The package contains functions to compute coverage probabilities and coverage coefficients of the provided confidence intervals procedures. Sample size calculations are based on expected length.
231 Clinical Trial Design, Monitoring, and Analysis blockrand (core) Randomization for block random clinical trials Create randomizations for block random clinical trials. Can also produce a pdf file of randomization cards.
232 Clinical Trial Design, Monitoring, and Analysis clinfun (core) Clinical Trial Design and Data Analysis Functions Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.
233 Clinical Trial Design, Monitoring, and Analysis clinsig Clinical Significance Functions Functions for calculating clinical significance.
234 Clinical Trial Design, Monitoring, and Analysis clusterPower Power Calculations for Cluster-Randomized and Cluster-Randomized Crossover Trials Calculate power for cluster randomized trials (CRTs) that compare two means, two proportions, or two counts using closed-form solutions. In addition, calculate power for cluster randomized crossover trials using Monte Carlo methods. For more information, see Reich et al. (2012) <doi:10.1371/journal.pone.0035564>.
235 Clinical Trial Design, Monitoring, and Analysis coin Conditional Inference Procedures in a Permutation Test Framework Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems.
236 Clinical Trial Design, Monitoring, and Analysis conf.design Construction of factorial designs This small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.
237 Clinical Trial Design, Monitoring, and Analysis CRM Continual Reassessment Method (CRM) for Phase I Clinical Trials Functions for phase I clinical trials using the continual reassessment method.
238 Clinical Trial Design, Monitoring, and Analysis CRTSize (core) Sample Size Estimation Functions for Cluster Randomized Trials Sample size estimation in cluster (group) randomized trials. Contains traditional power-based methods, empirical smoothing (Rotondi and Donner, 2009), and updated meta-analysis techniques (Rotondi and Donner, 2012).
239 Clinical Trial Design, Monitoring, and Analysis dfcrm (core) Dose-Finding by the Continual Reassessment Method Provides functions to run the CRM and TITE-CRM in phase I trials and calibration tools for trial planning purposes.
240 Clinical Trial Design, Monitoring, and Analysis dfped Extrapolation and Bridging of Adult Information in Early Phase Dose-Finding Paediatrics Studies A unified method for designing and analysing dose-finding trials in paediatrics, while bridging information from adults, is proposed in the ‘dfped’ package. The dose range can be calculated under three extrapolation methods: linear, allometry and maturation adjustment, using pharmacokinetic (PK) data. To do this, it is assumed that target exposures are the same in both populations. The working model and prior distribution parameters of the dose-toxicity and dose-efficacy relationships can be obtained using early phase adult toxicity and efficacy data at several dose levels through ‘dfped’ package. Priors are used into the dose finding process through a Bayesian model selection or adaptive priors, to facilitate adjusting the amount of prior information to differences between adults and children. This calibrates the model to adjust for misspecification if the adult and paediatric data are very different. User can use his/her own Bayesian model written in Stan code through the ‘dfped’ package. A template of this model is proposed in the examples of the corresponding R functions in the package. Finally, in this package you can find a simulation function for one trial or for more than one trial. These methods are proposed by Petit et al, (2016) <doi:10.1177/0962280216671348>.
241 Clinical Trial Design, Monitoring, and Analysis dfpk Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials Statistical methods involving PK measures are provided, in the dose allocation process during Phase I clinical trials. These methods, proposed by Ursino et al, (2017) <doi:10.1002/bimj.201600084>, enter pharmacokinetics (PK) in the dose finding designs in different ways, including covariates models, dependent variable or hierarchical models. This package provides functions to generate data from several scenarios and functions to run simulations whose objective is to determine the maximum tolerated dose (MTD).
242 Clinical Trial Design, Monitoring, and Analysis DoseFinding Planning and Analyzing Dose Finding Experiments The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting non-linear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs and an implementation of the MCPMod methodology.
243 Clinical Trial Design, Monitoring, and Analysis epibasix Elementary Epidemiological Functions for Epidemiology and Biostatistics Contains elementary tools for analysis of common epidemiological problems, ranging from sample size estimation, through 2x2 contingency table analysis and basic measures of agreement (kappa, sensitivity/specificity). Appropriate print and summary statements are also written to facilitate interpretation wherever possible. Source code is commented throughout to facilitate modification. The target audience includes advanced undergraduate and graduate students in epidemiology or biostatistics courses, and clinical researchers.
244 Clinical Trial Design, Monitoring, and Analysis ewoc Escalation with Overdose Control An implementation of a variety of escalation with overdose control designs introduced by Babb, Rogatko and Zacks (1998) <doi:10.1002/(SICI)1097-0258(19980530)17:10%3C1103::AID-SIM793%3E3.0.CO;2-9>. It calculates the next dose as a clinical trial proceeds as well as performs simulations to obtain operating characteristics.
245 Clinical Trial Design, Monitoring, and Analysis experiment (core) R Package for Designing and Analyzing Randomized Experiments Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.
246 Clinical Trial Design, Monitoring, and Analysis FrF2 Fractional Factorial Designs with 2-Level Factors Regular and non-regular Fractional Factorial 2-level designs can be created. Furthermore, analysis tools for Fractional Factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
247 Clinical Trial Design, Monitoring, and Analysis GroupSeq (core) A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach by Lan-DeMets with various alpha spending functions being available to choose among.
248 Clinical Trial Design, Monitoring, and Analysis gsbDesign Group Sequential Bayes Design Group Sequential Operating Characteristics for Clinical, Bayesian two-arm Trials with known Sigma and Normal Endpoints.
249 Clinical Trial Design, Monitoring, and Analysis gsDesign (core) Group Sequential Design Derives group sequential designs and describes their properties.
250 Clinical Trial Design, Monitoring, and Analysis HH Statistical Analysis and Data Display: Heiberger and Holland Support software for Statistical Analysis and Data Display (Second Edition, Springer, ISBN 978-1-4939-2121-8, 2015) and (First Edition, Springer, ISBN 0-387-40270-5, 2004) by Richard M. Heiberger and Burt Holland. This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The second edition includes redesigned graphics and additional chapters. The authors emphasize how to construct and interpret graphs, discuss principles of graphical design, and show how accompanying traditional tabular results are used to confirm the visual impressions derived directly from the graphs. Many of the graphical formats are novel and appear here for the first time in print. All chapters have exercises. All functions introduced in the book are in the package. R code for all examples, both graphs and tables, in the book is included in the scripts directory of the package.
251 Clinical Trial Design, Monitoring, and Analysis Hmisc (core) Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
252 Clinical Trial Design, Monitoring, and Analysis InformativeCensoring Multiple Imputation for Informative Censoring Multiple Imputation for Informative Censoring. This package implements two methods. Gamma Imputation from Jackson et al. (2014) <doi:10.1002/sim.6274> and Risk Score Imputation from Hsu et al. (2009) <doi:10.1002/sim.3480>.
253 Clinical Trial Design, Monitoring, and Analysis ldbounds (core) Lan-DeMets Method for Group Sequential Boundaries Computations related to group sequential boundaries. Includes calculation of bounds using the Lan-DeMets alpha spending function approach.
254 Clinical Trial Design, Monitoring, and Analysis MCPMod Design and Analysis of Dose-Finding Studies Implements a methodology for the design and analysis of dose-response studies that combines aspects of multiple comparison procedures and modeling approaches (Bretz, Pinheiro and Branson, 2005, Biometrics 61, 738-748, <doi:10.1111/j.1541-0420.2005.00344.x>). The package provides tools for the analysis of dose finding trials as well as a variety of tools necessary to plan a trial to be conducted with the MCP-Mod methodology. Please note: The ‘MCPMod’ package will not be further developed, all future development of the MCP-Mod methodology will be done in the ‘DoseFinding’ R-package.
255 Clinical Trial Design, Monitoring, and Analysis Mediana Clinical Trial Simulations Provides a general framework for clinical trial simulations based on the Clinical Scenario Evaluation (CSE) approach. The package supports a broad class of data models (including clinical trials with continuous, binary, survival-type and count-type endpoints as well as multivariate outcomes that are based on combinations of different endpoints), analysis strategies and commonly used evaluation criteria.
256 Clinical Trial Design, Monitoring, and Analysis meta General Package for Meta-Analysis User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rucker <doi:10.1007/978-3-319-21416-0>, “Meta-Analysis with R” (2015): - fixed effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L’Abbe, Baujat, bubble); - statistical tests and trim-and-fill method to evaluate bias in meta-analysis; - import data from ‘RevMan 5’; - prediction interval, Hartung-Knapp and Paule-Mandel method for random effects model; - cumulative meta-analysis and leave-one-out meta-analysis; - meta-regression; - generalised linear mixed models; - produce forest plot summarising several (subgroup) meta-analyses.
257 Clinical Trial Design, Monitoring, and Analysis metafor Meta-Analysis Package for R A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L’Abbe, Baujat, GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto’s method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted.
258 Clinical Trial Design, Monitoring, and Analysis metaLik Likelihood Inference in Meta-Analysis and Meta-Regression Models First- and higher-order likelihood inference in meta-analysis and meta-regression models.
259 Clinical Trial Design, Monitoring, and Analysis metasens Advanced Statistical Methods to Model and Adjust for Bias in Meta-Analysis The following methods are implemented to evaluate how sensitive the results of a meta-analysis are to potential bias in meta-analysis and to support Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 5 ‘Small-Study Effects in Meta-Analysis’: - Copas selection model described in Copas & Shi (2001) <doi:10.1177/096228020101000402>; - limit meta-analysis by Rucker et al. (2011) <doi:10.1093/biostatistics/kxq046>; - upper bound for outcome reporting bias by Copas & Jackson (2004) <doi:10.1111/j.0006-341X.2004.00161.x>; - imputation methods for missing binary data by Gamble & Hollis (2005) <doi:10.1016/j.jclinepi.2004.09.013> and Higgins et al. (2008) <doi:10.1177/1740774508091600>.
260 Clinical Trial Design, Monitoring, and Analysis multcomp Simultaneous Inference in General Parametric Models Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models. The package includes demos reproducing analyzes presented in the book “Multiple Comparisons Using R” (Bretz, Hothorn, Westfall, 2010, CRC Press).
261 Clinical Trial Design, Monitoring, and Analysis nppbib Nonparametric Partially-Balanced Incomplete Block Design Analysis Implements a nonparametric statistical test for rank or score data from partially-balanced incomplete block-design experiments.
262 Clinical Trial Design, Monitoring, and Analysis PIPS (core) Predicted Interval Plots Generate Predicted Interval Plots. Simulate and plot confidence intervals of an effect estimate given observed data and a hypothesis about the distribution of future data.
263 Clinical Trial Design, Monitoring, and Analysis PowerTOST (core) Power and Sample Size for (Bio)Equivalence Studies Contains functions to calculate power and sample size for various study designs used in bioequivalence studies. Use known.designs() to see the designs supported. Power and sample size can be obtained based on different methods, among them prominently the TOST procedure (two one-sided t-tests). See README and NEWS for further information.
264 Clinical Trial Design, Monitoring, and Analysis pwr (core) Basic Functions for Power Analysis Power analysis functions along the lines of Cohen (1988).
265 Clinical Trial Design, Monitoring, and Analysis PwrGSD (core) Power in a Group Sequential Design Tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries, for deriving power of a sequential design at a specified alternative, template for evaluating the performance of candidate plans at a set of time varying alternatives. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.
266 Clinical Trial Design, Monitoring, and Analysis qtlDesign (core) Design of QTL experiments Tools for the design of QTL experiments
267 Clinical Trial Design, Monitoring, and Analysis rmeta Meta-Analysis Functions for simple fixed and random effects meta-analysis for two-sample comparisons and cumulative meta-analyses. Draws standard summary plots, funnel plots, and computes summaries and tests for association and heterogeneity.
268 Clinical Trial Design, Monitoring, and Analysis samplesize Sample Size Calculation for Various t-Tests and Wilcoxon-Test Computes sample size for Student’s t-test and for the Wilcoxon-Mann-Whitney test for categorical data. The t-test function allows paired and unpaired (balanced / unbalanced) designs as well as homogeneous and heterogeneous variances. The Wilcoxon function allows for ties.
269 Clinical Trial Design, Monitoring, and Analysis speff2trial (core) Semiparametric efficient estimation for a two-sample treatment effect The package performs estimation and testing of the treatment effect in a 2-group randomized clinical trial with a quantitative, dichotomous, or right-censored time-to-event endpoint. The method improves efficiency by leveraging baseline predictors of the endpoint. The inverse probability weighting technique of Robins, Rotnitzky, and Zhao (JASA, 1994) is used to provide unbiased estimation when the endpoint is missing at random.
270 Clinical Trial Design, Monitoring, and Analysis ssanv Sample Size Adjusted for Nonadherence or Variability of Input Parameters A set of functions to calculate sample size for two-sample difference in means tests. Does adjustments for either nonadherence or variability that comes from using data to estimate parameters.
271 Clinical Trial Design, Monitoring, and Analysis survival (core) Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
272 Clinical Trial Design, Monitoring, and Analysis TEQR (core) Target Equivalence Range Design The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity based cumulative cohort design with added safety rules. The ACT (Activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of this design is that dose is escalated based on lack of activity rather than on lack of toxicity and is de-escalated only if an unacceptable level of toxicity is experienced.
273 Clinical Trial Design, Monitoring, and Analysis ThreeArmedTrials Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active, and a placebo control. Method for the following distributions are implemented: Poisson (Mielke and Munk (2009) <arXiv:0912.4169>), negative binomial (Muetze et al. (2016) <doi:10.1002/sim.6738>), normal (Pigeot et al. (2003) <doi:10.1002/sim.1450>; Hasler et al. (2009) <doi:10.1002/sim.3052>), binary (Friede and Kieser (2007) <doi:10.1002/sim.2543>), nonparametric (Muetze et al. (2017) <doi:10.1002/sim.7176>), exponential (Mielke and Munk (2009) <arXiv:0912.4169>).
274 Clinical Trial Design, Monitoring, and Analysis ThreeGroups ML Estimator for Baseline-Placebo-Treatment (Three-Group) Experiments Implements the Maximum Likelihood estimator for baseline, placebo, and treatment groups (three-group) experiments with non-compliance proposed by Gerber, Green, Kaplan, and Kern (2010).
275 Clinical Trial Design, Monitoring, and Analysis TrialSize (core) R functions in Chapter 3,4,6,7,9,10,11,12,14,15 functions and examples in Sample Size Calculation in Clinical Research.
276 Cluster Analysis & Finite Mixture Models AdMit Adaptive Mixture of Student-t Distributions Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.
277 Cluster Analysis & Finite Mixture Models ADPclust Fast Clustering Using Adaptive Density Peak Detection An implementation of ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work is built and improved upon the idea of Rodriguez and Laio (2014)<doi:10.1126/science.1242072>. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroids selection and parameter optimization algorithm, which finds the number of clusters and cluster centroids by comparing average silhouettes on a grid of testing clustering results; It also includes a user interactive algorithm that allows the user to manually selects cluster centroids from a two dimensional “density-distance plot”. Here is the research article associated with this package: “Wang, Xiao-Feng, and Yifan Xu (2015)<doi:10.1177/0962280215609948> Fast clustering using adaptive density peak detection.” Statistical methods in medical research. url: http://smm.sagepub.com/content/early/2015/10/15/0962280215609948.abstract.
278 Cluster Analysis & Finite Mixture Models amap Another Multidimensional Analysis Package Tools for Clustering and Principal Component Analysis (With robust methods, and parallelized functions).
279 Cluster Analysis & Finite Mixture Models apcluster Affinity Propagation Clustering Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) <doi:10.1126/science.1136800>. The algorithms are largely analogous to the ‘Matlab’ code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results.
280 Cluster Analysis & Finite Mixture Models BayesLCA Bayesian Latent Class Analysis Bayesian Latent Class Analysis using several different methods.
281 Cluster Analysis & Finite Mixture Models bayesm Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009) For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
282 Cluster Analysis & Finite Mixture Models bayesmix Bayesian Mixture Models with JAGS The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.
283 Cluster Analysis & Finite Mixture Models bclust Bayesian Hierarchical Clustering Using Spike and Slab Models Builds a dendrogram using the log posterior as a natural distance defined by the model, while weighting the clustering variables. It is also capable of computing equivalent Bayesian discrimination probabilities. The adopted method suits the small-sample, large-dimension setting. Model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.
284 Cluster Analysis & Finite Mixture Models bgmm Gaussian Mixture Modeling Algorithms and the Belief-Based Mixture Modeling Two partially supervised mixture modeling methods, soft-label and belief-based modeling, are implemented. For completeness, the package is also equipped with the functionality of unsupervised, semi-supervised and fully supervised mixture modeling. The package can also be used to select the best-fitting model from a set of models with different component numbers or constraints on their structures. For a detailed introduction see: Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software <doi:10.18637/jss.v047.i03>.
285 Cluster Analysis & Finite Mixture Models biclust BiCluster Algorithms The main function biclust() provides several algorithms to find biclusters in two-dimensional data: Cheng and Church (2000, ISBN:1-57735-115-0), spectral (2003) <doi:10.1101/gr.648603>, plaid model (2005) <doi:10.1016/j.csda.2004.02.003>, xmotifs (2003) <doi:10.1142/9789812776303_0008> and bimax (2006) <doi:10.1093/bioinformatics/btl060>. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.
286 Cluster Analysis & Finite Mixture Models Bmix Bayesian Sampling for Stick-Breaking Mixtures This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.
287 Cluster Analysis & Finite Mixture Models bmixture Bayesian Estimation for Finite Mixture of Distributions Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements from the Bayesian literature on finite mixtures of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.
288 Cluster Analysis & Finite Mixture Models cba Clustering for Business Analytics Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.
289 Cluster Analysis & Finite Mixture Models cclust Convex Clustering Methods and Clustering Indexes Convex Clustering methods, including K-means algorithm, On-line Update algorithm (Hard Competitive Learning) and Neural Gas algorithm (Soft Competitive Learning), and calculation of several indexes for finding the number of clusters in a data set.
290 Cluster Analysis & Finite Mixture Models CEC Cross-Entropy Clustering CEC divides data into Gaussian-type clusters. The implementation allows the simultaneous use of various types of Gaussian mixture models, performs the reduction of unnecessary clusters, and is able to discover new groups. Based on Spurek, P. and Tabor, J. (2014) <doi:10.1016/j.patcog.2014.03.006>.
291 Cluster Analysis & Finite Mixture Models CHsharp Choi and Hall Style Data Sharpening Functions for use in perturbing data prior to use of nonparametric smoothers and clustering.
292 Cluster Analysis & Finite Mixture Models clue Cluster Ensembles CLUster Ensembles.
293 Cluster Analysis & Finite Mixture Models cluster (core) “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al. Methods for cluster analysis, much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
294 Cluster Analysis & Finite Mixture Models clusterCrit Clustering Indices Compute clustering validation indices.
295 Cluster Analysis & Finite Mixture Models clusterfly Explore clustering interactively using R and GGobi Visualise clustering algorithms with GGobi. Contains both general code for visualising clustering results and specific visualisations for model-based, hierarchical and SOM clustering.
296 Cluster Analysis & Finite Mixture Models clusterGeneration Random Cluster Generation (with Specified Degree of Separation) We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population version) for pairs of clusters or cluster distributions, and 1-D and 2-D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, number of noisy variables.
297 Cluster Analysis & Finite Mixture Models ClusterR Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of ‘RcppArmadillo’ to speed up the computationally intensive parts of the functions. For more information, see (i) “Clustering in an Object-Oriented Environment” by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) “Web-scale k-means clustering” by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) “Armadillo: a template-based C++ library for linear algebra” by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) “Clustering by Passing Messages Between Data Points” by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.
298 Cluster Analysis & Finite Mixture Models clusterRepro Reproducibility of Gene Expression Clusters This is a function for validating microarray clusters via reproducibility, based on the paper referenced below.
299 Cluster Analysis & Finite Mixture Models clusterSim Searching for Optimal Clustering Procedure for a Data Set Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007%2FBF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).
300 Cluster Analysis & Finite Mixture Models clustMixType k-Prototypes Clustering for Mixed Variable-Type Data Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z.Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304, <doi:10.1023/A:1009769707641>.
301 Cluster Analysis & Finite Mixture Models clustvarsel Variable Selection for Gaussian Model-Based Clustering Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology allows one to find the (locally) optimal subset of variables in a data set that carries group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without sub-sampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.
302 Cluster Analysis & Finite Mixture Models clv Cluster Validation Techniques Contains most of the popular internal and external cluster validation methods, ready to use with most of the outputs produced by functions from the “cluster” package. Also contains functions and usage examples for a cluster-stability approach that can be applied to algorithms implemented in the “cluster” package as well as to user-defined clustering algorithms.
303 Cluster Analysis & Finite Mixture Models clValid Validation of Clustering Results Statistical and biological validation of clustering results.
304 Cluster Analysis & Finite Mixture Models CoClust Copula Based Cluster Analysis A copula based clustering algorithm that finds clusters according to the complex multivariate dependence structure of the data generating process. The updated version of the algorithm is described in Di Lascio, F.M.L. and Giannerini, S. (2016). “Clustering dependent observations with copula functions”. Statistical Papers, p.1-17. <doi:10.1007/s00362-016-0822-3>.
305 Cluster Analysis & Finite Mixture Models compHclust Complementary Hierarchical Clustering Performs the complementary hierarchical clustering procedure and returns X’ (the expected residual matrix) and a vector of the relative gene importances.
306 Cluster Analysis & Finite Mixture Models dbscan Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS (ordering points to identify the clustering structure) clustering algorithms, HDBSCAN (hierarchical DBSCAN), and the LOF (local outlier factor) algorithm. The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.
307 Cluster Analysis & Finite Mixture Models dendextend Extending ‘dendrogram’ Functionality in R Offers a set of functions for extending ‘dendrogram’ objects in R, letting you visualize and compare trees of ‘hierarchical clusterings’. You can (1) Adjust a tree’s graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different ‘dendrograms’ to one another.
308 Cluster Analysis & Finite Mixture Models depmix Dependent Mixture Models Fits (multigroup) mixtures of latent or hidden Markov models on mixed categorical and continuous (timeseries) data. The Rdonlp2 package can optionally be used for optimization of the log-likelihood and is available from R-forge.
309 Cluster Analysis & Finite Mixture Models depmixS4 Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4 Fits latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models, see Visser & Speekenbrink (2010, <doi:10.18637/jss.v036.i07>).
310 Cluster Analysis & Finite Mixture Models dpmixsim Dirichlet Process Mixture Model Simulation for Clustering and Image Segmentation The ‘dpmixsim’ package implements a Dirichlet Process Mixture (DPM) model for clustering and image segmentation. The DPM model is a Bayesian nonparametric methodology that relies on MCMC simulations for exploring mixture models with an unknown number of components. The code implements conjugate models with normal structure (conjugate normal-normal DP mixture model). The package’s applications are oriented towards the classification of magnetic resonance images according to tissue type or region of interest.
311 Cluster Analysis & Finite Mixture Models dynamicTreeCut Methods for Detection of Clusters in Hierarchical Clustering Dendrograms Contains methods for detection of clusters in hierarchical clustering dendrograms.
312 Cluster Analysis & Finite Mixture Models e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
313 Cluster Analysis & Finite Mixture Models edci Edge Detection and Clustering in Images Detection of edge points in images based on the difference of two asymmetric M-kernel estimators. Linear and circular regression clustering based on redescending M-estimators. Detection of linear edges in images.
314 Cluster Analysis & Finite Mixture Models EMCluster EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution EM algorithms and several efficient initialization methods for model-based clustering of finite-mixture Gaussian distributions with unstructured dispersion, in both unsupervised and semi-supervised learning.
315 Cluster Analysis & Finite Mixture Models evclust Evidential Clustering Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means (ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus.
316 Cluster Analysis & Finite Mixture Models FactoClass Combination of Factorial Methods and Cluster Analysis Some functions of ‘ade4’ and ‘stats’ are combined in order to obtain a partition of the rows of a data table, with columns representing variables of scales: quantitative, qualitative or frequency. First, a principal axes method is performed and then, a combination of Ward agglomerative hierarchical classification and K-means is performed, using some of the first coordinates obtained from the previous principal axes method. See, for example: Lebart, L. and Piron, M. and Morineau, A. (2006). Statistique Exploratoire Multidimensionnelle, Dunod, Paris. To permit different weights for the elements to be clustered, the function ‘kmeansW’, programmed in C++, is included; it is a modification of ‘kmeans’. Some graphical functions include the option ‘gg=FALSE’. When ‘gg=TRUE’, they use the ‘ggplot2’ and ‘ggrepel’ packages to avoid the superposition of labels.
317 Cluster Analysis & Finite Mixture Models fastcluster Fast Hierarchical Clustering Routines for R and ‘Python’ This is a two-in-one package which provides interfaces to both R and ‘Python’. It implements fast hierarchical, agglomerative clustering routines. Part of the functionality is designed as drop-in replacement for existing routines: linkage() in the ‘SciPy’ package ‘scipy.cluster.hierarchy’, hclust() in R’s ‘stats’ package, and the ‘flashClust’ package. It provides the same functionality with the benefit of a much faster implementation. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide. For information on how to install the ‘Python’ files, see the file INSTALL in the source distribution. Based on the present package, Christoph Dalitz also wrote a pure ‘C++’ interface to ‘fastcluster’: <http://informatik.hsnr.de/~dalitz/data/hclust>.
318 Cluster Analysis & Finite Mixture Models fclust Fuzzy Clustering Algorithms for fuzzy clustering, cluster validity indices and plots for cluster validity and visualizing fuzzy clustering results.
319 Cluster Analysis & Finite Mixture Models flashClust Implementation of Optimal Hierarchical Clustering Fast implementation of hierarchical clustering.
320 Cluster Analysis & Finite Mixture Models flexclust (core) Flexible Cluster Algorithms The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, …), and bootstrap methods for the analysis of cluster stability.
321 Cluster Analysis & Finite Mixture Models flexmix (core) Flexible Mixture Modeling A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
322 Cluster Analysis & Finite Mixture Models fpc Flexible Procedures for Clustering Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther’s prediction strength, Fang and Wang’s bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
323 Cluster Analysis & Finite Mixture Models FunCluster Functional Profiling of Microarray Expression Data FunCluster performs a functional analysis of microarray expression data based on Gene Ontology & KEGG functional annotations. From expression data and functional annotations FunCluster builds classes of putatively co-regulated biological processes through a specially designed clustering procedure.
324 Cluster Analysis & Finite Mixture Models funFEM Clustering in the Discriminative Functional Subspace The funFEM algorithm (Bouveyron et al., 2014) allows one to cluster functional data by modeling the curves within a common and discriminative functional subspace.
325 Cluster Analysis & Finite Mixture Models funHDDC Univariate and Multivariate Model-Based Clustering in Group-Specific Functional Subspaces The funHDDC algorithm allows one to cluster functional univariate (Bouveyron and Jacques, 2011, <doi:10.1007/s11634-011-0095-6>) or multivariate data (Schmutz et al., 2018) by modeling each group within a specific functional subspace.
326 Cluster Analysis & Finite Mixture Models gamlss.mx Fitting Mixture Distributions with GAMLSS The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.
327 Cluster Analysis & Finite Mixture Models genie A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed, see (Gagolewski et al. 2016a <doi:10.1016/j.ins.2016.05.003>, 2016b <doi:10.1007/978-3-319-45656-0_16>) for more details.
328 Cluster Analysis & Finite Mixture Models GLDEX Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L-moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and by comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.
329 Cluster Analysis & Finite Mixture Models GMCM Fast Estimation of Gaussian Mixture Copula Models Unsupervised Clustering and Meta-analysis using Gaussian Mixture Copula Models.
330 Cluster Analysis & Finite Mixture Models GSM Gamma Shape Mixture Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient and only requires specifying a prior distribution for a single parameter.
331 Cluster Analysis & Finite Mixture Models HDclassif High Dimensional Supervised Classification and Clustering Discriminant analysis and data clustering methods for high-dimensional data, based on the assumption that high-dimensional data live in different subspaces of low dimensionality. Proposes a new parametrization of the Gaussian mixture model which combines the ideas of dimension reduction and constraints on the model.
332 Cluster Analysis & Finite Mixture Models hybridHclust Hybrid Hierarchical Clustering Hybrid hierarchical clustering via mutual clusters. A mutual cluster is a set of points closer to each other than to all other points. Mutual clusters are used to enrich top-down hierarchical clustering.
333 Cluster Analysis & Finite Mixture Models idendr0 Interactive Dendrograms Interactive dendrogram that enables the user to select and color clusters, to zoom and pan the dendrogram, and to visualize the clustered data not only in a built-in heat map, but also in ‘GGobi’ interactive plots and user-supplied plots. This is a backport of Qt-based ‘idendro’ (<https://github.com/tsieger/idendro>) to base R graphics and Tcl/Tk GUI.
334 Cluster Analysis & Finite Mixture Models IMIFA Infinite Mixtures of Infinite Factor Analysers and Related Models Provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametrically clustering high-dimensional data, introduced by Murphy et al. (2018) <arXiv:1701.07010v4>. The IMIFA model conducts Bayesian nonparametric model-based clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors, mostly via efficient Gibbs updates. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.
335 Cluster Analysis & Finite Mixture Models isopam Isopam (Clustering) Isopam clustering algorithm and utilities. Isopam optimizes clusters and optionally cluster numbers in a brute force style and aims at an optimum separation by all or some descriptors (typically species).
336 Cluster Analysis & Finite Mixture Models kernlab Kernel-Based Machine Learning Lab Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
337 Cluster Analysis & Finite Mixture Models kml K-Means for Longitudinal Data An implementation of k-means specifically designed to cluster longitudinal data. It provides facilities to deal with missing values, computes several quality criteria (Calinski and Harabasz, Ray and Turi, Davies and Bouldin, BIC, …) and offers a graphical interface for choosing the ‘best’ number of clusters.
338 Cluster Analysis & Finite Mixture Models latentnet Latent Position and Cluster Models for Statistical Networks Fit and simulate latent position and cluster models for statistical networks.
339 Cluster Analysis & Finite Mixture Models LCAvarsel Variable Selection for Latent Class Analysis Variable selection for latent class analysis for model-based clustering of multivariate categorical data. The package implements a general framework for selecting the subset of variables with relevant clustering information and discarding those that are redundant and/or not informative. The variable selection method is based on the approach of Fop et al. (2017) <doi:10.1214/17-AOAS1061> and Dean and Raftery (2010) <doi:10.1007/s10463-009-0258-9>. Different algorithms are available to perform the selection: stepwise, swap-stepwise and evolutionary stochastic search. Concomitant covariates used to predict the class membership probabilities can also be included in the latent class analysis model. The selection procedure can be run in parallel on multi-core machines.
340 Cluster Analysis & Finite Mixture Models lcmm Extended Mixed Models Using Latent Classes and Latent Processes Estimation of various extensions of mixed models, including latent class mixed models, joint latent class mixed models, and mixed models for curvilinear univariate or multivariate longitudinal outcomes, using a maximum likelihood estimation method.
341 Cluster Analysis & Finite Mixture Models mcclust Process an MCMC Sample of Clusterings Implements methods for processing a sample of (hard) clusterings, e.g. the MCMC output of a Bayesian clustering model. Among them are methods that find a single best clustering to represent the sample, which are based on the posterior similarity matrix or a relabelling algorithm.
342 Cluster Analysis & Finite Mixture Models mclust (core) Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
343 Cluster Analysis & Finite Mixture Models MetabolAnalyze Probabilistic Latent Variable Models for Metabolomic Data Fits probabilistic principal components analysis, probabilistic principal components and covariates analysis and mixtures of probabilistic principal components models to metabolomic spectral data.
344 Cluster Analysis & Finite Mixture Models mixAK Multivariate Normal Mixture Models and Mixtures of Generalized Linear Mixed Models Including Model Based Clustering Contains a mixture of statistical methods including the MCMC methods to analyze normal mixtures. Additionally, model based clustering methods are implemented to perform classification based on (multivariate) longitudinal (or otherwise correlated) data. The basis for such clustering is a mixture of multivariate generalized linear mixed models.
345 Cluster Analysis & Finite Mixture Models MixAll Clustering and Classification using Model-Based Mixture Models Algorithms and methods for model-based clustering and classification. It supports various types of data: continuous, categorical and count data, and can handle mixed data of these types. It can fit Gaussian (with diagonal covariance structure), gamma, categorical and Poisson models. The algorithms also support missing values. This package can be used as an independent alternative to the (not free) ‘mixtcomp’ software available at <https://massiccc.lille.inria.fr/>.
346 Cluster Analysis & Finite Mixture Models mixdist Finite Mixture Distribution Models Fit finite mixture distribution models to grouped data and conditional data by maximum likelihood using a combination of a Newton-type algorithm and the EM algorithm.
347 Cluster Analysis & Finite Mixture Models mixPHM Mixtures of Proportional Hazard Models Fits multiple variable mixtures of various parametric proportional hazard models using the EM-Algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing values and censored values are allowed. Independence is assumed over the single variables.
348 Cluster Analysis & Finite Mixture Models mixRasch Mixture Rasch Models with JMLE Estimates Rasch models and mixture Rasch models, including the dichotomous Rasch model, the rating scale model, and the partial credit model.
349 Cluster Analysis & Finite Mixture Models mixreg Functions to Fit Mixtures of Regressions Fits mixtures of (possibly multivariate) regressions (which has been described as doing ANCOVA when you don’t know the levels).
350 Cluster Analysis & Finite Mixture Models MixSim Simulating Data to Study Performance of Clustering Algorithms The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of ‘MixSim’, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.
351 Cluster Analysis & Finite Mixture Models mixsmsn Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions Functions to fit finite mixture of scale mixture of skew-normal (FM-SMSN) distributions.
352 Cluster Analysis & Finite Mixture Models mixtools Tools for Analyzing Finite Mixture Models Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
353 Cluster Analysis & Finite Mixture Models mixture Finite Gaussian Mixture Models for Clustering and Classification An implementation of all 14 Gaussian parsimonious clustering models (GPCMs) for model-based clustering and model-based classification.
354 Cluster Analysis & Finite Mixture Models MOCCA Multi-Objective Optimization for Collecting Cluster Alternatives Provides methods to analyze cluster alternatives based on multi-objective optimization of cluster validation indices. For details see Kraus et al. (2011) <doi:10.1007/s00180-011-0244-6>.
355 Cluster Analysis & Finite Mixture Models MoEClust Gaussian Parsimonious Clustering Models with Covariates and a Noise Component Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2018) <arXiv:1711.05632v2>. This package fits finite Gaussian mixture models with a formula interface for supplying gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated. A greedy forward stepwise search algorithm is provided for identifying the optimal model in terms of the number of components, the GPCM covariance parameterisation, and the subsets of gating/expert network covariates.
356 Cluster Analysis & Finite Mixture Models movMF Mixtures of von Mises-Fisher Distributions Fit and simulate mixtures of von Mises-Fisher distributions.
357 Cluster Analysis & Finite Mixture Models mritc MRI Tissue Classification Various methods for MRI tissue classification.
358 Cluster Analysis & Finite Mixture Models NbClust Determining the Best Number of Clusters in a Data Set It provides 30 indices for determining the optimal number of clusters in a data set and proposes the best clustering scheme to the user from the different results.
359 Cluster Analysis & Finite Mixture Models nor1mix Normal aka Gaussian (1-d) Mixture Models (S3 Classes and Methods) One-dimensional Normal (i.e., Gaussian) mixture model classes for, e.g., density estimation or clustering algorithm research and teaching; provides the widely used Marron-Wand densities. Efficient random number generation and graphics. Fitting to data by efficient ML (Maximum Likelihood) or traditional EM estimation.
360 Cluster Analysis & Finite Mixture Models NPflow Bayesian Nonparametrics for Automatic Gating of Flow-Cytometry Data Dirichlet process mixture of multivariate normal, skew normal or skew t-distributions modeling oriented towards flow-cytometry data preprocessing applications.
361 Cluster Analysis & Finite Mixture Models optpart Optimal Partitioning of Similarity Relations Contains a set of algorithms for creating partitions and coverings of objects largely based on operations on (dis)similarity relations (or matrices). There are several iterative re-assignment algorithms optimizing different goodness-of-clustering criteria. In addition, there are covering algorithms ‘clique’ which derives maximal cliques, and ‘maxpact’ which creates a covering of maximally compact sets. Graphical analyses and conversion routines are also included.
362 Cluster Analysis & Finite Mixture Models ORIClust Order-restricted Information Criterion-based Clustering Algorithm ORIClust is a user-friendly R-based software package for gene clustering. Clusters are given by genes matched to prespecified profiles across various ordered treatment groups. It is particularly useful for analyzing data obtained from short time-course or dose-response microarray experiments.
363 Cluster Analysis & Finite Mixture Models pdfCluster Cluster Analysis via Nonparametric Density Estimation Cluster analysis via nonparametric density estimation is performed. Operationally, the kernel method is used throughout to estimate the density. Diagnostic methods for evaluating the quality of the clustering are available. The package also includes a routine to estimate the probability density function obtained by the kernel method, given a set of data with arbitrary dimensions.
364 Cluster Analysis & Finite Mixture Models pmclust Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs ‘pbdMPI’ to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single-program multiple-data (SPMD) programming model. The code can be executed through ‘pbdMPI’ and ‘MPI’ implementations such as ‘OpenMPI’ and ‘MPICH’. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.
365 Cluster Analysis & Finite Mixture Models poLCA Polytomous variable Latent Class Analysis Latent class analysis and latent class regression models for polytomous outcome variables. Also known as latent structure analysis.
366 Cluster Analysis & Finite Mixture Models prabclus Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures; clustering of presence-absence, abundance and multilocus genetic data for species delimitation; nearest-neighbor-based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview.
367 Cluster Analysis & Finite Mixture Models prcr Person-Centered Analysis Provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) <doi:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K>, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.
368 Cluster Analysis & Finite Mixture Models PReMiuM Dirichlet Process Bayesian Clustering, Profile Regression Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.
369 Cluster Analysis & Finite Mixture Models profdpm Profile Dirichlet Process Mixtures This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate for the data partition in a PPM.
370 Cluster Analysis & Finite Mixture Models protoclust Hierarchical Clustering with Prototypes Performs minimax linkage hierarchical clustering. Every cluster has an associated prototype element that represents that cluster as described in Bien, J., and Tibshirani, R. (2011), “Hierarchical Clustering with Prototypes via Minimax Linkage,” The Journal of the American Statistical Association, 106(495), 1075-1084.
371 Cluster Analysis & Finite Mixture Models psychomix Psychometric Mixture Models Psychometric mixture models based on ‘flexmix’ infrastructure. At the moment Rasch mixture models with different parameterizations of the score distribution (saturated vs. mean/variance specification), Bradley-Terry mixture models, and MPT mixture models are implemented. These mixture models can be estimated with or without concomitant variables. See vignette(‘raschmix’, package = ‘psychomix’) for details on the Rasch mixture models.
372 Cluster Analysis & Finite Mixture Models pvclust Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides AU (approximately unbiased) p-value as well as BP (bootstrap probability) value for each cluster in a dendrogram.
373 Cluster Analysis & Finite Mixture Models randomLCA Random Effects Latent Class Analysis Fits standard and random effects latent class models. The single level random effects model is described in Qu et al <doi:10.2307/2533043> and the two level random effects model in Beath and Heller <doi:10.1177/1471082X0800900302>. Examples are given for their use in diagnostic testing.
374 Cluster Analysis & Finite Mixture Models rebmix Finite Mixture Modeling, Clustering & Classification R functions for random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, binomial, Poisson, Dirac or circular von Mises parametric families.
375 Cluster Analysis & Finite Mixture Models rjags Bayesian Graphical Models using MCMC Interface to the JAGS MCMC library.
376 Cluster Analysis & Finite Mixture Models Rmixmod (core) Classification with Mixture Modelling Interface of ‘MIXMOD’ software for supervised, unsupervised and semi-supervised classification with mixture modelling.
377 Cluster Analysis & Finite Mixture Models RPMM Recursively Partitioned Mixture Model Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.
378 Cluster Analysis & Finite Mixture Models seriation Infrastructure for Ordering Objects Using Seriation Infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).
379 Cluster Analysis & Finite Mixture Models sigclust Statistical Significance of Clustering SigClust is a statistical method for testing the significance of clustering results. SigClust can be applied to assess the statistical significance of splitting a data set into two clusters. For more than two clusters, SigClust can be used iteratively.
380 Cluster Analysis & Finite Mixture Models skmeans Spherical k-Means Clustering Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.
381 Cluster Analysis & Finite Mixture Models som Self-Organizing Map Self-Organizing Map (with application in gene clustering).
382 Cluster Analysis & Finite Mixture Models somspace Spatial Analysis with Self-Organizing Maps Application of the Self-Organizing Maps technique for spatial classification of time series. The package uses spatial data, point or gridded, to create clusters with similar characteristics. The clusters can be further refined to a smaller number of regions by hierarchical clustering and their spatial dependencies can be presented as complex networks. Thus, meaningful maps can be created, representing the regional heterogeneity of a single variable. More information and an example of implementation can be found in Markonis and Strnad (2019).
383 Cluster Analysis & Finite Mixture Models Spectrum Fast Adaptive Spectral Clustering for Single and Multi-View Data A self-tuning spectral clustering method for single or multi-view data. ‘Spectrum’ uses a new type of adaptive density aware kernel that strengthens connections in the graph based on common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to integrate different data sources and reduce noise. ‘Spectrum’ uses either the eigengap or multimodality gap heuristics to determine the number of clusters. The method is sufficiently flexible so that a wide range of Gaussian and non-Gaussian structures can be clustered with automatic selection of K.
384 Cluster Analysis & Finite Mixture Models tclust Robust Trimmed Clustering Provides functions for robust trimmed clustering. The methods are described in Garcia-Escudero (2008) <doi:10.1214/07-AOS515>, Fritz et al. (2012) <doi:10.18637/jss.v047.i12> and others.
385 Cluster Analysis & Finite Mixture Models teigen Model-Based Clustering and Classification with the Multivariate t Distribution Fits mixtures of multivariate t-distributions (with eigen-decomposed covariance structure) via the expectation conditional-maximization algorithm under a clustering or classification paradigm.
386 Cluster Analysis & Finite Mixture Models treeClust Cluster Distances Through Trees Create a measure of inter-point dissimilarity useful for clustering mixed data, and, optionally, perform the clustering.
387 Cluster Analysis & Finite Mixture Models trimcluster Cluster Analysis with Trimming Trimmed k-means clustering.
388 Cluster Analysis & Finite Mixture Models VarSelLCM Variable Selection for Model-Based Clustering of Mixed-Type Data Set with Missing Values Full model selection (detection of the relevant features and estimation of the number of clusters) for model-based clustering (see reference here <doi:10.1007/s11222-016-9670-1>). Data to analyze can be continuous, categorical, integer or mixed. Moreover, missing values can occur and do not necessitate any pre-processing. Shiny application permits an easy interpretation of the results.
389 Databases with R bigrquery An Interface to Google’s ‘BigQuery’ ‘API’ Easily talk to Google’s ‘BigQuery’ database from R.
390 Databases with R dbfaker A Tool to Ensure the Validity of Database Writes A tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support.
391 Databases with R DBI (core) R Database Interface A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.
392 Databases with R DBItest Testing ‘DBI’ Back Ends A helper that tests ‘DBI’ back ends for conformity to the interface.
393 Databases with R dbplyr A ‘dplyr’ Back End for Databases A ‘dplyr’ back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a ‘DBI’ back end; more advanced features require ‘SQL’ translation to be provided by the package author.
394 Databases with R elastic General Purpose Interface to ‘Elasticsearch’ Connect to ‘Elasticsearch’, a ‘NoSQL’ database built on the ‘Java’ Virtual Machine. Interacts with the ‘Elasticsearch’ ‘HTTP’ API (<https://www.elastic.co/products/elasticsearch>), including functions for setting connection details to ‘Elasticsearch’ instances, loading bulk data, and searching for documents with both ‘HTTP’ query variables and ‘JSON’ based body requests. In addition, ‘elastic’ provides functions for interacting with APIs for ‘indices’, documents, nodes, clusters, an interface to the cat API, and more.
395 Databases with R filehashSQLite Simple key-value database using SQLite Simple key-value database using SQLite as the backend.
396 Databases with R implyr R Interface for Apache Impala ‘SQL’ back-end to ‘dplyr’ for Apache Impala, the massively parallel processing query engine for Apache ‘Hadoop’. Impala enables low-latency ‘SQL’ queries on data stored in the ‘Hadoop’ Distributed File System ‘(HDFS)’, Apache ‘HBase’, Apache ‘Kudu’, Amazon Simple Storage Service ‘(S3)’, Microsoft Azure Data Lake Store ‘(ADLS)’, and Dell ‘EMC’ ‘Isilon’. See <https://impala.apache.org> for more information about Impala.
397 Databases with R influxdbr R Interface to InfluxDB An R interface to the InfluxDB time series database <https://www.influxdata.com>. This package allows you to fetch and write time series data from/to an InfluxDB server. Additionally, handy wrappers for the Influx Query Language (IQL) to manage and explore a remote database are provided.
398 Databases with R liteq Lightweight Portable Message Queue Using ‘SQLite’ Temporary and permanent message queues for R. Built on top of ‘SQLite’ databases. ‘SQLite’ provides locking, and makes it possible to detect crashed consumers. Crashed jobs can be automatically marked as “failed”, or put in the queue again, potentially a limited number of times.
399 Databases with R mongolite Fast and Simple ‘MongoDB’ Client for R High-performance MongoDB client based on ‘mongo-c-driver’ and ‘jsonlite’. Includes support for aggregation, indexing, map-reduce, streaming, encryption, enterprise authentication, and GridFS. The online user manual provides an overview of the available methods in the package: <https://jeroen.github.io/mongolite/>.
400 Databases with R odbc (core) Connect to ODBC Compatible Databases (using the DBI Interface) A DBI-compatible interface to ODBC databases.
401 Databases with R ora Convenient Tools for Working with Oracle Databases Easy-to-use functions to explore Oracle databases and import data into R. User interface for the ROracle package.
402 Databases with R pointblank Validation of Local and Remote Data Tables Validate data in data frames, ‘tibble’ objects, in ‘CSV’ and ‘TSV’ files, and in database tables (‘PostgreSQL’ and ‘MySQL’). Validation pipelines can be made using easily-readable, consecutive validation steps and such pipelines allow for switching of the data table context. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions.
403 Databases with R pool Object Pooling Enables the creation of object pools, which make it less computationally expensive to fetch a new object. Currently the only supported pooled objects are ‘DBI’ connections.
404 Databases with R R4CouchDB An R Convenience Layer for CouchDB 2.0 Provides a collection of functions for basic database and document management operations such as add, get, list access or delete. Every cdbFunction() gets and returns a list() containing the connection setup. Such a list can be generated by cdbIni().
405 Databases with R RCassandra R/Cassandra interface This package provides a direct interface (without the use of Java) to the most basic functionality of Apache Cassandra, such as login, updates and queries.
406 Databases with R RcppRedis ‘Rcpp’ Bindings for ‘Redis’ using the ‘hiredis’ Library Connection to the ‘Redis’ key/value store using the C-language client library ‘hiredis’ (included as a fallback) with ‘MsgPack’ encoding provided via ‘RcppMsgPack’ headers.
407 Databases with R redux R Bindings to ‘hiredis’ A ‘hiredis’ wrapper that includes support for transactions, pipelining, blocking subscription, serialisation of all keys and values, and ‘Redis’ error handling with R errors. Includes an automatically generated ‘R6’ interface to the full ‘hiredis’ ‘API’. Generated functions are faithful to the ‘hiredis’ documentation while attempting to match R’s argument semantics. Serialisation must be explicitly done by the user, but both binary and text-mode serialisation are supported.
408 Databases with R RGreenplum Interface to ‘Greenplum’ Database Fully ‘DBI’-compliant interface to ‘Greenplum’ <https://greenplum.org/>, an open-source parallel database. This is an extension of the ‘RPostgres’ package <https://github.com/r-dbi/RPostgres>.
409 Databases with R RH2 DBI/RJDBC Interface to H2 Database DBI/RJDBC interface to h2 database. h2 version 1.3.175 is included.
410 Databases with R RJDBC Provides Access to Databases Through the JDBC Interface The RJDBC package is an implementation of R’s DBI interface using JDBC as a back-end. This allows R to connect to any DBMS that has a JDBC driver.
411 Databases with R RMariaDB Database Interface and ‘MariaDB’ Driver Implements a ‘DBI’-compliant interface to ‘MariaDB’ (<https://mariadb.org/>) and ‘MySQL’ (<https://www.mysql.com/>) databases.
412 Databases with R RMySQL Database Interface and ‘MySQL’ Driver for R Legacy ‘DBI’ interface to ‘MySQL’ / ‘MariaDB’ based on old code ported from S-PLUS. A modern ‘MySQL’ client based on ‘Rcpp’ is available from the ‘RMariaDB’ package.
413 Databases with R RODBC ODBC Database Access An ODBC database interface.
414 Databases with R ROracle OCI Based Oracle Database Interface for R Oracle Database interface (DBI) driver for R. This is a DBI-compliant Oracle driver based on the OCI.
415 Databases with R rpostgis R Interface to a ‘PostGIS’ Database Provides an interface between R and ‘PostGIS’-enabled ‘PostgreSQL’ databases to transparently transfer spatial data. Both vector (points, lines, polygons) and raster data are supported in read and write modes. Also provides convenience functions to execute common procedures in ‘PostgreSQL/PostGIS’.
416 Databases with R RPostgres ‘Rcpp’ Interface to ‘PostgreSQL’ Fully ‘DBI’-compliant ‘Rcpp’-backed interface to ‘PostgreSQL’ <https://www.postgresql.org/>, an open-source relational database.
417 Databases with R RPostgreSQL R Interface to the ‘PostgreSQL’ Database System Database interface and ‘PostgreSQL’ driver for ‘R’. This package provides a Database Interface ‘DBI’ compliant driver for ‘R’ to access ‘PostgreSQL’ database systems. In order to build and install this package from source, ‘PostgreSQL’ itself must be present on your system to provide ‘PostgreSQL’ functionality via its libraries and header files. These files are provided as the ‘postgresql-devel’ package under some Linux distributions. On ‘macOS’ and ‘Microsoft Windows’ systems the attached ‘libpq’ library source will be used.
418 Databases with R RPresto DBI Connector to Presto Implements a ‘DBI’ compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>.
419 Databases with R RSQLite ‘SQLite’ Interface for R Embeds the ‘SQLite’ database engine in R and provides an interface compliant with the ‘DBI’ package. The source for the ‘SQLite’ engine is included.
420 Databases with R sqldf Manipulate R Data Frames Using SQL The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. ‘RSQLite’, ‘RH2’, ‘RMySQL’ and ‘RPostgreSQL’ backends are supported.
421 Databases with R TScompare ‘TSdbi’ Database Comparison Utilities for comparing the equality of series on two databases. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.
422 Databases with R uptasticsearch Get Data Frame Representations of ‘Elasticsearch’ Results ‘Elasticsearch’ is an open-source, distributed, document-based datastore (<https://www.elastic.co/products/elasticsearch>). It provides an ‘HTTP’ ‘API’ for querying the database and extracting datasets, but that ‘API’ was not designed for common data science workflows like pulling large batches of records and normalizing those documents into a data frame that can be used as a training dataset for statistical models. ‘uptasticsearch’ provides an interface for ‘Elasticsearch’ that is explicitly designed to make these data science workflows easy and fun.
423 Differential Equations adaptivetau Tau-Leaping Stochastic Simulation Implements adaptive tau leaping to approximate the trajectory of a continuous-time stochastic process as described by Cao et al. (2007) The Journal of Chemical Physics <doi:10.1063/1.2745299> (aka. the Gillespie stochastic simulation algorithm). This package is based upon work supported by NSF DBI-0906041 and NIH K99-GM104158 to Philip Johnson and NIH R01-AI049334 to Rustom Antia.
424 Differential Equations bvpSolve (core) Solvers for Boundary Value Problems of Differential Equations Functions that solve boundary value problems (‘BVP’) of systems of ordinary differential equations (‘ODE’) and differential algebraic equations (‘DAE’). The functions provide an interface to the FORTRAN functions ‘twpbvpC’, ‘colnew/colsys’, and an R-implementation of the shooting method.
425 Differential Equations cOde Automated C Code Generation for ‘deSolve’, ‘bvpSolve’ Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. Also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities being the basis for sensitivity analysis.
426 Differential Equations CollocInfer Collocation Inference for Dynamic Systems These functions implement collocation-inference for continuous-time and discrete-time stochastic processes. They provide model-based smoothing, gradient-matching, generalized profiling and forwards prediction error methods.
427 Differential Equations dde Solve Delay Differential Equations Solves ordinary and delay differential equations, where the objective function is written in either R or C. Suitable only for non-stiff equations, the solver uses a ‘Dormand-Prince’ method that allows interpolation of the solution at any point. This approach is as described by Hairer, Norsett and Wanner (1993) <ISBN:3540604529>. Support is also included for iterating difference equations.
428 Differential Equations deSolve (core) Solvers for Initial Value Problems of Differential Equations (‘ODE’, ‘DAE’, ‘DDE’) Functions that solve initial value problems of a system of first-order ordinary differential equations (‘ODE’), of partial differential equations (‘PDE’), of differential algebraic equations (‘DAE’), and of delay differential equations. The functions provide an interface to the FORTRAN functions ‘lsoda’, ‘lsodar’, ‘lsode’, ‘lsodes’ of the ‘ODEPACK’ collection, to the FORTRAN functions ‘dvode’, ‘zvode’ and ‘daspk’ and a C-implementation of solvers of the ‘Runge-Kutta’ family with fixed or variable time steps. The package contains routines designed for solving ‘ODEs’ resulting from 1-D, 2-D and 3-D partial differential equations (‘PDE’) that have been converted to ‘ODEs’ by numerical differencing.
429 Differential Equations deTestSet Test Set for Differential Equations Solvers and test set for stiff and non-stiff differential equations, and differential algebraic equations. ‘Mazzia, F., Cash, J.R. and K. Soetaert, 2012. DOI: 10.1016/j.cam.2012.03.014’.
430 Differential Equations diffeqr Solving Differential Equations (ODEs, SDEs, DDEs, DAEs) An interface to ‘DifferentialEquations.jl’ <http://docs.juliadiffeq.org/latest/> from the R programming language. It has unique high performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE), delay differential equations (DDE), differential-algebraic equations (DAE), and more. Much of the functionality, including features like adaptive time stepping in SDEs, are unique and allow for multiple orders of magnitude speedup over more common methods. ‘diffeqr’ attaches an R interface onto the package, allowing seamless use of this tooling by R users.
431 Differential Equations dMod Dynamic Modeling and Parameter Estimation in ODE Models The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.
432 Differential Equations ecolMod “A practical guide to ecological modelling - using R as a simulation platform” Figures, data sets and examples from the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter MJ Herman (2009). Springer. All figures from chapter x can be generated by “demo(chapx)”, where x = 1 to 11. The R-scripts of the model examples discussed in the book are in subdirectory “examples”, ordered per chapter. Solutions to model projects are in the same subdirectories.
433 Differential Equations FME A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis Provides functions to help in fitting models to data, to perform Monte Carlo, sensitivity and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’, or a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.
434 Differential Equations GillespieSSA Gillespie’s Stochastic Simulation Algorithm (SSA) Provides a simple to use, intuitive, and extensible interface to several stochastic simulation algorithms for generating simulated trajectories of finite population continuous-time models. Currently it implements Gillespie’s exact stochastic simulation algorithm (Direct method) and several approximate methods (Explicit tau-leap, Binomial tau-leap, and Optimized tau-leap). The package also contains a library of template models that can be run as demo models and can easily be customized and extended. Currently the following models are included, ‘Decaying-Dimerization’ reaction set, linear chain system, logistic growth model, ‘Lotka’ predator-prey model, Rosenzweig-MacArthur predator-prey model, ‘Kermack-McKendrick’ SIR model, and a ‘metapopulation’ SIRS model. Pineda-Krch et al. (2008) <doi:10.18637/jss.v025.i12>.
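As a sketch of the GillespieSSA interface described above: `ssa()` takes an initial state, character-string propensity functions, a state-change matrix, parameters, and a final time. The birth-death model and rate constants below are illustrative, not from the package's template library.

```r
# Hypothetical birth-death process via GillespieSSA's ssa() interface;
# rate constants are chosen for illustration only.
library(GillespieSSA)

parms <- c(b = 2, d = 1)               # birth and death rates (assumed values)
x0    <- c(N = 50)                     # initial population size
a     <- c("b*N", "d*N")               # propensity functions as strings
nu    <- matrix(c(+1, -1), nrow = 1)   # state change per reaction channel
out   <- ssa(x0 = x0, a = a, nu = nu, parms = parms, tf = 10)
```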
435 Differential Equations mkin Kinetic Evaluation of Chemical Degradation Data Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, model solution based on eigenvalues if possible or using numerical solvers and a choice of the optimisation methods made available by the ‘FME’ package. If a C compiler (on Windows: ‘Rtools’) is installed, differential equation models are solved using compiled C functions. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.
436 Differential Equations nlmeODE Non-linear mixed-effects modelling in nlme using differential equations This package combines the odesolve and nlme packages for mixed-effects modelling using differential equations.
437 Differential Equations odeintr C++ ODE Solvers Compiled on-Demand Wraps the Boost odeint library for integration of differential equations.
438 Differential Equations PBSddesolve Solver for Delay Differential Equations Routines for solving systems of delay differential equations by interfacing numerical routines written by Simon N. Wood, with contributions by Benjamin J. Cairns. These numerical routines first appeared in Simon Wood’s ‘solv95’ program. This package includes a vignette and a complete user’s guide. ‘PBSddesolve’ originally appeared on CRAN under the name ‘ddesolve’. That version is no longer supported. The current name emphasizes a close association with other PBS packages, particularly ‘PBSmodelling’.
439 Differential Equations PBSmodelling GUI Tools Made Easy: Interact with Models and Explore Data Provides software to facilitate the design, testing, and operation of computer models. It focuses particularly on tools that make it easy to construct and edit a customized graphical user interface (‘GUI’). Although our simplified ‘GUI’ language depends heavily on the R interface to the ‘Tcl/Tk’ package, a user does not need to know ‘Tcl/Tk’. Examples illustrate models built with other R packages, including ‘PBSmapping’, ‘PBSddesolve’, and ‘BRugs’. A complete user’s guide ‘PBSmodelling-UG.pdf’ shows how to use this package effectively.
440 Differential Equations pomp Statistical Inference for Partially Observed Markov Processes Tools for data analysis with partially observed Markov process (POMP) models (also known as stochastic dynamical systems, hidden Markov models, and nonlinear, non-Gaussian, state-space models). The package provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a versatile platform for implementation of inference methods for general POMP models.
441 Differential Equations pracma Practical Numerical Math Functions Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses ‘MATLAB’ function names where appropriate to simplify porting.
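A couple of illustrative calls showing pracma's MATLAB-style naming (the functions are pracma's; the test function and grid below are made up for the example):

```r
# pracma mirrors MATLAB function names where appropriate.
library(pracma)

# Root finding: solve cos(x) = x, starting near x = 1.
fzero(function(x) cos(x) - x, 1)$x

# Trapezoidal numerical integration of sin(x) on [0, pi].
x <- seq(0, pi, length.out = 101)
trapz(x, sin(x))
```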
442 Differential Equations ReacTran Reactive Transport Modelling in 1d, 2d and 3d Routines for developing models that describe reaction and advective-diffusive transport in one, two or three dimensions. Includes transport routines in porous media, in estuaries, and in bodies with variable shape.
443 Differential Equations rODE Ordinary Differential Equation (ODE) Solvers Written in R Using S4 Classes Show physics, math and engineering students how an ODE solver is made and how effective R classes can be for the construction of the equations that describe natural phenomena. Inspiration for this work comes from the book on “Computer Simulations in Physics” by Harvey Gould, Jan Tobochnik, and Wolfgang Christian. Book link: <http://www.compadre.org/osp/items/detail.cfm?ID=7375>.
444 Differential Equations rodeo A Code Generator for ODE-Based Models Provides an R6 class and several utility methods to facilitate the implementation of models based on ordinary differential equations. The heart of the package is a code generator that creates compiled ‘Fortran’ (or ‘R’) code which can be passed to a numerical solver. There is direct support for solvers contained in packages ‘deSolve’ and ‘rootSolve’.
445 Differential Equations rootSolve (core) Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations Routines to find the root of nonlinear functions, and to perform steady-state and equilibrium analysis of ordinary differential equations (ODE). Includes routines that: (1) generate gradient and Jacobian matrices (full and banded), (2) find roots of non-linear equations by the ‘Newton-Raphson’ method, (3) estimate steady-state conditions of a system of (differential) equations in full, banded or sparse form, using the ‘Newton-Raphson’ method, or by dynamically running, (4) solve the steady-state conditions for uni- and multicomponent 1-D, 2-D, and 3-D partial differential equations, that have been converted to ordinary differential equations by numerical differencing (using the method-of-lines approach). Includes Fortran code.
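The Newton-Raphson root finding mentioned in item (2) above can be sketched with rootSolve's `multiroot()`; the 2x2 system below (unit circle intersected with the line y = x) is an invented example:

```r
# Solving a small nonlinear system with rootSolve::multiroot().
# The system itself is an illustrative example.
library(rootSolve)

model <- function(x) c(x[1]^2 + x[2]^2 - 1,   # unit circle
                       x[1] - x[2])           # line y = x
multiroot(f = model, start = c(1, 1))$root    # near c(0.707, 0.707)
```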
446 Differential Equations scaRabee Optimization Toolkit for Pharmacokinetic-Pharmacodynamic Models scaRabee is a port of the Scarabee toolkit originally written as a Matlab-based application. It provides a framework for simulation and optimization of pharmacokinetic-pharmacodynamic models at the individual and population level. It is built on top of the neldermead package, which provides the direct search algorithm proposed by Nelder and Mead for model optimization.
447 Differential Equations sde (core) Simulation and Inference for Stochastic Differential Equations Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 978-0-387-75838-1, Springer, NY.
448 Differential Equations Sim.DiffProc Simulation of Diffusion Processes It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Itô and Stratonovich forms, including statistical analysis of SDEs with parallel Monte Carlo and moment-equation methods. It has enabled researchers in many domains to use these equations to model practical problems in financial and actuarial modelling and other areas of application, e.g., modelling and simulating the first-passage-time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
449 Differential Equations simecol Simulation of Ecological (and Other) Dynamic Systems An object oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and re-usability of code.
450 Differential Equations sundialr An Interface to ‘SUNDIALS’ Ordinary Differential Equation (ODE) Solvers Provides a way to call the functions in the ‘SUNDIALS’ C ODE solving library (<https://computation.llnl.gov/projects/sundials>). Currently the serial version of the ODE solver, ‘CVODE’, and the sensitivity calculator ‘CVODES’ from the ‘SUNDIALS’ library are implemented. The package requires the ODE to be written as an ‘R’ or ‘Rcpp’ function and does not require the ‘SUNDIALS’ library to be installed on the local machine.
451 Probability Distributions actuar (core) Actuarial Functions and Heavy Tailed Distributions Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.
452 Probability Distributions AdMit Adaptive Mixture of Student-t Distributions Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.
453 Probability Distributions agricolae Statistical Procedures for Agricultural Research Original idea was presented in the thesis “A statistical analysis tool for agricultural research” to obtain the degree of Master of Science, National Engineering University (UNI), Lima-Peru. Some experimental data for the examples come from the CIP and other research. Agricolae offers extensive functionality on experimental design especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Squares, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures, several non-parametric test comparisons, biodiversity indices and consensus cluster.
454 Probability Distributions ald The Asymmetric Laplace Distribution It provides the density, distribution function, quantile function, random number generator, likelihood function, moments and Maximum Likelihood estimators for a given sample, all for the three-parameter Asymmetric Laplace Distribution defined in Koenker and Machado (1999). This is a special case of the skewed family of distributions available in Galarza et al. (2017) <doi:10.1002/sta4.140> useful for quantile regression.
455 Probability Distributions AtelieR A GTK GUI for teaching basic concepts in statistical inference, and doing elementary Bayesian tests A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central-Limit Theorem and the Normal Distribution, Distribution of a sample mean, Distribution of a sample variance, Probability calculator for common distributions), and II. Elementary Bayesian Statistics (Bayesian inference on proportions, contingency tables, means and variances, with informative and noninformative priors).
456 Probability Distributions bayesm Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
457 Probability Distributions benchden 28 benchmark densities from Berlinet/Devroye (1994) Full implementation of the 28 distributions introduced as benchmarks for nonparametric density estimation by Berlinet and Devroye (1994). Includes densities, cdfs, quantile functions and generators for samples as well as additional information on features of the densities. Also contains the 4 histogram densities used in Rozenholc/Mildenberger/Gather (2010).
458 Probability Distributions BenfordTests Statistical Tests for Evaluating Conformity to Benford’s Law Several specialized statistical tests and support functions for determining if numerical data could conform to Benford’s law.
459 Probability Distributions BiasedUrn Biased Urn Model Distributions Statistical models of biased sampling in the form of univariate and multivariate noncentral hypergeometric distributions, including Wallenius’ noncentral hypergeometric distribution and Fisher’s noncentral hypergeometric distribution (also called extended hypergeometric distribution). See vignette(“UrnTheory”) for explanation of these distributions.
460 Probability Distributions bivariate Bivariate Probability Distributions Contains convenience functions for constructing, plotting and evaluating bivariate probability distributions, including their probability mass functions, probability density functions and cumulative distribution functions. Supports uniform (discrete and continuous), binomial, Poisson, categorical, normal, bimodal and Dirichlet (trivariate) distributions, and kernel smoothing and empirical cumulative distribution functions.
461 Probability Distributions Bivariate.Pareto Bivariate Pareto Models Perform competing risks analysis under bivariate Pareto models. See Shih et al. (2018) <doi:10.1080/03610926.2018.1425450> for details.
462 Probability Distributions BivarP Estimating the Parameters of Some Bivariate Distributions Parameter estimation of bivariate distribution functions modeled as an Archimedean copula function. The input data may contain right-censored values. The marginal distributions used are two-parameter. Methods for density, distribution, survival, and random sample generation.
463 Probability Distributions BivGeo Basu-Dhar Bivariate Geometric Distribution Computes the joint probability mass function (pmf), the joint cumulative function (cdf), the joint survival function (sf), the correlation coefficient, the covariance and the cross-factorial moment, and generates random deviates for the Basu-Dhar bivariate geometric distribution, as well as the joint probability mass, cumulative and survival functions assuming the presence of a cure fraction given by the standard bivariate mixture cure fraction model. The package also computes the estimators based on the method of moments.
464 Probability Distributions bivgeom Roy’s Bivariate Geometric Distribution Implements Roy’s bivariate geometric model (Roy (1993) <doi:10.1006/jmva.1993.1065>): joint probability mass function, distribution function, survival function, random generation, parameter estimation, and more.
465 Probability Distributions bmixture Bayesian Estimation for Finite Mixture of Distributions Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements in the Bayesian literature on finite mixtures of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.
466 Probability Distributions BMT The BMT Distribution Density, distribution, quantile function, random number generation for the BMT (Bezier-Montenegro-Torres) distribution. Torres-Jimenez C.J. and Montenegro-Diaz A.M. (2017) <arXiv:1709.05534>. Moments, descriptive measures and parameter conversion for different parameterizations of the BMT distribution. Fit of the BMT distribution to non-censored data by maximum likelihood, moment matching, quantile matching, maximum goodness-of-fit, also known as minimum distance, maximum product of spacing, also called maximum spacing, and minimum quantile distance, which can also be called maximum quantile goodness-of-fit. Fit of univariate distributions for non-censored data using maximum product of spacing estimation and minimum quantile distance estimation is also included.
467 Probability Distributions bridgedist An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003) An implementation of the bridge distribution with logit-link in R. In Wang and Louis (2003) <doi:10.1093/biomet/90.4.765>, such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. Such is not the case if the random intercept distribution was Gaussian.
468 Probability Distributions cbinom Continuous Analog of a Binomial Distribution Implementation of the d/p/q/r family of functions for a continuous analog to the standard discrete binomial with continuous size parameter and continuous support with x in [0, size + 1], following Ilienko (2013) <arXiv:1303.5990>.
469 Probability Distributions CDVine Statistical Inference of C- And D-Vine Copulas Functions for statistical inference of canonical vine (C-vine) and D-vine copulas. Tools for bivariate exploratory data analysis and for bivariate as well as vine copula selection are provided. Models can be estimated either sequentially or by joint maximum likelihood estimation. Sampling algorithms and plotting methods are also included. Data is assumed to lie in the unit hypercube (so-called copula data).
470 Probability Distributions CircStats Circular Statistics, from “Topics in Circular Statistics” (2001) Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
471 Probability Distributions circular Circular Statistics Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
472 Probability Distributions cmvnorm The Complex Multivariate Gaussian Distribution Various utilities for the complex multivariate Gaussian distribution.
473 Probability Distributions coga Convolution of Gamma Distributions Evaluates the density and distribution function of the convolution of gamma distributions in R. Two related exact methods and one approximate method are implemented with efficient algorithms and C++ code. A quick guide for choosing the correct method and usage of this package is given in the package vignette.
474 Probability Distributions CompGLM Conway-Maxwell-Poisson GLM and Distribution Functions A function (which uses a similar interface to the ‘glm’ function) for the fitting of a Conway-Maxwell-Poisson GLM. There are also various methods for analysis of the model fit. The package also contains functions for the Conway-Maxwell-Poisson distribution in a similar interface to functions ‘dpois’, ‘ppois’ and ‘rpois’. The functions are generally quick, since the workhorse functions are written in C++ (thanks to the Rcpp package).
475 Probability Distributions CompLognormal Functions for actuarial scientists Computes the probability density function, cumulative distribution function, quantile function, random numbers of any composite model based on the lognormal distribution.
476 Probability Distributions compoisson Conway-Maxwell-Poisson Distribution Provides routines for density and moments of the Conway-Maxwell-Poisson distribution as well as functions for fitting the COM-Poisson model for over/under-dispersed count data.
477 Probability Distributions Compositional Compositional Data Analysis Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. The standard textbook for such data is John Aitchison’s (1986) “The statistical analysis of compositional data”. Relevant papers include a) Tsagris M.T., Preston S. and Wood A.T.A. (2011) A data-based power transformation for compositional data. Fourth International Workshop on Compositional Data Analysis. b) Tsagris M. (2014). The k-NN algorithm for compositional data: a revised approach with and without zero values present. Journal of Data Science, 12(3):519–534. c) Tsagris M. (2015). Regression analysis with compositional data containing zero values. Chilean Journal of Statistics, 6(2):47–57. d) Tsagris M., Preston S. and Wood A.T.A. (2016). Improved supervised classification for compositional data using the alpha-transformation. Journal of Classification, 33(2):243–261. <doi:10.1007/s00357-016-9207-5>. e) Tsagris M., Preston S. and Wood A.T.A. (2017). Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation, 87(2):406–422. <doi:10.1080/00949655.2016.1216554> f) Tsagris M. and Stewart C. (2018). A Dirichlet regression model for compositional data with zeros. Lobachevskii Journal of Mathematics, 39(3):398–412. <doi:10.1134/S1995080218030198>. g) Alenazi A. (2019). Regression for compositional data with compositional data as predictor variables with or without zero values. Journal of Data Science, 17(1):219–238. <doi:10.6339/JDS.201901_17(1).0010> Further, we include functions for percentages (or proportions).
478 Probability Distributions compositions Compositional Data Analysis Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.
479 Probability Distributions Compounding Computing Continuous Distributions Computing continuous distributions obtained by compounding a continuous and a discrete distribution.
480 Probability Distributions CompQuadForm Distribution Function of Quadratic Forms in Normal Variables Computes the distribution function of quadratic forms in normal variables using Imhof’s method, Davies’s algorithm, Farebrother’s algorithm or Liu et al.’s algorithm.
481 Probability Distributions condMVNorm Conditional Multivariate Normal Distribution Computes conditional multivariate normal probabilities, random deviates and densities.
482 Probability Distributions copBasic General Bivariate Copula Theory and Many Utility Functions Extensive functions for bivariate copula (bicopula) computations and related operations for bicopula theory. The lower, upper, product, and select other bicopula are implemented along with operations including the diagonal, survival copula, dual of a copula, co-copula, and numerical bicopula density. Level sets, horizontal and vertical sections are supported. Numerical derivatives and inverses of a bicopula are provided through which simulation is implemented. Bicopula composition, convex combination, and products also are provided. Support extends to the Kendall Function as well as the Lmoments thereof. Kendall Tau, Spearman Rho and Footrule, Gini Gamma, Blomqvist Beta, Hoeffding Phi, Schweizer-Wolff Sigma, tail dependency, tail order, skewness, and bivariate Lmoments are implemented, and positive/negative quadrant dependency, left (right) increasing (decreasing) are available. Other features include Kullback-Leibler divergence, Vuong procedure, spectral measure, and Lcomoments for inference, maximum likelihood, and AIC, BIC, and RMSE for goodness-of-fit.
483 Probability Distributions copula (core) Multivariate Dependence with Copulas Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
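The define/sample/fit workflow described in the copula entry above can be sketched as follows. The copula family, parameter value, and sample size are illustrative; `normalCopula()`, `rCopula()`, and `fitCopula()` are the package's own functions.

```r
# Sketch of the copula package workflow: define a copula, sample from it,
# then fit a copula of the same family back to the sample.
library(copula)

cop <- normalCopula(param = 0.6, dim = 2)   # Gaussian copula, rho = 0.6 (assumed)
u   <- rCopula(500, cop)                    # pseudo-observations on the unit square
fitCopula(normalCopula(dim = 2), u, method = "mpl")  # maximum pseudo-likelihood fit
```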
484 Probability Distributions cpd Complex Pearson Distributions Probability mass function, distribution function, quantile function and random generation for the Complex Triparametric Pearson (CTP) and Complex Biparametric Pearson (CBP) distributions developed by Rodriguez-Avi et al (2003) <doi:10.1007/s00362-002-0134-7>, Rodriguez-Avi et al (2004) <doi:10.1007/BF02778271> and Olmo-Jimenez et al (2018) <doi:10.1080/00949655.2018.1482897>. The package also contains maximum-likelihood fitting functions for these models.
485 Probability Distributions csn Closed Skew-Normal Distribution Provides functions for computing the density and the log-likelihood function of closed-skew normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.
486 Probability Distributions Davies The Davies Quantile Function Various utilities for the Davies distribution.
487 Probability Distributions degreenet Models for Skewed Count Distributions Relevant to Networks Likelihood-based inference for skewed count distributions used in network modeling. “degreenet” is a part of the “statnet” suite of packages for network analysis.
488 Probability Distributions Delaporte Statistical Functions for the Delaporte Distribution Provides probability mass, distribution, quantile, random-variate generation, and method-of-moments parameter-estimation functions for the Delaporte distribution. The Delaporte is a discrete probability distribution which can be considered the convolution of a negative binomial distribution with a Poisson distribution. Alternatively, it can be considered a counting distribution with both Poisson and negative binomial components. It has been studied in actuarial science as a frequency distribution which has more variability than the Poisson, but less than the negative binomial.
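The d/p/q/r family mentioned in the Delaporte entry follows R's usual naming convention; the parameter values below are illustrative:

```r
# Delaporte distribution functions; alpha/beta (negative binomial part) and
# lambda (Poisson part) values here are arbitrary examples.
library(Delaporte)

ddelap(0:5, alpha = 2, beta = 1, lambda = 3)  # probability mass at 0..5
pdelap(4,   alpha = 2, beta = 1, lambda = 3)  # cumulative probability P(X <= 4)
rdelap(10,  alpha = 2, beta = 1, lambda = 3)  # ten random deviates
```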
489 Probability Distributions dirmult Estimation in Dirichlet-Multinomial distribution Estimate parameters in Dirichlet-Multinomial and compute profile log-likelihoods.
490 Probability Distributions disclap Discrete Laplace Exponential Family Discrete Laplace exponential family for models such as a generalized linear model.
491 Probability Distributions DiscreteInverseWeibull Discrete Inverse Weibull Distribution Probability mass function, distribution function, quantile function, random generation and parameter estimation for the discrete inverse Weibull distribution.
492 Probability Distributions DiscreteLaplace Discrete Laplace Distributions Probability mass function, distribution function, quantile function, random generation and estimation for the skew discrete Laplace distributions.
493 Probability Distributions DiscreteWeibull Discrete Weibull Distributions (Type 1 and 3) Probability mass function, distribution function, quantile function, random generation and parameter estimation for the type I and III discrete Weibull distributions.
494 Probability Distributions distcrete Discrete Distribution Approximations Creates discretised versions of continuous distribution functions by mapping continuous values to an underlying discrete grid, based on a (uniform) frequency of discretisation, a valid discretisation point, and an integration range. For a review of discretisation methods, see Chakraborty (2015) <doi:10.1186/s40488-015-0028-6>.
495 Probability Distributions distr (core) Object Oriented Implementation of Distributions S4-classes and methods for distributions.
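As a brief sketch of distr's S4 interface (the values are illustrative): distributions are objects, and the `d`/`p`/`q`/`r` accessors return the corresponding functions; arithmetic on distribution objects yields new distribution objects.

```r
# The distr S4 interface: distribution objects with d/p/q/r accessors.
library(distr)

N <- Norm(mean = 0, sd = 1)   # a standard normal distribution object
d(N)(0)        # density at 0
p(N)(1.96)     # CDF at 1.96
r(N)(5)        # five random draws
M <- N + 3     # shifting yields another distribution object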
496 Probability Distributions distrDoc Documentation for ‘distr’ Family of R Packages Provides documentation in form of a common vignette to packages ‘distr’, ‘distrEx’, ‘distrMod’, ‘distrSim’, ‘distrTEst’, ‘distrTeach’, and ‘distrEllipse’.
497 Probability Distributions distrEllipse S4 Classes for Elliptically Contoured Distributions Distribution (S4-)classes for elliptically contoured distributions (based on package ‘distr’).
498 Probability Distributions distrEx Extensions of Package ‘distr’ Extends package ‘distr’ by functionals, distances, and conditional distributions.
499 Probability Distributions DistributionUtils Distribution Utilities Utilities are provided which are of use in the packages I have developed for dealing with distributions. Currently these packages are GeneralizedHyperbolic, VarianceGamma, and SkewHyperbolic and NormalLaplace. Each of these packages requires DistributionUtils. Functionality includes sample skewness and kurtosis, log-histogram, tail plots, moments by integration, changing the point about which a moment is calculated, functions for testing distributions using inversion tests and the Massart inequality. Also includes an implementation of the incomplete Bessel K function.
500 Probability Distributions distrMod Object Oriented Implementation of Probability Models Implements S4 classes for probability models based on packages ‘distr’ and ‘distrEx’.
501 Probability Distributions distrSim Simulation Classes Based on Package ‘distr’ S4-classes for setting up a coherent framework for simulation within the distr family of packages.
502 Probability Distributions distrTeach Extensions of Package ‘distr’ for Teaching Stochastics/Statistics in Secondary School Provides flexible examples of LLN and CLT for teaching purposes in secondary school.
503 Probability Distributions distrTEst Estimation and Testing Classes Based on Package ‘distr’ Evaluation (S4-)classes based on package distr for evaluating procedures (estimators/tests) at data/simulation in a unified way.
504 Probability Distributions distTails A Collection of Fully Defined Distribution Tails A full definition for Weibull tails and the Full-Tails Gamma, and tools for fitting these distributions to empirical tails. This package builds upon the paper by del Castillo, Daoudi and Serra (2012) <doi:10.1017/asb.2017.9>.
505 Probability Distributions dng Distributions and Gradients Provides density, distribution function, quantile function and random generation for the split normal and split-t distributions, and computes their mean, variance, skewness and kurtosis for the two distributions (Li, F, Villani, M. and Kohn, R. (2010) <doi:10.1016/j.jspi.2010.04.031>).
506 Probability Distributions dqrng Fast Pseudo Random Number Generators Several fast random number generators are provided as C++ header-only libraries: The PCG family by O’Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as Xoroshiro128+ and Xoshiro256+ by Blackman and Vigna (2018 <arXiv:1805.01407>). In addition, fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+ and Xoshiro256+ as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011 <doi:10.1145/2063384.2063405>) as provided by the package ‘sitmo’.
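The generators listed above have direct counterparts outside R. As an illustration only (this is NumPy's API, not dqrng's), NumPy ships the same PCG64 engine and a ziggurat-based normal sampler, which shows what the package's uniform/normal/exponential draws look like in practice:

```python
import numpy as np

# Illustration: NumPy exposes the same PCG64 generator family that
# dqrng wraps for R (O'Neill, 2014). This is not dqrng's own interface.
rng = np.random.Generator(np.random.PCG64(seed=42))

u = rng.uniform(size=5)               # uniform draws
z = rng.standard_normal(size=5)       # normal draws (ziggurat-based in NumPy)
e = rng.standard_exponential(size=5)  # exponential draws

# Seeding identically reproduces the stream exactly.
u_again = np.random.Generator(np.random.PCG64(seed=42)).uniform(size=5)
```

Re-seeding with the same value reproduces the stream bit-for-bit, which is the property that makes these generators convenient for reproducible simulation.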
507 Probability Distributions e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
508 Probability Distributions ecd Elliptic Lambda Distribution and Option Pricing Model Elliptic lambda distribution and lambda option pricing model have been evolved into a framework of stable-law inspired distributions, such as the extended stable lambda distribution for asset return, stable count distribution for volatility, and Lihn-Laplace process as a leptokurtic extension of Wiener process. This package contains functions for the computation of density, probability, quantile, random variable, fitting procedures, option prices, volatility smile. It also comes with sample financial data, and plotting routines.
509 Probability Distributions Emcdf Computation and Visualization of Empirical Joint Distribution (Empirical Joint CDF) Computes and visualizes empirical joint distribution of multivariate data with optimized algorithms and multi-thread computation. There is a faster algorithm using dynamic programming to compute the whole empirical joint distribution of bivariate data. There are optimized algorithms for computing empirical joint CDF function values for other multivariate data. Visualization is focused on bivariate data. Levelplots and wireframes are included.
510 Probability Distributions emdbook Support Functions and Data for “Ecological Models and Data” Auxiliary functions and data sets for “Ecological Models and Data”, a book presenting maximum likelihood estimation and related topics for ecologists (ISBN 978-0-691-12522-0).
511 Probability Distributions emg Exponentially Modified Gaussian (EMG) Distribution Provides basic distribution functions for a mixture model of a Gaussian and exponential distribution.
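An EMG random variable is the sum of a Gaussian and an independent exponential. SciPy implements the same distribution under the name 'exponnorm' (shape parameter K = 1/(sigma*lambda)), which can serve as a non-R illustration of the kind of d/p/q functions the package provides:

```python
import numpy as np
from scipy import stats

# EMG = Normal(mu, sigma) + independent Exponential(rate lam).
# SciPy's 'exponnorm' parameterizes this with K = 1 / (sigma * lam).
mu, sigma, lam = 0.0, 1.0, 0.5
emg = stats.exponnorm(1.0 / (sigma * lam), loc=mu, scale=sigma)

mean = emg.mean()           # theoretical mean is mu + 1/lam
p = emg.cdf(emg.ppf(0.9))   # cdf and ppf are (numerical) inverses
```

The mean check (mu + 1/lam) follows directly from the sum representation: the exponential component contributes 1/lambda.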
512 Probability Distributions empichar Evaluates the Empirical Characteristic Function for Multivariate Samples Evaluates the empirical characteristic function of univariate and multivariate samples. This package uses ‘RcppArmadillo’ for fast evaluation. It is also possible to export the code to be used in other packages at ‘C++’ level.
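The quantity involved is simple to state: the empirical characteristic function of a sample x_1..x_n at point t is (1/n) * sum_j exp(i*t*x_j). A minimal NumPy sketch for the univariate case (illustrative only, not the package's RcppArmadillo implementation):

```python
import numpy as np

def ecf(t, x):
    """Empirical characteristic function (1/n) * sum_j exp(i * t * x_j)."""
    return np.exp(1j * np.outer(np.atleast_1d(t), x)).mean(axis=1)

x = np.array([-1.0, 0.0, 1.0])
phi0 = ecf(0.0, x)[0]  # equals 1 at t = 0 for any sample
phi1 = ecf(1.0, x)[0]  # (1 + 2*cos(1)) / 3; real-valued because the sample is symmetric
```

The t = 0 value is always exactly 1, and a symmetric sample gives a purely real ECF, which are the two standard sanity checks on an implementation.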
513 Probability Distributions EnvStats Package for Environmental Statistics, Including US EPA Guidance Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <http://www.springer.com/book/9781461484554>).
514 Probability Distributions evd Functions for Extreme Value Distributions Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.
515 Probability Distributions evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.
516 Probability Distributions evir Extreme Values in R Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, and GEV/GPD distributions.
517 Probability Distributions evmix Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation The usual distribution functions, maximum likelihood inference and model diagnostics for univariate stationary extreme value mixture models are provided. Kernel density estimation including various boundary corrected kernel density estimation methods and a wide choice of kernels, with cross-validation likelihood based bandwidth estimator. Reasonable consistency with the base functions in the ‘evd’ package is provided, so that users can safely interchange most code.
518 Probability Distributions extraDistr Additional Univariate and Multivariate Distributions Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.
519 Probability Distributions extremefit Estimation of Extreme Conditional Quantiles and Probabilities Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.
520 Probability Distributions FAdist Distributions that are Sometimes Used in Hydrology Probability distributions that are sometimes useful in hydrology.
521 Probability Distributions FatTailsR Kiener Distributions and Fat Tails in Finance Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate distribution parameters, quantiles, value-at-risk and expected shortfall with high accuracy. Includes power hyperbolas and power hyperbolic functions.
522 Probability Distributions fBasics Rmetrics - Markets and Basic Statistics Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.
523 Probability Distributions fCopulae (core) Rmetrics - Bivariate Dependence Structures with Copulae Provides a collection of functions to manage, to investigate and to analyze bivariate financial returns by Copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.
524 Probability Distributions fExtremes Rmetrics - Modelling Extreme Events in Finance Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data pre-processing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.
525 Probability Distributions fgac Generalized Archimedean Copula Bi-variate data fitting is done by two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm is implemented for seven families of copulas (Generalized Archimedean Copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).
526 Probability Distributions fitdistrplus Help to Fit of a Parametric Distribution to Non-Censored or Censored Data Extends the fitdistr() function (of the MASS package) with several functions to help the fit of a parametric distribution to non-censored or censored data. Censored data may contain left censored, right censored and interval censored values, with several lower and upper bounds. In addition to maximum likelihood estimation (MLE), the package provides moment matching (MME), quantile matching (QME) and maximum goodness-of-fit estimation (MGE) methods (available only for non-censored data). Weighted versions of MLE, MME and QME are available. See e.g. Casella & Berger (2002). Statistical inference. Pacific Grove.
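The core idea, maximum likelihood fitting of a parametric family to data, can be sketched outside R as well; SciPy's generic `fit` method performs the analogous MLE (illustration of the estimation principle only, not fitdistrplus itself):

```python
import numpy as np
from scipy import stats

# Simulate gamma-distributed data and recover its parameters by maximum
# likelihood -- the MLE principle that fitdistrplus builds on.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=2000)

# Fix the location at 0 so only shape and scale are estimated.
shape, loc, scale = stats.gamma.fit(data, floc=0)
```

With 2000 observations the MLE lands close to the true (shape = 2, scale = 3); fitdistrplus additionally covers censored data and the MME/QME/MGE alternatives named above.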
527 Probability Distributions fitteR Fit Hundreds of Theoretical Distributions to Empirical Data Systematic fit of hundreds of theoretical univariate distributions to empirical data via maximum likelihood estimation. Fits are reported and summarized by a data.frame, a csv file or a ‘shiny’ app (the latter with additional features such as visual representation of fits). All output formats provide assessment of goodness-of-fit by the following methods: Kolmogorov-Smirnov test, Shapiro-Wilk test, Anderson-Darling test.
528 Probability Distributions flexsurv Flexible Parametric Survival and Multi-State Models Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models.
529 Probability Distributions FMStable Finite Moment Stable Distributions This package implements some basic procedures for dealing with log maximally skew stable distributions, which are also called finite moment log stable distributions.
530 Probability Distributions fpow Computing the Noncentrality Parameter of the Noncentral F Distribution Returns the noncentrality parameter of the noncentral F distribution given the probabilities of type I and type II error and the degrees of freedom of the numerator and denominator. It may be useful for computing minimal detectable differences for general ANOVA models. This program is documented in the paper by A. Baharev and S. Kemeny, “On the computation of the noncentral F and noncentral beta distribution”, Statistics and Computing, 2008, 18 (3), 333-340.
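The computation reduces to inverting the power function of the noncentral F test. A hedged SciPy sketch (the package's exact numerics may differ) finds the noncentrality parameter by root-finding on the power curve:

```python
from scipy import stats
from scipy.optimize import brentq

def ncp_for_power(alpha, beta, dfn, dfd):
    """Noncentrality parameter at which an F test run at level alpha
    reaches power 1 - beta (a sketch of the quantity fpow returns)."""
    fcrit = stats.f.ppf(1 - alpha, dfn, dfd)
    # Power is monotone increasing in ncp, so a bracket on [~0, large] suffices.
    return brentq(lambda ncp: stats.ncf.sf(fcrit, dfn, dfd, ncp) - (1 - beta),
                  1e-8, 1000.0)

ncp = ncp_for_power(alpha=0.05, beta=0.2, dfn=3, dfd=20)
achieved = stats.ncf.sf(stats.f.ppf(0.95, 3, 20), 3, 20, ncp)
```

Evaluating the power at the returned ncp recovers 1 - beta, which is the defining property being inverted.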
531 Probability Distributions frmqa The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.
532 Probability Distributions fromo Fast Robust Moments Fast, numerically robust computation of weighted moments via ‘Rcpp’. Supports computation on vectors and matrices, and monoidal append of moments. Moments and cumulants over running fixed-length windows can be computed, as well as over time-based windows. Moment computations are via a generalization of Welford’s method, as described by Bennett et al. (2009) <doi:10.1109/CLUSTR.2009.5289161>.
533 Probability Distributions gambin Fit the Gambin Model to Species Abundance Distributions Fits unimodal and multimodal gambin distributions to species-abundance distributions from ecological data, as in Matthews et al. (2014) <doi:10.1111/ecog.00861>. ‘gambin’ is short for ‘gamma-binomial’. The main function is fit_abundances(), which estimates the ‘alpha’ parameter(s) of the gambin distribution using maximum likelihood. Functions are also provided to generate the gambin distribution and for calculating likelihood statistics.
534 Probability Distributions gamlss.dist (core) Distributions for Generalized Additive Models for Location Scale and Shape A set of distributions which can be used for modelling the response variables in Generalized Additive Models for Location Scale and Shape, Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The distributions can be continuous, discrete or mixed distributions. Extra distributions can be created by transforming any continuous distribution defined on the real line to a distribution defined on the range 0 to infinity or 0 to 1, using a “log” or a “logit” transformation respectively.
535 Probability Distributions gamlss.mx Fitting Mixture Distributions with GAMLSS The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.
536 Probability Distributions gaussDiff Difference measures for multivariate Gaussian probability density functions A collection of difference measures for multivariate Gaussian probability density functions, such as the Euclidean mean, the Mahalanobis distance, the Kullback-Leibler divergence, the J-Coefficient, the Minkowski L2-distance, the Chi-square divergence and the Hellinger Coefficient.
537 Probability Distributions gb Generalized Lambda Distribution and Generalized Bootstrapping A collection of algorithms and functions for fitting data to a generalized lambda distribution via moment matching methods, and generalized bootstrapping.
538 Probability Distributions GB2 Generalized Beta Distribution of the Second Kind: Properties, Likelihood, Estimation Package GB2 explores the Generalized Beta distribution of the second kind. Density, cumulative distribution function, quantiles and moments of the distributions are given. Functions for the full log-likelihood, the profile log-likelihood and the scores are provided. Formulas for various indicators of inequality and poverty under the GB2 are implemented. The GB2 is fitted by the methods of maximum pseudo-likelihood estimation using the full and profile log-likelihood, and non-linear least squares estimation of the model parameters. Various plots for the visualization and analysis of the results are provided. Variance estimation of the parameters is provided for the method of maximum pseudo-likelihood estimation. A mixture distribution based on the compounding property of the GB2 is presented (denoted as “compound” in the documentation). This mixture distribution is based on the discretization of the distribution of the underlying random scale parameter. The discretization can be left or right tail. Density, cumulative distribution function, moments and quantiles for the mixture distribution are provided. The compound mixture distribution is fitted using the method of maximum pseudo-likelihood estimation. The fit can also incorporate the use of auxiliary information. In this new version of the package, the mixture case is complemented with new functions for variance estimation by linearization and comparative density plots.
539 Probability Distributions GenBinomApps Clopper-Pearson Confidence Interval and Generalized Binomial Distribution Density, distribution function, quantile function and random generation for the Generalized Binomial Distribution. Functions to compute the Clopper-Pearson Confidence Interval and the required sample size. Enhanced model for burn-in studies, where failures are tackled by countermeasures.
540 Probability Distributions gendist Generated Probability Distribution Models Computes the probability density function (pdf), cumulative distribution function (cdf), quantile function (qf) and generates random values (rg) for the following general models: mixture models, composite models, folded models, skewed symmetric models and arc tan models.
541 Probability Distributions GeneralizedHyperbolic The Generalized Hyperbolic Distribution Functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, normal inverse Gaussian distribution and generalized inverse Gaussian distribution, including fitting of these distributions to data. Linear models with hyperbolic errors may be fitted using hyperblmFit.
542 Probability Distributions GenOrd Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions A Gaussian copula based procedure for generating samples from discrete random variables with prescribed correlation matrix and marginal distributions.
543 Probability Distributions geoR Analysis of Geostatistical Data Geostatistical analysis including traditional, likelihood-based and Bayesian methods.
544 Probability Distributions ghyp A Package on Generalized Hyperbolic Distribution and Its Special Cases Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). In particular, it contains fitting procedures, an AIC-based model selection routine, and functions for the computation of density, quantile, probability, random variates, expected shortfall and some portfolio optimization and plotting routines as well as the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.
545 Probability Distributions GIGrvg Random Variate Generator for the GIG Distribution Generator and density function for the Generalized Inverse Gaussian (GIG) distribution.
546 Probability Distributions gk g-and-k and g-and-h Distribution Functions Functions for the g-and-k and generalised g-and-h distributions.
547 Probability Distributions gld Estimation and Use of the Generalised (Tukey) Lambda Distribution The generalised lambda distribution, or Tukey lambda distribution, provides a wide variety of shapes with one functional form. This package provides random numbers, quantiles, probabilities, densities and density quantiles for four different types of the distribution, the FKML, RS, GPD and FM5 - see documentation for details. It provides the density function, distribution function, and Quantile-Quantile plots. It implements a variety of estimation methods for the distribution, including diagnostic plots. Estimation methods include the starship (all 4 types), method of L-Moments for the GPD and FKML types, and a number of methods for only the FKML parameterisation. These include maximum likelihood, maximum product of spacings, Titterington’s method, Moments, Trimmed L-Moments and Distributional Least Absolutes.
548 Probability Distributions GLDEX Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method, and L-moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and comparing mean, variance, skewness and kurtosis of the data with the fitted distribution.
549 Probability Distributions glogis Fitting and Testing Generalized Logistic Distributions Tools for the generalized logistic distribution (Type I, also known as skew-logistic distribution), encompassing basic distribution functions (p, q, d, r, score), maximum likelihood estimation, and structural change methods.
550 Probability Distributions greybox Toolbox for Model Building and Forecasting Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods for producing forecasts from these models and visualising them.
551 Probability Distributions GSM Gamma Shape Mixture Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient and requires specifying a prior distribution for only a single parameter.
552 Probability Distributions gumbel The Gumbel-Hougaard Copula Provides probability functions (cumulative distribution and density functions), simulation function (Gumbel copula multivariate simulation) and estimation functions (Maximum Likelihood Estimation, Inference For Margins, Moment Based Estimation and Canonical Maximum Likelihood).
553 Probability Distributions HAC Estimation, Simulation and Visualization of Hierarchical Archimedean Copulae (HAC) Package provides the estimation of the structure and the parameters, sampling methods and structural plots of Hierarchical Archimedean Copulae (HAC).
554 Probability Distributions hermite Generalized Hermite Distribution Probability functions and other utilities for the generalized Hermite distribution.
555 Probability Distributions HI Simulation from distributions supported by nested hyperplanes Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to transdimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v.31, n.4, (2003). Also random direction multivariate Adaptive Rejection Metropolis Sampling.
556 Probability Distributions HistogramTools Utility Functions for R Histograms Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.
557 Probability Distributions hyper2 The Hyperdirichlet Distribution, Mark 2 A suite of routines for the hyperdirichlet distribution; supersedes the ‘hyperdirichlet’ package.
558 Probability Distributions HyperbolicDist The Hyperbolic Distribution Provides functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, including fitting of the hyperbolic to data.
559 Probability Distributions ihs Inverse Hyperbolic Sine Distribution Density, distribution function, quantile function and random generation for the inverse hyperbolic sine distribution. This package also provides a function that can fit data to the inverse hyperbolic sine distribution using maximum likelihood estimation.
560 Probability Distributions kdist K-Distribution and Weibull Paper Density, distribution function, quantile function and random generation for the K-distribution. A plotting function that plots data on Weibull paper and another function to draw additional lines. See results from package in T Lamont-Smith (2018), submitted J. R. Stat. Soc.
561 Probability Distributions kernelboot Smoothed Bootstrap and Random Generation from Kernel Densities Smoothed bootstrap and functions for random generation from univariate and multivariate kernel densities. It does not estimate kernel densities.
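A Gaussian smoothed bootstrap is easy to state: resample the data with replacement, then perturb each draw with kernel noise, which is equivalent to sampling from the kernel density estimate. A minimal sketch (illustrative Python, not the package's API; the bandwidth rule of thumb is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)  # example data

def smoothed_bootstrap(x, n_draws, bandwidth, rng):
    """Sample from a Gaussian KDE over x: resample, then add kernel noise."""
    idx = rng.integers(0, len(x), size=n_draws)
    return x[idx] + rng.normal(scale=bandwidth, size=n_draws)

h = 1.06 * x.std() * len(x) ** (-1 / 5)  # Silverman-style bandwidth (assumption)
y = smoothed_bootstrap(x, 1000, h, rng)
```

Unlike the ordinary bootstrap, the draws are not restricted to observed values; the smoothing inflates the variance by roughly the squared bandwidth.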
562 Probability Distributions kolmim An Improved Evaluation of Kolmogorov’s Distribution Provides an alternative, more efficient evaluation of extreme probabilities of Kolmogorov’s goodness-of-fit measure, Dn, when compared to the original implementation of Wang, Marsaglia, and Tsang. These probabilities are used in Kolmogorov-Smirnov tests when comparing two samples.
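SciPy exposes both the exact finite-n distribution of Dn (`kstwo`) and the limiting Kolmogorov distribution, which makes the gap the package addresses easy to see (illustration only, not kolmim's algorithm):

```python
import numpy as np
from scipy import stats
from scipy.special import kolmogorov

# P(D_n > d) for the one-sample KS statistic: exact finite-n value
# (the quantity kolmim targets) versus the classical asymptotic approximation.
n, d = 100, 0.15
p_exact = stats.kstwo.sf(d, n)         # exact distribution of D_n
p_asymp = kolmogorov(np.sqrt(n) * d)   # limiting Kolmogorov distribution
```

For moderate n the two probabilities differ slightly in the tail, which is exactly where accurate evaluation matters for KS p-values.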
563 Probability Distributions KScorrect Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests Implements the Lilliefors-corrected Kolmogorov-Smirnov test for use in goodness-of-fit tests, suitable when population parameters are unknown and must be estimated by sample statistics. P-values are estimated by simulation. Can be used with a variety of continuous distributions, including normal, lognormal, univariate mixtures of normals, uniform, loguniform, exponential, gamma, and Weibull distributions. Functions to generate random numbers and calculate density, distribution, and quantile functions are provided for use with the log uniform and mixture distributions.
564 Probability Distributions LambertW Probabilistic Models to Analyze and Gaussianize Heavy-Tailed, Skewed Data Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a non-linearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/print results nicely. Probably the most important function is ‘Gaussianize’, which works similarly to ‘scale’, but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x ‘MyFavoriteDistribution’ and use it in their analysis right away.
565 Probability Distributions LaplacesDemon Complete Environment for Bayesian Inference Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.
566 Probability Distributions lcopula Liouville Copulas Collections of functions allowing random number generations and estimation of ‘Liouville’ copulas, as described in Belzile and Neslehova (2017) <doi:10.1016/j.jmva.2017.05.008>.
567 Probability Distributions LearnBayes Functions for Learning Bayesian Inference A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
568 Probability Distributions lhs Latin Hypercube Samples Provides a number of methods for creating and augmenting Latin Hypercube Samples.
569 Probability Distributions LIHNPSD Poisson Subordinated Distribution A Poisson Subordinated Distribution to capture major leptokurtic features in log-return time series of financial data.
570 Probability Distributions LindleyPowerSeries Lindley Power Series Distribution Computes the probability density function, the cumulative distribution function, the hazard rate function, the quantile function and random generation for Lindley Power Series distributions, see Nadarajah and Si (2018) <doi:10.1007/s13171-018-0150-x>.
571 Probability Distributions llogistic The L-Logistic Distribution Density, distribution function, quantile function and random generation for the L-Logistic distribution with parameters m and phi. The parameter m is the median of the distribution.
572 Probability Distributions lmom L-Moments Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
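Sample L-moments are fixed linear combinations of probability-weighted moments b_r computed from the order statistics. A self-contained sketch of the standard unbiased estimator (illustrative Python, not the lmom package itself):

```python
import numpy as np
from math import comb

def sample_lmoments(x):
    """First four sample L-moments via unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # b_r = (1/n) * sum_j C(j, r)/C(n-1, r) * x_(j+1), with 0-based j = 0..n-1
    b = [sum(comb(j, r) / comb(n - 1, r) * x[j] for j in range(n)) / n
         for r in range(4)]
    # L-moments are fixed linear combinations of the b_r
    return np.array([b[0],
                     2 * b[1] - b[0],
                     6 * b[2] - 6 * b[1] + b[0],
                     20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]])

lm = sample_lmoments([1, 2, 3, 4, 5])  # symmetric sample: l3 = l4 = 0
```

The first L-moment is the sample mean and the third and fourth vanish for a symmetric sample, which makes a convenient correctness check.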
573 Probability Distributions lmomco (core) L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances- covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with “L” (LMs), “TL” (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L,TL], Gen Logistic [L), Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3-p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3-p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.
574 Probability Distributions Lmoments L-Moments and Quantile Mixtures Contains functions to estimate L-moments and trimmed L-moments from the data. Also contains functions to estimate the parameters of the normal polynomial quantile mixture and the Cauchy polynomial quantile mixture from L-moments and trimmed L-moments.
575 Probability Distributions logitnorm Functions for the Logitnormal Distribution Density, distribution, quantile and random generation function for the logitnormal distribution. Estimation of the mode and the first two moments. Estimation of distribution parameters.
576 Probability Distributions loglognorm Double Log Normal Distribution Functions Provides r, d, p, q functions for the double log normal distribution.
577 Probability Distributions marg Approximate Marginal Inference for Regression-Scale Models Likelihood inference based on higher order approximations for linear nonnormal regression models.
578 Probability Distributions MASS Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
579 Probability Distributions matrixNormal The Matrix Normal Distribution Computes densities, probabilities, and random deviates of the Matrix Normal (Iranmanesh et al. (2010) <doi:10.7508/ijmsi.2010.02.004>). Also includes simple but useful matrix functions. See the vignette for more information.
580 Probability Distributions matrixsampling Simulations of Matrix Variate Distributions Provides samplers for various matrix variate distributions: Wishart, inverse-Wishart, normal, t, inverted-t, Beta type I, Beta type II, Gamma, confluent hypergeometric. Allows to simulate the noncentral Wishart distribution without the integer restriction on the degrees of freedom.
581 Probability Distributions mbbefd Maxwell Boltzmann Bose Einstein Fermi Dirac Distribution and Destruction Rate Modelling Distributions that are typically used for exposure rating in general insurance, in particular to price reinsurance contracts. The vignettes show code snippets to fit the distribution to empirical data.
582 Probability Distributions MBSP Multivariate Bayesian Model with Shrinkage Priors Implements a sparse Bayesian multivariate linear regression model using shrinkage priors from the three parameter beta normal family. The method is described in Bai and Ghosh (2018) <arXiv:1711.07635>.
583 Probability Distributions mc2d Tools for Two-Dimensional Monte-Carlo Simulations A complete framework to build and study Two-Dimensional Monte-Carlo simulations, aka Second-Order Monte-Carlo simulations. Also includes various distributions (pert, triangular, Bernoulli, empirical discrete and continuous).
584 Probability Distributions mclust Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
585 Probability Distributions MCMCpack Markov Chain Monte Carlo (MCMC) Package Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
586 Probability Distributions mded Measuring the Difference Between Two Empirical Distributions Provides a function for measuring the difference between two independent or non-independent empirical distributions and returning a significance level of the difference.
587 Probability Distributions MEPDF Creation of Empirical Density Functions Based on Multivariate Data Based on the input data an n-dimensional cube with sub cells of user specified side length is created. The number of sample points which fall in each sub cube is counted, and with the cell volume and overall sample size an empirical probability can be computed. A number of cubes of higher resolution can be superimposed. The basic method stems from J.L. Bentley in “Multidimensional Divide and Conquer”. J. L. Bentley (1980) <doi:10.1145/358841.358850>. Furthermore a simple kernel density estimation method is made available, as well as an expansion of Bentleys method, which offers a kernel approach for the grid method.
588 Probability Distributions mgpd mgpd: Functions for multivariate generalized Pareto distribution (MGPD of Type II) Extends distribution and density functions to parametric multivariate generalized Pareto distributions (MGPD of Type II), and provides fitting functions which calculate maximum likelihood estimates for bivariate and trivariate models. (Help is in progress.)
589 Probability Distributions minimax Minimax distribution family The minimax family of distributions is a two-parameter family like the beta family, but computationally a lot more tractable.
590 Probability Distributions MitISEM Mixture of Student t Distributions using Importance Sampling and Expectation Maximization Flexible multivariate function approximation using an adapted mixture of Student t distributions. The mixture of t distributions is obtained using an Importance Sampling weighted Expectation Maximization algorithm.
591 Probability Distributions MittagLeffleR Mittag-Leffler Family of Distributions Implements the Mittag-Leffler function, distribution, random variate generation, and estimation. Based on the Laplace-Inversion algorithm by Garrappa, R. (2015) <doi:10.1137/140971191>.
592 Probability Distributions MixedTS Mixed Tempered Stable Distribution Provides detailed functions for the univariate Mixed Tempered Stable distribution.
593 Probability Distributions mixtools Tools for Analyzing Finite Mixture Models Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
594 Probability Distributions MM The Multiplicative Multinomial Distribution Various utilities for the Multiplicative Multinomial distribution.
595 Probability Distributions mnormpow Multivariate Normal Distributions with Power Integrand Computes the integral of f(x)*x_i^k on a product of intervals, where f is the density of a Gaussian law. This is a small alteration of the mnormt code from A. Genz and A. Azzalini.
596 Probability Distributions mnormt (core) The Multivariate Normal and t Distributions Functions are provided for computing the density and the distribution function of multivariate normal and “t” random variables, and for generating random vectors sampled from these distributions. Probabilities are computed via non-Monte Carlo methods; different routines are used in the case d=1, d=2, d>2, if d denotes the number of dimensions.
597 Probability Distributions modeest Mode Estimation Provides estimators of the mode of univariate data or univariate distributions.
598 Probability Distributions moments Moments, cumulants, skewness, kurtosis and related tests Functions to calculate: moments, Pearson’s kurtosis, Geary’s kurtosis and skewness; tests related to them (Anscombe-Glynn, D’Agostino, Bonett-Seier).
599 Probability Distributions movMF Mixtures of von Mises-Fisher Distributions Fit and simulate mixtures of von Mises-Fisher distributions.
600 Probability Distributions msm Multi-State Markov and Hidden Markov Models in Continuous Time Functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. Designed for processes observed at arbitrary times in continuous time (panel data) but some other observation schemes are supported. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.
601 Probability Distributions MultiRNG Multivariate Pseudo-Random Number Generation Pseudo-random number generation for 11 multivariate distributions: Normal, t, Uniform, Bernoulli, Hypergeometric, Beta (Dirichlet), Multinomial, Dirichlet-Multinomial, Laplace, Wishart, and Inverted Wishart. The details of the method are explained in Demirtas (2004) <doi:10.22237/jmasm/1099268340>.
602 Probability Distributions mvprpb Orthant Probability of the Multivariate Normal Distribution Computes orthant probabilities of the multivariate normal distribution.
603 Probability Distributions mvrtn Mean and Variance of Truncated Normal Distribution Mean, variance, and random variates for left/right truncated normal distributions.
604 Probability Distributions mvtnorm (core) Multivariate Normal and t Distributions Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
605 Probability Distributions nCDunnett Noncentral Dunnett’s Test Distribution Computes the noncentral Dunnett’s test distribution (pdf, cdf and quantile) and generates random numbers.
606 Probability Distributions nCopula Hierarchical Archimedean Copulas Constructed with Multivariate Compound Distributions Construct and manipulate hierarchical Archimedean copulas with multivariate compound distributions. The model used is the one of Cossette et al. (2017) <doi:10.1016/j.insmatheco.2017.06.001>.
607 Probability Distributions Newdistns Computes Pdf, Cdf, Quantile and Random Numbers, Measures of Inference for 19 General Families of Distributions Computes the probability density function, cumulative distribution function, quantile function, random numbers and measures of inference for the following general families of distributions (each family defined in terms of an arbitrary cdf G): Marshall Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, and exponentiated exponential Poisson G distributions.
608 Probability Distributions nor1mix Normal aka Gaussian (1-d) Mixture Models (S3 Classes and Methods) One-dimensional Normal (i.e., Gaussian) mixture model classes, e.g., for density estimation or clustering-algorithm research and teaching; provides the widely used Marron-Wand densities, efficient random number generation and graphics, and fitting to data by efficient ML (Maximum Likelihood) or traditional EM estimation.
609 Probability Distributions NormalGamma Normal-gamma convolution model The functions proposed in this package compute the density of the sum of a Gaussian and a gamma random variable, estimate the parameters, and correct the noise effect in a gamma-signal and Gaussian-noise model. This package has been used to implement the background correction method for Illumina microarray data presented in Plancade S., Rozenholc Y. and Lund E. “Generalization of the normal-exponential model: exploration of a more accurate parameterization for the signal distribution on Illumina BeadArrays”, BMC Bioinfo 2012, 13(329).
610 Probability Distributions NormalLaplace The Normal Laplace Distribution Functions for the normal Laplace distribution. The package is under development and provides only limited functionality. Density, distribution and quantile functions, random number generation, and moments are provided.
611 Probability Distributions normalp Routines for Exponential Power Distribution Collection of utilities for the Exponential Power distribution, also known as the General Error Distribution (see Mineo, A.M. and Ruggieri, M. (2005), A Software Tool for the Exponential Power Distribution: The normalp package, Journal of Statistical Software, Vol. 12, Issue 4).
612 Probability Distributions ORDER2PARENT Estimate parent distributions with data of several order statistics This package uses B-spline based nonparametric smooth estimators to estimate parent distributions given observations on multiple order statistics.
613 Probability Distributions OrdNor Concurrent Generation of Ordinal and Normal Data with Given Correlation Matrix and Marginal Distributions Implementation of a procedure for generating samples from a mixed distribution of ordinal and normal random variables with pre-specified correlation matrix and marginal distributions.
614 Probability Distributions ParetoPosStable Computing, Fitting and Validating the PPS Distribution Statistical functions to describe a Pareto Positive Stable (PPS) distribution and fit it to real data. Graphical and statistical tools to validate the fits are included.
615 Probability Distributions pbv Probabilities for Bivariate Normal Distribution Computes probabilities of the bivariate normal distribution in a vectorized R function (Drezner & Wesolowsky, 1990, <doi:10.1080/00949659008811236>).
616 Probability Distributions PDQutils PDQ Functions via Gram Charlier, Edgeworth, and Cornish Fisher Approximations A collection of tools for approximating the ‘PDQ’ functions (respectively, the cumulative distribution, density, and quantile) of probability distributions via classical expansions involving moments and cumulants.
617 Probability Distributions PearsonDS (core) Pearson Distribution System Implementation of the Pearson distribution system, including full support for the (d,p,q,r)-family of functions for probability distributions and fitting via method of moments and maximum likelihood method.
618 Probability Distributions PhaseType Inference for Phase-type Distributions Functions to perform Bayesian inference on absorption time data for Phase-type distributions. Plans to expand this to include frequentist inference and simulation tools.
619 Probability Distributions pmultinom One-Sided Multinomial Probabilities Implements multinomial CDF (P(N1<=n1, …, Nk<=nk)) and tail probabilities (P(N1>n1, …, Nk>nk)), as well as probabilities with both constraints (P(l1<N1<=u1, …, lk<Nk<=uk)). Uses a method suggested by Bruce Levin (1981) <doi:10.1214/aos/1176345593>.
620 Probability Distributions poibin The Poisson Binomial Distribution Implementation of both the exact and approximation methods for computing the cdf of the Poisson binomial distribution. It also provides the pmf, quantile function, and random number generation for the Poisson binomial distribution.
621 Probability Distributions poilog Poisson lognormal and bivariate Poisson lognormal distribution Functions for obtaining the density, random deviates and maximum likelihood estimates of the Poisson lognormal distribution and the bivariate Poisson lognormal distribution.
622 Probability Distributions poistweedie Poisson-Tweedie exponential family models Simulation of Poisson-Tweedie models.
623 Probability Distributions polyaAeppli Implementation of the Polya-Aeppli distribution Functions for evaluating the mass density, cumulative distribution function, quantile function and random variate generation for the Polya-Aeppli distribution, also known as the geometric compound Poisson distribution.
624 Probability Distributions poweRlaw Analysis of Heavy Tailed Distributions An implementation of maximum likelihood estimators for a variety of heavy tailed distributions, including both the discrete and continuous power law distributions. Additionally, a goodness-of-fit based approach is used to estimate the lower cut-off for the scaling region.
625 Probability Distributions probhat Generalized Kernel Smoothing Computes nonparametric probability distributions (probability density functions, cumulative distribution functions and quantile functions) using kernel smoothing. Supports univariate, multivariate and conditional distributions, and weighted data (possibly useful mixed with fuzzy clustering or frequency data). Also, supports empirical continuous cumulative distribution functions and their inverses, and random number generation.
626 Probability Distributions qmap Statistical Transformations for Post-Processing Climate Model Output Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.
627 Probability Distributions qqid Generation and Support of QQIDs - A Human-Compatible Representation of 128-bit Numbers The string “bird.carp.7TsBWtwqtKAeCTNk8f” is a “QQID”: a representation of a 128-bit number, constructed from two “cues” of short, common, English words, and Base64 encoded characters. The primary intended use of QQIDs is as random unique identifiers, e.g. database keys like the “UUIDs” defined in the RFC 4122 Internet standard. QQIDs can be identically interconverted with UUIDs, IPv6 addresses, MD5 hashes etc., and are suitable for a host of applications in which identifiers are read by humans. They are compact, can safely be transmitted in binary and text form, can be used as components of URLs, and it can be established at a glance whether two QQIDs are different or potentially identical. The qqid package contains functions to retrieve true, quantum-random QQIDs, to generate pseudo-random QQIDs, to validate them, and to interconvert them with other 128-bit number representations.
628 Probability Distributions qrandom True Random Numbers using the ANU Quantum Random Numbers Server The ANU Quantum Random Number Generator provided by the Australian National University generates true random numbers in real-time by measuring the quantum fluctuations of the vacuum. This package offers an interface using their API. The electromagnetic field of the vacuum exhibits random fluctuations in phase and amplitude at all frequencies. By carefully measuring these fluctuations, one is able to generate ultra-high bandwidth random numbers. The quantum Random Number Generator is based on the papers by Symul et al., (2011) <doi:10.1063/1.3597793> and Haw, et al. (2015) <doi:10.1103/PhysRevApplied.3.054004>. The package offers functions to retrieve a sequence of random integers or hexadecimals and true random samples from a normal or uniform distribution.
629 Probability Distributions QRM Provides R-Language Code to Examine Quantitative Risk Management Concepts Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.
630 Probability Distributions qrmtools Tools for Quantitative Risk Management Functions and data sets for reproducing selected results from the book “Quantitative Risk Management: Concepts, Techniques and Tools”. Furthermore, new developments and auxiliary functions for Quantitative Risk Management practice.
631 Probability Distributions qrng (Randomized) Quasi-Random Number Generators Functionality for generating (randomized) quasi-random numbers in high dimensions.
632 Probability Distributions randaes Random number generator based on AES cipher The deterministic part of the Fortuna cryptographic pseudorandom number generator, described by Schneier & Ferguson in “Practical Cryptography”.
633 Probability Distributions random True Random Numbers using RANDOM.ORG The true random number service provided by the RANDOM.ORG website created by Mads Haahr samples atmospheric noise via radio tuned to an unused broadcasting frequency together with a skew correction algorithm due to John von Neumann. More background is available in the included vignette based on an essay by Mads Haahr. In its current form, the package offers functions to retrieve random integers, randomized sequences and random strings.
634 Probability Distributions randtoolbox Toolbox for Pseudo and Quasi Random Number Generation and Random Generator Tests Provides (1) pseudo random generators - general linear congruential generators, multiple recursive generators and generalized feedback shift register (SF-Mersenne Twister algorithm and WELL generators); (2) quasi random generators - the Torus algorithm, the Sobol sequence, the Halton sequence (including the Van der Corput sequence) and (3) some generator tests - the gap test, the serial test, the poker test. See e.g. Gentle (2003) <doi:10.1007/b97336>. The package can be provided without the rngWELL dependency on demand. Take a look at the Distribution task view of types and tests of random number generators. Version in Memoriam of Diethelm and Barbara Wuertz.
635 Probability Distributions RDieHarder R Interface to the ‘DieHarder’ RNG Test Suite The ‘RDieHarder’ package provides an R interface to the ‘DieHarder’ suite of random number generators and tests that was developed by Robert G. Brown and David Bauer, extending earlier work by George Marsaglia and others. The ‘DieHarder’ library is included, but if a version is already installed it will be used instead.
636 Probability Distributions ReIns Functions from “Reinsurance: Actuarial and Statistical Aspects” Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels <http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470772689.html>.
637 Probability Distributions reliaR (core) Package for some probability distributions A collection of utilities for some reliability models/probability distributions.
638 Probability Distributions Renext Renewal Method for Extreme Values Extrapolation Peaks Over Threshold (POT) or ‘methode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a Maximum-Likelihood framework.
639 Probability Distributions retimes Reaction Time Analysis Reaction time analysis by maximum likelihood.
640 Probability Distributions revdbayes Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. Also provided are d,p,q,r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.
641 Probability Distributions rlecuyer R Interface to RNG with Multiple Streams Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.
642 Probability Distributions RMKdiscrete Sundry Discrete Probability Distributions Sundry discrete probability distributions and helper functions.
643 Probability Distributions RMTstat Distributions, Statistics and Tests derived from Random Matrix Theory Functions for working with the Tracy-Widom laws and other distributions related to the eigenvalues of large Wishart matrices. The tables for computing the Tracy-Widom densities and distribution functions were computed by Momar Dieng’s MATLAB package “RMLab” (formerly available on his homepage at http://math.arizona.edu/~momar/research.htm ). This package is part of a collaboration between Iain Johnstone, Zongming Ma, Patrick Perry, and Morteza Shahram. It will soon be replaced by a package with more accuracy and built-in support for relevant statistical tests.
644 Probability Distributions rmutil Utilities for Nonlinear Regression and Repeated Measurements Models A toolkit of functions for nonlinear regression and repeated measurements not to be used by itself but called by other Lindsey packages such as ‘gnlm’, ‘stable’, ‘growth’, ‘repeated’, and ‘event’ (available at <http://www.commanster.eu/rcode.html>).
645 Probability Distributions rngwell19937 Random number generator WELL19937a with 53 or 32 bit output Long period linear random number generator WELL19937a by F. Panneton, P. L’Ecuyer and M. Matsumoto. The initialization algorithm allows the generator to be seeded with a numeric vector of arbitrary length and uses MRG32k5a by P. L’Ecuyer to achieve good quality of the initialization. The output function may be set to provide numbers from the interval (0,1) with 53 (the default) or 32 random bits. WELL19937a is of a similar type to Mersenne Twister and has the same period. WELL19937a is slightly slower than Mersenne Twister, but has better equidistribution and “bit-mixing” properties and faster recovery from states with prevailing zeros than Mersenne Twister. All WELL generators with orders 512, 1024, 19937 and 44497 can be found in the randtoolbox package.
646 Probability Distributions rstream Streams of Random Numbers Unified object oriented interface for multiple independent streams of random numbers from different sources.
647 Probability Distributions RTDE Robust Tail Dependence Estimation Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: ‘Robust and bias-corrected estimation of the coefficient of tail dependence’ and ‘Robust and bias-corrected estimation of probabilities of extreme failure sets’. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).
648 Probability Distributions rtdists Response Time Distributions Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model (Ratcliff & McKoon, 2008, <doi:10.1162/neco.2008.12-06-420>) based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA; Brown & Heathcote, 2008, <doi:10.1016/j.cogpsych.2007.12.002>) with different distributions underlying the drift rate.
649 Probability Distributions Runuran R Interface to the ‘UNU.RAN’ Random Variate Generators Interface to the ‘UNU.RAN’ library for Universal Non-Uniform RANdom variate generators. Thus it allows one to build non-uniform random number generators from quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles from a couple of distributions.
650 Probability Distributions rust Ratio-of-Uniforms Simulation with Transformation Uses the generalized ratio-of-uniforms (RU) method to simulate from univariate and (low-dimensional) multivariate continuous distributions. The user specifies the log-density, up to an additive constant. The RU algorithm is applied after relocation of mode of the density to zero, and the user can choose a tuning parameter r. For details see Wakefield, Gelfand and Smith (1991) <doi:10.1007/BF01889987>, Efficient generation of random variates via the ratio-of-uniforms method, Statistics and Computing (1991) 1, 129-133. A Box-Cox variable transformation can be used to make the input density suitable for the RU method and to improve efficiency. In the multivariate case rotation of axes can also be used to improve efficiency. From version 1.2.0 the ‘Rcpp’ package <https://cran.r-project.org/package=Rcpp> can be used to improve efficiency.
651 Probability Distributions s20x Functions for University of Auckland Course STATS 201/208 Data Analysis A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors.
652 Probability Distributions sadists Some Additional Distributions Provides the density, distribution, quantile and generation functions of some obscure probability distributions, including the doubly non-central t, F, Beta, and Eta distributions; the lambda-prime and K-prime; the upsilon distribution; the (weighted) sum of non-central chi-squares to a power; the (weighted) sum of log non-central chi-squares; the product of non-central chi-squares to powers; the product of doubly non-central F variables; the product of independent normals.
653 Probability Distributions SCI Standardized Climate Indices Such as SPI, SRI or SPEI Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).
654 Probability Distributions setRNG Set (Normal) Random Number Generator and Seed SetRNG provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.
655 Probability Distributions sfsmisc Utilities from ‘Seminar fuer Statistik’ ETH Zurich Useful utilities [‘goodies’] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990’s. For graphics, have pretty (Log-scale) axes, an enhanced Tukey-Anscombe plot, combining histogram and boxplot, 2d-residual plots, a ‘tachoPlot()’, pretty arrows, etc. For robustness, have a robust F test and robust range(). For system support, notably on Linux, provides ’Sys.*()’ functions with more access to system and CPU information. Finally, miscellaneous utilities such as simple efficient prime numbers, integer codes, Duplicated(), toLatex.numeric() and is.whole().
656 Probability Distributions sgt Skewed Generalized T Distribution Tree Density, distribution function, quantile function and random generation for the skewed generalized t distribution. This package also provides a function that can fit data to the skewed generalized t distribution using maximum likelihood estimation.
657 Probability Distributions skellam Densities and Sampling for the Skellam Distribution Functions for the Skellam distribution, including: density (pmf), cdf, quantiles and regression.
658 Probability Distributions SkewHyperbolic The Skew Hyperbolic Student t-Distribution Functions are provided for the density function, distribution function, quantiles and random number generation for the skew hyperbolic t-distribution. There are also functions that fit the distribution to data. There are functions for the mean, variance, skewness, kurtosis and mode of a given distribution and to calculate moments of any order about any centre. To assess goodness of fit, there are functions to generate a Q-Q plot, a P-P plot and a tail plot.
659 Probability Distributions skewt The Skewed Student-t Distribution Density, distribution function, quantile function and random generation for the skewed t distribution of Fernandez and Steel.
660 Probability Distributions sld Estimation and Use of the Quantile-Based Skew Logistic Distribution The skew logistic distribution is a quantile-defined generalisation of the logistic distribution (van Staden and King 2015). Provides random numbers, quantiles, probabilities, densities and density quantiles for the distribution. It provides Quantile-Quantile plots and method of L-Moments estimation (including asymptotic standard errors) for the distribution.
661 Probability Distributions smoothmest Smoothed M-estimators for 1-dimensional location Some M-estimators for 1-dimensional location (Bisquare, ML for the Cauchy distribution, and the estimators obtained by applying the smoothing principle introduced in Hampel, Hennig and Ronchetti (2011) to the above, the Huber M-estimator, and the median; the main function is smoothm), and the Pitman estimator.
662 Probability Distributions SMR Externally Studentized Midrange Distribution Computes the studentized midrange distribution (pdf, cdf and quantile) and generates random numbers.
663 Probability Distributions sn The Skew-Normal and Related Distributions Such as the Skew-t Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t family, and provide related statistical methods for data fitting and model diagnostics, in the univariate and the multivariate case.
664 Probability Distributions sparseMVN Multivariate Normal Functions for Sparse Covariance and Precision Matrices Computes multivariate normal (MVN) densities, and samples from MVN distributions, when the covariance or precision matrix is sparse.
665 Probability Distributions spd Semi Parametric Distribution The Semi Parametric Piecewise Distribution blends the Generalized Pareto Distribution for the tails with a kernel based interior.
666 Probability Distributions stabledist Stable Distribution Functions Density, Probability and Quantile functions, and random number generation for (skew) stable distributions, using the parametrizations of Nolan.
667 Probability Distributions STAR Spike Train Analysis with R Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.
668 Probability Distributions statmod Statistical Modeling A collection of algorithms and functions to aid statistical modeling. Includes limiting dilution analysis (aka ELDA), growth curve comparisons, mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Also includes a number of advanced generalized linear model functions including new Tweedie and Digamma glm families and a secure convergence algorithm.
669 Probability Distributions SuppDists Supplementary Distributions Ten distributions supplementing those built into R. Inverse Gauss, Kruskal-Wallis, Kendall’s Tau, Friedman’s chi squared, Spearman’s rho, maximum F ratio, the Pearson product moment correlation coefficient, Johnson distributions, normal scores and generalized hypergeometric distributions. In addition two random number generators of George Marsaglia are included.
670 Probability Distributions symmoments Symbolic central and noncentral moments of the multivariate normal distribution Symbolic central and non-central moments of the multivariate normal distribution. Computes a standard representation, LaTeX code, and values at specified mean and covariance matrices.
671 Probability Distributions TLMoments Calculate TL-Moments and Convert Them to Distribution Parameters Calculates empirical TL-moments (trimmed L-moments) of arbitrary order and trimming, and converts them to distribution parameters.
672 Probability Distributions tmvmixnorm Sampling from Truncated Multivariate Normal and t Distributions Efficient sampling of truncated multivariate (scale) mixtures of normals under linear inequality constraints is nontrivial due to the analytically intractable normalizing constant. Meanwhile, traditional methods may be subject to numerical issues, especially when the dimension is high and dependence is strong. Algorithms proposed by Li and Ghosh (2015) <doi:10.1080/15598608.2014.996690> are adopted for overcoming difficulties in simulating truncated distributions. Efficient rejection sampling for simulating truncated univariate normal distribution is included in the package, which shows superiority in terms of acceptance rate and numerical stability compared to existing methods and R packages. An efficient function for sampling from truncated multivariate normal distribution subject to convex polytope restriction regions based on Gibbs sampler for conditional truncated univariate distribution is provided. By extending the sampling method, a function for sampling truncated multivariate Student’s t distribution is also developed. Moreover, the proposed method and computation remain valid for high dimensional and strong dependence scenarios. Empirical results in Li and Ghosh (2015) <doi:10.1080/15598608.2014.996690> illustrated the superior performance in terms of various criteria (e.g. mixing and integrated auto-correlation time).
673 Probability Distributions tmvtnorm Truncated Multivariate Normal and Student t Distribution Random number generation for the truncated multivariate normal and Student t distribution. Computes probabilities, quantiles and densities, including one-dimensional and bivariate marginal densities. Computes first and second moments (i.e. mean and covariance matrix) for the double-truncated multinormal case.
674 Probability Distributions tolerance Statistical Tolerance Intervals and Regions Statistical tolerance limits provide the limits between which we can expect to find a specified proportion of a sampled population with a given level of confidence. This package provides functions for estimating tolerance limits (intervals) for various univariate distributions (binomial, Cauchy, discrete Pareto, exponential, two-parameter exponential, extreme value, hypergeometric, Laplace, logistic, negative binomial, negative hypergeometric, normal, Pareto, Poisson-Lindley, Poisson, uniform, and Zipf-Mandelbrot), Bayesian normal tolerance limits, multivariate normal tolerance regions, nonparametric tolerance intervals, tolerance bands for regression settings (linear regression, nonlinear regression, nonparametric regression, and multivariate regression), and analysis of variance tolerance intervals. Visualizations are also available for most of these settings.
675 Probability Distributions trapezoid The Trapezoidal Distribution The trapezoid package provides dtrapezoid, ptrapezoid, qtrapezoid, and rtrapezoid functions for the trapezoidal distribution.
676 Probability Distributions triangle Provides the Standard Distribution Functions for the Triangle Distribution Provides the “r, q, p, and d” distribution functions for the triangle distribution.
677 Probability Distributions truncnorm Truncated Normal Distribution Density, probability, quantile and random number generation functions for the truncated normal distribution.
678 Probability Distributions TSA Time Series Analysis Contains R functions and datasets detailed in the book “Time Series Analysis with Applications in R (second edition)” by Jonathan Cryer and Kung-Sik Chan.
679 Probability Distributions tsallisqexp Tsallis q-Exp Distribution Tsallis distribution, also known as the q-exponential family distribution. Provides distribution d, p, q, r functions, fitting and testing functions. Project initiated by Paul Higbie and based on Cosma Shalizi’s code.
680 Probability Distributions TTmoment Sampling and Calculating the First and Second Moments for the Doubly Truncated Multivariate t Distribution Computing the first two moments of the truncated multivariate t (TMVT) distribution under the double truncation. Applying the slice sampling algorithm to generate random variates from the TMVT distribution.
681 Probability Distributions tweedie Evaluation of Tweedie Exponential Family Models Maximum likelihood computations for Tweedie families, including the series expansion (Dunn and Smyth, 2005; <doi:10.1007/s11222-005-4070-y>) and the Fourier inversion (Dunn and Smyth, 2008; <doi:10.1007/s11222-007-9039-6>), and related methods.
682 Probability Distributions UnivRNG Univariate Pseudo-Random Number Generation Pseudo-random number generation of 17 univariate distributions.
683 Probability Distributions VarianceGamma The Variance Gamma Distribution Provides functions for the variance gamma distribution. Density, distribution and quantile functions. Functions for random number generation and fitting of the variance gamma to data. Also, functions for computing moments of the variance gamma distribution of any order about any location. In addition, there are functions for checking the validity of parameters and to interchange different sets of parameterizations for the variance gamma distribution.
684 Probability Distributions VGAM (core) Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
685 Probability Distributions VineCopula Statistical Inference of Vine Copulas Provides tools for the statistical analysis of vine copula models. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection and exploratory data analysis of bivariate copula models are also provided.
686 Probability Distributions vines Multivariate Dependence Modeling with Vines Implementation of the vine graphical model for building high-dimensional probability distributions as a factorization of bivariate copulas and marginal density functions. This package provides S4 classes for vines (C-vines and D-vines) and methods for inference, goodness-of-fit tests, density/distribution function evaluation, and simulation.
687 Probability Distributions vistributions Visualize Probability Distributions Visualize and compute percentiles/probabilities of normal, t, f, chi square and binomial distributions.
688 Probability Distributions visualize Graph Probability Distributions with User Supplied Parameters and Statistics Graphs the pdf or pmf and highlights what area or probability is present in user defined locations. Visualize is able to provide lower tail, bounded, upper tail, and two tail calculations. Supports strict and equal to inequalities. Also provided on the graph is the mean and variance of the distribution.
689 Probability Distributions Wrapped Computes Pdf, Cdf, Quantile, Random Numbers and Provides Estimation for any Univariate Wrapped Distributions Computes the pdf, cdf, quantile, random numbers for any wrapped G distributions. Computes maximum likelihood estimates of the parameters, standard errors, 95 percent confidence intervals, value of Cramér-von Mises statistic, value of Anderson-Darling statistic, value of Kolmogorov-Smirnov test statistic and its p-value, value of Akaike Information Criterion, value of Consistent Akaike Information Criterion, value of Bayesian Information Criterion, value of Hannan-Quinn information criterion, minimum value of the negative log-likelihood function and convergence status when the wrapped distribution is fitted to some data.
690 Probability Distributions zipfextR Zipf Extended Distributions Implementation of four extensions of the Zipf distribution: the Marshall-Olkin Extended Zipf (MOEZipf) Perez-Casany, M., & Casellas, A. (2013) <arXiv:1304.4540>, the Zipf-Poisson Extreme (Zipf-PE), the Zipf-Poisson Stopped Sum (Zipf-PSS) and the Zipf-Polylog distributions. In log-log scale, the first two extensions allow for top-concavity and top-convexity while the third one only allows for top-concavity. All the extensions maintain the linearity associated with the Zipf model in the tail.
691 Probability Distributions zipfR Statistical Models for Word Frequency Distributions Statistical models and utilities for the analysis of word frequency distributions. The utilities include functions for loading, manipulating and visualizing word frequency data and vocabulary growth curves. The package also implements several statistical models for the distribution of word frequencies in a population. (The name of this package derives from the most famous word frequency distribution, Zipf’s law.)
692 Econometrics AER (core) Applied Econometrics with R Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)
693 Econometrics alpaca Fit GLM’s with High-Dimensional k-Way Fixed Effects Provides a routine to concentrate out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm proposed by Stammann (2018) <arXiv:1707.01815> and is restricted to glm’s that are based on maximum likelihood estimation and are non-linear. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further, the package provides an analytical bias-correction for binary choice models (logit and probit) derived by Fernandez-Val and Weidner (2016) <doi:10.1016/j.jeconom.2015.12.014>.
694 Econometrics aod Analysis of Overdispersed Data Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).
695 Econometrics apollo Tools for Choice Model Estimation and Application The Choice Modelling Centre (CMC) at the University of Leeds has developed flexible code for the estimation and application of choice models in R. Users are able to write their own model functions or use a mix of already available ones. Random heterogeneity, both continuous and discrete and at the level of individuals and choices, can be incorporated for all models. There is support for both standalone models and hybrid model structures. Both classical and Bayesian estimation is available, and multiple discrete continuous models are covered in addition to discrete choice. Multi-threaded processing is supported for estimation, and a large number of pre- and post-estimation routines, including for computing posterior (individual-level) distributions, are available. For examples, a manual, and a support forum, visit www.ApolloChoiceModelling.com. For more information on choice models see Train, K. (2009) <isbn:978-0-521-74738-7> and Hess, S. & Daly, A.J. (2014) <isbn:978-1-781-00314-5> for an overview of the field.
696 Econometrics apt Asymmetric Price Transmission Asymmetric price transmission between two time series is assessed. Several functions are available for linear and nonlinear threshold cointegration and, furthermore, for symmetric and asymmetric error correction models. A graphical user interface is also included for major functions included in the package, so users can also use these functions in a more intuitive way.
697 Econometrics bayesm Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
698 Econometrics betareg Beta Regression Beta regression for modeling beta-distributed dependent variables, e.g., rates and proportions. In addition to maximum likelihood regression (for both mean and precision of a beta-distributed response), bias-corrected and bias-reduced estimation as well as finite mixture models and recursive partitioning for beta regressions are provided.
699 Econometrics bife Binary Choice Models with Fixed Effects Estimates fixed effects binary choice models (logit and probit) with potentially many individual fixed effects and computes average partial effects. Incidental parameter bias can be reduced with an asymptotic bias-correction proposed by Fernandez-Val (2009) <doi:10.1016/j.jeconom.2009.02.007>.
700 Econometrics bimets Time Series and Econometric Modeling Time series analysis, (dis)aggregation and manipulation, e.g. time series extension, merge, projection, lag, lead, delta, moving and cumulative average and product, selection by index, date and year-period, conversion to daily, monthly, quarterly, (semi)annually. Simultaneous equation models definition, estimation, simulation and forecasting with coefficient restrictions, error autocorrelation, exogenization, add-factors, impact and interim multipliers analysis, conditional equation evaluation, endogenous targeting and model renormalization.
701 Econometrics BMA Bayesian Model Averaging Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).
702 Econometrics BMS Bayesian Model Averaging Library Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors), 5 kinds of model priors, moreover model sampling by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, coefficient and predictive densities. Plotting functions available for posterior model size, MCMC convergence, predictive and coefficient densities, best models representation, BMA comparison.
703 Econometrics boot Bootstrap Functions (Originally by Angelo Canty for S) Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
704 Econometrics bootstrap Functions for the Book “An Introduction to the Bootstrap” Software (bootstrap, cross-validation, jackknife) and data for the book “An Introduction to the Bootstrap” by B. Efron and R. Tibshirani, 1993, Chapman and Hall. This package is primarily provided for projects already based on it, and for support of the book. New projects should preferentially use the recommended package “boot”.
705 Econometrics brglm Bias Reduction in Binomial-Response Generalized Linear Models Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as ‘glm’. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.
706 Econometrics CADFtest A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, then an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).
707 Econometrics car (core) Companion to Applied Regression Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.
708 Econometrics CDNmoney Components of Canadian Monetary and Credit Aggregates Components of Canadian Credit Aggregates and Monetary Aggregates with continuity adjustments.
709 Econometrics censReg Censored Regression (Tobit) Models Maximum Likelihood estimation of censored regression (Tobit) models with cross-sectional and panel data.
710 Econometrics clubSandwich Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <doi:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg() (from package ‘AER’), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’).
711 Econometrics clusterSEs Calculate Cluster-Robust p-Values and Confidence Intervals Calculate p-values and confidence intervals using cluster-adjusted t-statistics (based on Ibragimov and Muller (2010) <doi:10.1198/jbes.2009.08046>), pairs cluster bootstrapped t-statistics, and wild cluster bootstrapped t-statistics (the latter two techniques based on Cameron, Gelbach, and Miller (2008) <doi:10.1162/rest.90.3.414>). Procedures are included for use with GLM, ivreg, plm (pooling or fixed effects), and mlogit models.
712 Econometrics crch Censored Regression with Conditional Heteroscedasticity Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of models, the intercepts depend on the thresholds that define the intervals.
713 Econometrics decompr Global-Value-Chain Decomposition Two global-value-chain decompositions are implemented. Firstly, the Wang-Wei-Zhu (Wang, Wei, and Zhu, 2013) algorithm splits bilateral gross exports into 16 value-added components. Secondly, the Leontief decomposition (default) derives the value added origin of exports by country and industry, which is also based on Wang, Wei, and Zhu (Wang, Z., S.-J. Wei, and K. Zhu. 2013. “Quantifying International Production Sharing at the Bilateral and Sector Levels.”).
714 Econometrics dlsem Distributed-Lag Linear Structural Equation Models Inference functionalities for distributed-lag linear structural equation models (DLSEMs). DLSEMs are Markovian structural causal models where each factor of the joint probability distribution is a distributed-lag linear regression (Magrini, 2018 <doi:10.2478/bile-2018-0012>; Magrini et al., 2019 <doi:10.1007/s11135-019-00855-z>). DLSEMs account for temporal delays in the dependence relationships among the variables and allow to perform dynamic causal inference by assessing causal effects at different time lags. Endpoint-constrained quadratic, quadratic decreasing and gamma lag shapes are available.
715 Econometrics dynlm Dynamic Linear Regression Dynamic linear models and time series regression.
716 Econometrics Ecdat Data Sets for Econometrics Data sets for econometrics, including political science.
717 Econometrics effects Effect Displays for Linear, Generalized Linear, and Other Models Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.
718 Econometrics erer Empirical Research in Economics with R Functions, datasets, and sample codes related to the book ‘Empirical Research in Economics: Growing up with R’ by Dr. Changyou Sun are included. Marginal effects for binary or ordered choice models can be calculated. Static and dynamic Almost Ideal Demand System (AIDS) models can be estimated. A typical event analysis in finance can be conducted with several functions included.
719 Econometrics estimatr Fast Estimators for Design-Based Inference Fast procedures for a small set of commonly-used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and regression improving precision of experimental estimates by interacting treatment with centered pre-treatment covariates introduced by Lin (2013) <doi:10.1214/12-AOAS583>.
720 Econometrics expsmooth Data Sets from “Forecasting with Exponential Smoothing” Data sets from the book “Forecasting with exponential smoothing: the state space approach” by Hyndman, Koehler, Ord and Snyder (Springer, 2008).
721 Econometrics ExtremeBounds Extreme Bounds Analysis (EBA) An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer’s and Sala-i-Martin’s versions of EBA, and allows users to customize all aspects of the analysis.
722 Econometrics feisr Estimating Fixed Effects Individual Slope Models Provides the function feis() to estimate fixed effects individual slope (FEIS) models. The FEIS model constitutes a more general version of the often-used fixed effects (FE) panel model, as implemented in the package ‘plm’ by Croissant and Millo (2008) <doi:10.18637/jss.v027.i02>. In FEIS models, data are not only person “demeaned” like in conventional FE models, but “detrended” by the predicted individual slope of each person or group. Estimation is performed by applying least squares lm() to the transformed data. For more details on FEIS models see Bruederl and Ludwig (2015, ISBN:1446252442); Frees (2001) <doi:10.2307/3316008>; Polachek and Kim (1994) <doi:10.1016/0304-4076(94)90075-2>; Wooldridge (2010, ISBN:0262294354). To test consistency of conventional FE and random effects estimators against heterogeneous slopes, the package also provides the functions feistest() for an artificial regression test and bsfeistest() for a bootstrapped version of the Hausman test.
723 Econometrics fma Data Sets from “Forecasting: Methods and Applications” by Makridakis, Wheelwright & Hyndman (1998) All data sets from “Forecasting: methods and applications” by Makridakis, Wheelwright & Hyndman (Wiley, 3rd ed., 1998).
724 Econometrics forecast (core) Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
725 Econometrics frm Regression Analysis of Fractional Responses Estimation and specification analysis of one- and two-part fractional regression models and calculation of partial effects.
726 Econometrics frontier Stochastic Frontier Analysis Maximum Likelihood Estimation of Stochastic Frontier Production and Cost Functions. Two specifications are available: the error components specification with time-varying efficiencies (Battese and Coelli, 1992) and a model specification in which the firm effects are directly influenced by a number of variables (Battese and Coelli, 1995).
727 Econometrics fxregime Exchange Rate Regime Analysis Exchange rate regression and structural change tools for estimating, testing, dating, and monitoring (de facto) exchange rate regimes.
728 Econometrics gam Generalized Additive Models Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).
729 Econometrics gamlss Generalised Additive Models for Location Scale and Shape Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.
730 Econometrics geepack Generalized Estimating Equation Package Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses.
731 Econometrics gets General-to-Specific (GETS) Modelling and Indicator Saturation Methods Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.
732 Econometrics glmx Generalized Linear Models Extended Extended techniques for generalized linear models (GLMs), especially for binary responses, including parametric links and heteroskedastic latent variables.
733 Econometrics gmm Generalized Method of Moments and Generalized Empirical Likelihood It is a complete suite to estimate models based on moment conditions. It includes the two step Generalized method of moments (Hansen 1982; <doi:10.2307/1912775>), the iterated GMM and continuous updated estimator (Hansen, Eaton and Yaron 1996; <doi:10.2307/1392442>) and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).
734 Econometrics gmnl Multinomial Logit Models with Random Parameters An implementation of maximum simulated likelihood method for the estimation of multinomial logit models with random coefficients. Specifically, it allows estimating models with continuous heterogeneity such as the mixed multinomial logit and the generalized multinomial logit. It also allows estimating models with discrete heterogeneity such as the latent class and the mixed-mixed multinomial logit model.
735 Econometrics gravity Estimation Methods for Gravity Models A wrapper of different standard estimation methods for gravity models. This package provides estimation methods for log-log models and multiplicative models.
736 Econometrics gvc Global Value Chains Tools Several tools for Global Value Chain (‘GVC’) analysis are implemented.
737 Econometrics Hmisc Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
738 Econometrics ineq Measuring Inequality, Concentration, and Poverty Inequality, concentration, and poverty measures. Lorenz curves (empirical and theoretical).
739 Econometrics ivfixed Instrumental fixed effect panel data model Fit an instrumental least squares dummy variable model.
740 Econometrics ivpack Instrumental Variable Estimation This package contains functions for carrying out instrumental variable estimation of causal effects and power analyses for instrumental variable studies.
741 Econometrics ivpanel Instrumental Panel Data Models Fit the instrumental panel data models: the fixed effects, random effects and between models.
742 Econometrics ivprobit Instrumental Variables Probit Model Compute the instrumental variables probit model using the Amemiya’s Generalized Least Squares estimators (Amemiya, Takeshi, (1978) <doi:10.2307/1911443>).
743 Econometrics LARF Local Average Response Functions for Instrumental Variable Estimation of Treatment Effects Provides instrumental variable estimation of treatment effects when both the endogenous treatment and its instrument are binary. Applicable to both binary and continuous outcomes.
744 Econometrics lavaan Latent Variable Analysis Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
745 Econometrics lfe Linear Group Fixed Effects Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models which use factors with many levels as pure control variables. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multi-way clustered standard errors, as well as limited mobility bias correction.
746 Econometrics LinRegInteractive Interactive Interpretation of Linear Regression Models Interactive visualization of effects, response functions and marginal effects for different kinds of regression models. In this version linear regression models, generalized linear models, generalized additive models and linear mixed-effects models are supported. Major features are the interactive approach and the handling of the effects of categorical covariates: if two or more factors are used as covariates every combination of the levels of each factor is treated separately. The automatic calculation of marginal effects and a number of possibilities to customize the graphical output are useful features as well.
747 Econometrics lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
748 Econometrics lmtest (core) Testing Linear Regression Models A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.
749 Econometrics margins Marginal Effects for Model Objects An R port of Stata’s ‘margins’ command, which can be used to calculate marginal (or partial) effects from model objects.
750 Econometrics MASS Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
751 Econometrics matchingMarkets Analysis of Stable Matchings Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
752 Econometrics Matrix Sparse and Dense Matrix Classes and Methods A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using ‘LAPACK’ and ‘SuiteSparse’ libraries.
753 Econometrics Mcomp Data from the M-Competitions The 1001 time series from the M-competition (Makridakis et al. 1982) <doi:10.1002/for.3980010202> and the 3003 time series from the IJF-M3 competition (Makridakis and Hibon, 2000) <doi:10.1016/S0169-2070(00)00057-1>.
754 Econometrics meboot Maximum Entropy Bootstrap for Time Series Maximum entropy density based dependent data bootstrap. An algorithm is provided to create a population of time series (ensemble) without assuming stationarity. The reference paper (Vinod, H.D., 2004) explains how the algorithm satisfies the ergodic theorem and the central limit theorem.
755 Econometrics mgcv Mixed GAM Computation Vehicle with Automatic Smoothness Estimation Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
756 Econometrics micEcon Microeconomic Analysis and Modelling Various tools for microeconomic analysis and microeconomic modelling, e.g. estimating quadratic, Cobb-Douglas and Translog functions, calculating partial derivatives and elasticities of these functions, and calculating Hessian matrices, checking curvature and preparing restrictions for imposing monotonicity of Translog functions.
757 Econometrics micEconAids Demand Analysis with the Almost Ideal Demand System (AIDS) Functions and tools for analysing consumer demand with the Almost Ideal Demand System (AIDS) suggested by Deaton and Muellbauer (1980).
758 Econometrics micEconCES Analysis with the Constant Elasticity of Substitution (CES) function Tools for economic analysis and economic modelling with a Constant Elasticity of Substitution (CES) function.
759 Econometrics micEconSNQP Symmetric Normalized Quadratic Profit Function Production analysis with the Symmetric Normalized Quadratic (SNQ) profit function.
760 Econometrics midasr Mixed Data Sampling Regression Methods and tools for mixed frequency time series data analysis. Allows estimation, model selection and forecasting for MIDAS regressions.
761 Econometrics mlogit Multinomial Logit Models Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009) Discrete Choice Methods with Simulation <doi:10.1017/CBO9780511805271>.
762 Econometrics MNP R Package for Fitting the Multinomial Probit Model Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.
763 Econometrics multiwayvcov Multi-Way Standard Error Clustering Exports two functions implementing multi-way clustering using the method suggested by Cameron, Gelbach, & Miller (2011) and cluster (or block) bootstrapping for estimating variance-covariance matrices. Normal one and two-way clustering matches the results of other common statistical packages. Missing values are handled transparently and rudimentary parallelization support is provided.
764 Econometrics mvProbit Multivariate Probit Models Tools for estimating multivariate probit models, calculating conditional and unconditional expectations, and calculating marginal effects on conditional and unconditional expectations.
765 Econometrics nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
766 Econometrics nnet Feed-Forward Neural Networks and Multinomial Log-Linear Models Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
767 Econometrics nonnest2 Tests of Non-Nested Models Testing non-nested models via theory supplied by Vuong (1989) <doi:10.2307/1912557>. Includes tests of model distinguishability and of model fit that can be applied to both nested and non-nested models. Also includes functionality to obtain confidence intervals associated with AIC and BIC. This material is partially based on work supported by the National Science Foundation under Grant Number SES-1061334.
768 Econometrics np Nonparametric Kernel Smoothing Methods for Mixed Data Types Nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <http://www.nserc-crsng.gc.ca>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <http://www.sshrc-crsh.gc.ca>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <http://www.sharcnet.ca>).
769 Econometrics nse Numerical Standard Errors Computation in R Collection of functions designed to calculate numerical standard error (NSE) of univariate time series as described in Ardia et al. (2018) <doi:10.2139/ssrn.2741587> and Ardia and Bluteau (2017) <doi:10.21105/joss.00172>.
770 Econometrics ordinal Regression Models for Ordinal Data Implementation of cumulative link (mixed) models also known as ordered regression models, proportional odds models, proportional hazards models for grouped survival times and ordered logit/probit/… models. Estimation is via maximum likelihood and mixed models are fitted with the Laplace approximation and adaptive Gauss-Hermite quadrature. Multiple random effect terms are allowed and they may be nested, crossed or partially nested/crossed. Restrictions of symmetry and equidistance can be imposed on the thresholds (cut-points/intercepts). Standard model methods are available (summary, anova, drop-methods, step, confint, predict etc.) in addition to profile methods and slice methods for visualizing the likelihood function and checking convergence.
771 Econometrics OrthoPanels Dynamic Panel Models with Orthogonal Reparameterization of Fixed Effects Implements the orthogonal reparameterization approach recommended by Lancaster (2002) to estimate dynamic panel models with fixed effects (and optionally: panel specific intercepts). The approach uses a likelihood-based estimator and produces estimates that are asymptotically unbiased as N goes to infinity, with a T as low as 2.
772 Econometrics pampe Implementation of the Panel Data Approach Method for Program Evaluation Implements the Panel Data Approach Method for program evaluation as developed in Hsiao, Ching and Ki Wan (2012). pampe estimates the effect of an intervention by comparing the evolution of the outcome for a unit affected by an intervention or treatment to the evolution of the unit had it not been affected by the intervention.
773 Econometrics panelAR Estimation of Linear AR(1) Panel Data Models with Cross-Sectional Heteroskedasticity and/or Correlation The package estimates linear models on panel data structures in the presence of AR(1)-type autocorrelation as well as panel heteroskedasticity and/or contemporaneous correlation. First, AR(1)-type autocorrelation is addressed via a two-step Prais-Winsten feasible generalized least squares (FGLS) procedure, where the autocorrelation coefficients may be panel-specific. A number of common estimators for the autocorrelation coefficient are supported. In case of panel heteroskedasticity, one can choose to use a sandwich-type robust standard error estimator with OLS or a panel weighted least squares estimator after the two-step Prais-Winsten estimator. Alternatively, if panels are both heteroskedastic and contemporaneously correlated, the package supports panel-corrected standard errors (PCSEs) as well as the Parks-Kmenta FGLS estimator.
774 Econometrics Paneldata Linear models for panel data Linear models for panel data: the fixed effect model and the random effect model.
775 Econometrics panelr Regression Models and Utilities for Repeated Measures and Panel Data Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take advantage of the unique aspects of panel data. Among other capabilities, automates the “within-between” (also known as “between-within” and “hybrid”) panel regression specification that combines the desirable aspects of both fixed effects and random effects econometric models and fits them as multilevel models (Allison, 2009 <doi:10.4135/9781412993869.d33>; Bell & Jones, 2015 <doi:10.1017/psrm.2014.7>). These models can also be estimated via generalized estimating equations (GEE; McNeish, 2019 <doi:10.1080/00273171.2019.1602504>) and Bayesian estimation is (optionally) supported via ‘Stan’. Supports estimation of asymmetric effects models via first differences (Allison, 2019 <doi:10.1177/2378023119826441>) as well as a generalized linear model extension thereof using GEE.
776 Econometrics panelvar Panel Vector Autoregression We extend two general methods of moment estimators to panel vector autoregression models (PVAR) with p lags of endogenous variables, predetermined and strictly exogenous variables. This general PVAR model contains the first difference GMM estimator by Holtz-Eakin et al. (1988) <doi:10.2307/1913103>, Arellano and Bond (1991) <doi:10.2307/2297968> and the system GMM estimator by Blundell and Bond (1998) <doi:10.1016/S0304-4076(98)00009-8>. We also provide specification tests (Hansen overidentification test, lag selection criterion and stability test of the PVAR polynomial) and classical structural analysis for PVAR models such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis and forecast error variance decompositions.
777 Econometrics PANICr PANIC Tests of Nonstationarity A methodology that makes use of the factor structure of large dimensional panels to understand the nature of nonstationarity inherent in data. This is referred to as PANIC, Panel Analysis of Nonstationarity in Idiosyncratic and Common Components. PANIC (2004) <doi:10.1111/j.1468-0262.2004.00528.x> includes valid pooling methods that allow panel tests to be constructed. PANIC (2004) can detect whether the nonstationarity in a series is pervasive, or variable specific, or both. PANIC (2010) <doi:10.1017/s0266466609990478> includes two new tests on the idiosyncratic component that estimate the pooled autoregressive coefficient and sample moment, respectively. The PANIC model approximates the number of factors based on Bai and Ng (2002) <doi:10.1111/1468-0262.00273>.
778 Econometrics pco Panel Cointegration Tests Computation of the Pedroni (1999) panel cointegration test statistics. Reported are the empirical and the standardized values.
779 Econometrics pcse Panel-Corrected Standard Error Estimation in R A function to estimate panel-corrected standard errors. Data may contain balanced or unbalanced panels.
780 Econometrics pder Panel Data Econometrics with R Data sets for the Panel Data Econometrics with R <doi:10.1002/9781119504641> book.
781 Econometrics pdR Threshold Model and Unit Root Tests in Cross-Section and Time Series Data Threshold model, panel version of Hylleberg et al. (1990) <doi:10.1016/0304-4076(90)90080-D> seasonal unit root tests, and panel unit root test of Chang (2002) <doi:10.1016/S0304-4076(02)00095-7>.
782 Econometrics pglm Panel Generalized Linear Models Estimation of panel models for glm-like models: this includes binomial models (logit and probit), count models (poisson and negbin), and ordered models (logit and probit).
783 Econometrics phtt Panel Data Analysis with Heterogeneous Time Trends The package provides estimation procedures for panel data with large dimensions n, T, and general forms of unobservable heterogeneous effects. Particularly, the estimation procedures are those of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. The method of Bai (2009) assumes that the factors are stationary, whereas the method of Kneip et al. (2012) allows the factors to be non-stationary. Additionally, the ‘phtt’ package provides a wide range of dimensionality criteria in order to estimate the number of the unobserved factors simultaneously with the remaining model parameters.
784 Econometrics plm (core) Linear Models for Panel Data A set of estimators and tests for panel data econometrics, as described in Baltagi (2013) Econometric Analysis of Panel Data, ISBN-13:978-1-118-67232-7, Hsiao (2014) Analysis of Panel Data <doi:10.1017/CBO9781139839327> and Croissant and Millo (2018), Panel Data Econometrics with R, ISBN:978-1-118-94918-4.
785 Econometrics pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
786 Econometrics psidR Build Panel Data Sets from PSID Raw Data Makes it easy to build panel data in wide format from Panel Study of Income Dynamics (‘PSID’) delivered raw data. Downloads data directly from the PSID server using the ‘SAScii’ package. ‘psidR’ takes care of merging data from each wave onto a cross-period index file, so that individuals can be followed over time. The user must specify which years they are interested in, and the ‘PSID’ variable names (e.g. ER21003) for each year (they differ in each year). The package offers helper functions to retrieve variable names from different waves. There are different panel data designs and sample subsetting criteria implemented (“SRC”, “SEO”, “immigrant” and “latino” samples).
787 Econometrics PSTR Panel Smooth Transition Regression Modelling Provides the Panel Smooth Transition Regression (PSTR) modelling. The modelling procedure consists of three stages: Specification, Estimation and Evaluation. The package offers sharp tools helping the package user(s) to conduct model specification tests, to do PSTR model estimation, and to do model evaluation. The tests implemented in the package allow for cluster-dependency and are heteroskedasticity-consistent. The wild bootstrap and wild cluster bootstrap tests are also implemented. Parallel computation (as an option) is implemented in some functions, especially the bootstrap tests. The package suits tasks running many cores on super-computation servers.
788 Econometrics pwt Penn World Table (Versions 5.6, 6.x, 7.x) The Penn World Table provides purchasing power parity and national income accounts converted to international prices for 189 countries for some or all of the years 1950-2010.
789 Econometrics pwt8 Penn World Table (Version 8.x) The Penn World Table 8.x provides information on relative levels of income, output, inputs, and productivity for 167 countries between 1950 and 2011.
790 Econometrics pwt9 Penn World Table (Version 9.x) The Penn World Table 9.x (<http://www.ggdc.net/pwt/>) provides information on relative levels of income, output, inputs, and productivity for 182 countries between 1950 and 2017.
791 Econometrics quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
792 Econometrics Rchoice Discrete Choice (Binary, Poisson and Ordered) Models with Random Parameters An implementation of simulated maximum likelihood method for the estimation of Binary (Probit and Logit), Ordered (Probit and Logit) and Poisson models with random parameters for cross-sectional and longitudinal data.
793 Econometrics rdd Regression Discontinuity Estimation Provides the tools to undertake estimation in Regression Discontinuity Designs. Both sharp and fuzzy designs are supported. Estimation is accomplished using local linear regression. A provided function will utilize Imbens-Kalyanaraman optimal bandwidth calculation. A function is also included to test the assumption of no-sorting effects.
794 Econometrics rddapp Regression Discontinuity Design Application Estimation of both single- and multiple-assignment Regression Discontinuity Designs (RDDs). Provides both parametric (global) and non-parametric (local) estimation choices for both sharp and fuzzy designs, along with power analysis and assumption checks. Introductions to the underlying logic and analysis of RDDs are in Thistlethwaite, D. L., Campbell, D. T. (1960) <doi:10.1037/h0044319> and Lee, D. S., Lemieux, T. (2010) <doi:10.1257/jel.48.2.281>.
795 Econometrics rddtools Toolbox for Regression Discontinuity Design (‘RDD’) Set of functions for Regression Discontinuity Design (‘RDD’), for data visualisation, estimation and testing.
796 Econometrics rdlocrand Local Randomization Methods for RD Designs The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. Under the local randomization approach, RD designs can be interpreted as randomized experiments inside a window around the cutoff. This package provides tools to perform randomization inference for RD designs under local randomization: rdrandinf() to perform hypothesis testing using randomization inference, rdwinselect() to select a window around the cutoff in which randomization is likely to hold, rdsensitivity() to assess the sensitivity of the results to different window lengths and null hypotheses and rdrbounds() to construct Rosenbaum bounds for sensitivity to unobserved confounders. See Cattaneo, Titiunik and Vazquez-Bare (2016) <https://sites.google.com/site/rdpackages/rdlocrand/Cattaneo-Titiunik-VazquezBare_2016_Stata.pdf> for further methodological details.
797 Econometrics rdmulti Analysis of RD Designs with Multiple Cutoffs or Scores The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The ‘rdmulti’ package provides tools to analyze RD designs with multiple cutoffs or scores: rdmc() estimates pooled and cutoff specific effects for multi-cutoff designs, rdmcplot() draws RD plots for multi-cutoff designs and rdms() estimates effects in cumulative cutoffs or multi-score designs. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://sites.google.com/site/rdpackages/rdmulti/Cattaneo-Titiunik-VazquezBare_2018_rdmulti.pdf> for further methodological details.
798 Econometrics rdpower Power Calculations for RD Designs The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The ‘rdpower’ package provides tools to perform power and sample size calculations in RD designs: rdpower() calculates the power of an RD design and rdsampsi() calculates the required sample size to achieve a desired power. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://sites.google.com/site/rdpackages/rdpower/Cattaneo-Titiunik-VazquezBare_2018_Stata.pdf> for further methodological details.
799 Econometrics rdrobust Robust Data-Driven Statistical Inference in Regression-Discontinuity Designs Regression-discontinuity (RD) designs are quasi-experimental research designs popular in social, behavioral and natural sciences. The RD design is usually employed to study the (local) causal effect of a treatment, intervention or policy. This package provides tools for data-driven graphical and analytical statistical inference in RD designs: rdrobust() to construct local-polynomial point estimators and robust confidence intervals for average treatment effects at the cutoff in Sharp, Fuzzy and Kink RD settings, rdbwselect() to perform bandwidth selection for the different procedures implemented, and rdplot() to conduct exploratory data analysis (RD plots).
800 Econometrics reldist Relative Distribution Methods Tools for the comparison of distributions. This includes nonparametric estimation of the relative distribution PDF and CDF and numerical summaries as described in “Relative Distribution Methods in the Social Sciences” by Mark S. Handcock and Martina Morris, Springer-Verlag, 1999, ISBN 0387987789.
801 Econometrics REndo Fitting Linear Models with Endogenous Regressors using Latent Instrumental Variables Fits linear models with endogenous regressor using latent instrumental variable approaches. The methods included in the package are Lewbel’s (1997) <doi:10.2307/2171884> higher moments approach as well as Lewbel’s (2012) <doi:10.1080/07350015.2012.643126> heteroscedasticity approach, Park and Gupta’s (2012) <doi:10.1287/mksc.1120.0718> joint estimation method that uses Gaussian copula and Kim and Frees’s (2007) <doi:10.1007/s11336-007-9008-1> multilevel generalized method of moment approach that deals with endogeneity in a multilevel setting. These are statistical techniques to address the endogeneity problem where no external instrumental variables are needed. Note that with version 2.0.0 sweeping changes were introduced which greatly improve functionality and usability but break backwards compatibility.
802 Econometrics rms Regression Modeling Strategies Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. ‘rms’ is a collection of functions that assist with and streamline modeling. It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses, and implements penalized maximum likelihood estimation for logistic and ordinary linear models. ‘rms’ works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.
803 Econometrics RSGHB Functions for Hierarchical Bayesian Estimation: A Flexible Approach Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train’s Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train’s original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html> See Train’s chapter on HB in Discrete Choice Methods with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>; and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.
804 Econometrics rUnemploymentData Data and Functions for USA State and County Unemployment Data Contains data and visualization functions for USA unemployment data. Data comes from the US Bureau of Labor Statistics (BLS). State data is in ?df_state_unemployment and covers 2000-2013. County data is in ?df_county_unemployment and covers 1990-2013. Choropleth maps of the data can be generated with ?state_unemployment_choropleth() and ?county_unemployment_choropleth() respectively.
805 Econometrics sampleSelection Sample Selection Models Two-step and maximum likelihood estimation of Heckman-type sample selection models: standard sample selection models (Tobit-2), endogenous switching regression models (Tobit-5), sample selection models with binary dependent outcome variable, interval regression with sample selection (only ML estimation), and endogenous treatment effects models. These methods are described in the three vignettes that are included in this package and in econometric textbooks such as Greene (2011, Econometric Analysis, 7th edition, Pearson).
806 Econometrics sandwich (core) Robust Covariance Matrix Estimators Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
807 Econometrics segmented Regression Models with Break-Points / Change-Points Estimation Given a regression model, segmented ‘updates’ it by adding one or more segmented (i.e., piece-wise linear) relationships. Several variables with multiple breakpoints are allowed. The estimation method is discussed in Muggeo (2003, <doi:10.1002/sim.1545>) and illustrated in Muggeo (2008, <https://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf>). An approach for hypothesis testing is presented in Muggeo (2016, <doi:10.1080/00949655.2016.1149855>), and interval estimation for the breakpoint is discussed in Muggeo (2017, <doi:10.1111/anzs.12200>).
808 Econometrics sem Structural Equation Models Functions for fitting general linear structural equation models (with observed and latent variables) using the RAM approach, and for fitting structural equations in observed-variable models by two-stage least squares.
809 Econometrics SemiParSampleSel Semi-Parametric Sample Selection Modelling with Continuous or Discrete Response Routine for fitting continuous or discrete response copula sample selection models with semi-parametric predictors, including linear and nonlinear effects.
810 Econometrics semsfa Semiparametric Estimation of Stochastic Frontier Models Semiparametric estimation of stochastic frontier models following a two-step procedure: in the first step, semiparametric or nonparametric regression techniques are used to relax parametric restrictions on the functional form representing technology, and in the second step, variance parameters are obtained by pseudolikelihood estimators or by the method of moments.
811 Econometrics sfa Stochastic Frontier Analysis Stochastic Frontier Analysis introduced by Aigner, Lovell and Schmidt (1976) and Battese and Coelli (1992, 1995).
812 Econometrics simpleboot Simple Bootstrap Routines Simple bootstrap routines.
813 Econometrics SparseM Sparse Linear Algebra Some basic linear algebra functionality for sparse matrices is provided, including Cholesky decomposition and backsolving, as well as standard R subsetting and Kronecker products.
814 Econometrics spatialprobit Spatial Probit Models Bayesian Estimation of Spatial Probit and Tobit Models.
815 Econometrics spatialreg Spatial Regression Analysis A collection of all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) contained up to now in ‘spdep’, ‘sphet’ and ‘spse’. These model fitting functions include maximum likelihood methods for cross-sectional models proposed by ‘Cliff’ and ‘Ord’ (1973, ISBN:0850860369) and (1981, ISBN:0850860814), fitting methods initially described by ‘Ord’ (1975) <doi:10.1080/01621459.1975.10480272>. The models are further described by ‘Anselin’ (1988) <doi:10.1007/978-94-015-7799-1>. Spatial two stage least squares and spatial general method of moment models initially proposed by ‘Kelejian’ and ‘Prucha’ (1998) <doi:10.1023/A:1007707430416> and (1999) <doi:10.1111/1468-2354.00027> are provided. Impact methods and MCMC fitting methods proposed by ‘LeSage’ and ‘Pace’ (2009) <doi:10.1201/9781420064254> are implemented for the family of cross-sectional spatial regression models. Methods for fitting the log determinant term in maximum likelihood and MCMC fitting are compared by ‘Bivand et al.’ (2013) <doi:10.1111/gean.12008>, and model fitting methods by ‘Bivand’ and ‘Piras’ (2015) <doi:10.18637/jss.v063.i18>; both of these articles include extensive lists of references. ‘spatialreg’ >= 1.1-* correspond to ‘spdep’ >= 1.1-1, in which the model fitting functions are deprecated and pass through to ‘spatialreg’, but will mask those in ‘spatialreg’. From versions 1.2-*, the functions will be made defunct in ‘spdep’.
816 Econometrics spfrontier Spatial Stochastic Frontier Models A set of tools for estimation of various spatial specifications of stochastic frontier models.
817 Econometrics sphet Estimation of Spatial Autoregressive Models with and without Heteroscedasticity Generalized Method of Moments estimation of Cliff-Ord-type spatial autoregressive models with and without heteroscedasticity.
818 Econometrics splm Econometric Models for Spatial Panel Data ML and GM estimation and diagnostic testing of econometric models for spatial panel data.
819 Econometrics ssfa Spatial Stochastic Frontier Analysis Spatial Stochastic Frontier Analysis (SSFA) is an original method for controlling the spatial heterogeneity in Stochastic Frontier Analysis (SFA) models, for cross-sectional data, by splitting the inefficiency term into three terms: the first one related to spatial peculiarities of the territory in which each single unit operates, the second one related to the specific production features and the third one representing the error term.
820 Econometrics strucchange Testing, Monitoring, and Dating Structural Changes Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
821 Econometrics survival Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
822 Econometrics systemfit Estimating Systems of Simultaneous Equations Econometric estimation of simultaneous systems of linear and nonlinear equations using Ordinary Least Squares (OLS), Weighted Least Squares (WLS), Seemingly Unrelated Regressions (SUR), Two-Stage Least Squares (2SLS), Weighted Two-Stage Least Squares (W2SLS), and Three-Stage Least Squares (3SLS).
823 Econometrics truncreg Truncated Gaussian Regression Models Estimation of models for truncated Gaussian variables by maximum likelihood.
824 Econometrics tsDyn Nonlinear Time Series Models with Regime Switching Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).
825 Econometrics tseries (core) Time Series Analysis and Computational Finance Time series analysis and computational finance.
826 Econometrics tsfa Time Series Factor Analysis Extraction of Factors from Multivariate Time Series. See ?00tsfa-Intro for more details.
827 Econometrics urca (core) Unit Root and Cointegration Tests for Time Series Data Unit root and cointegration tests encountered in applied econometric analysis are implemented.
828 Econometrics vars VAR Modelling Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.
829 Econometrics VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for the latest changes.
830 Econometrics wahc Autocorrelation and Heteroskedasticity Correction in Fixed Effect Panel Data Model Fit the fixed effect panel data model with heteroskedasticity and autocorrelation correction.
831 Econometrics wbstats Programmatic Access to Data and Statistics from the World Bank API Tools for searching and downloading data and statistics from the World Bank Data API (<http://data.worldbank.org/developers/api-overview>) and the World Bank Data Catalog API (<http://data.worldbank.org/developers/data-catalog-api>).
832 Econometrics wooldridge 111 Data Sets from “Introductory Econometrics: A Modern Approach, 6e” by Jeffrey M. Wooldridge Students learning both econometrics and R may find the introduction to both challenging. However, if the text is “Introductory Econometrics: A Modern Approach” by Jeffrey M. Wooldridge, they are in luck! The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have all been compressed to a fraction of their original size and are well documented. Documentation files contain the page numbers of the text where each set is used, the original source, time of publication, and notes suggesting ideas for further exploratory data analysis and research. If one needs to brush up on model syntax, a vignette contains R solutions to examples from each chapter of the text. Data sets are from the 6th edition (Wooldridge 2016, ISBN-13: 978-1-305-27010-7), and are backwards compatible with all versions of the text.
833 Econometrics xts eXtensible Time Series Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
834 Econometrics Zelig Everyone’s Statistical Software A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.
835 Econometrics zoo (core) S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
836 Econometrics zTree Functions to Import Data from ‘z-Tree’ into R Read ‘.xls’ and ‘.sbj’ files which are written by the Microsoft Windows program ‘z-Tree’. The latter is a software for developing and carrying out economic experiments (see <http://www.ztree.uzh.ch/> for more information).
837 Analysis of Ecological and Environmental Data ade4 (core) Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
838 Analysis of Ecological and Environmental Data amap Another Multidimensional Analysis Package Tools for Clustering and Principal Component Analysis (With robust methods, and parallelized functions).
839 Analysis of Ecological and Environmental Data analogue Analogue and Weighted Averaging Methods for Palaeoecology Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.
840 Analysis of Ecological and Environmental Data aod Analysis of Overdispersed Data Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).
841 Analysis of Ecological and Environmental Data ape Analyses of Phylogenetics and Evolution Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
842 Analysis of Ecological and Environmental Data aqp Algorithms for Quantitative Pedology The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps/>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.
843 Analysis of Ecological and Environmental Data BiodiversityR Package for Community Ecology and Suitability Analysis Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.
844 Analysis of Ecological and Environmental Data boussinesq Analytic Solutions for (ground-water) Boussinesq Equation This package is a collection of R functions implemented from published and available analytic solutions for the One-Dimensional Boussinesq Equation (ground-water). In particular, the function “beq.lin” is the analytic solution of the linearized form of the Boussinesq Equation between two different head-based boundary (Dirichlet) conditions; “beq.song” is the non-linear power-series analytic solution of the motion of a wetting front over a dry bedrock (Song et al., 2007; see the complete reference in the function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcomed.
845 Analysis of Ecological and Environmental Data bReeze Functions for Wind Resource Assessment A collection of functions to analyse, visualize and interpret wind data and to calculate the potential energy production of wind turbines.
846 Analysis of Ecological and Environmental Data CircStats Circular Statistics, from “Topics in Circular Statistics” (2001) Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
847 Analysis of Ecological and Environmental Data circular Circular Statistics Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
848 Analysis of Ecological and Environmental Data cluster (core) “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al. Methods for cluster analysis, much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
849 Analysis of Ecological and Environmental Data cocorresp Co-Correspondence Analysis Methods Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.
850 Analysis of Ecological and Environmental Data Distance Distance Sampling Detection Function and Abundance Estimation A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided.
851 Analysis of Ecological and Environmental Data diveMove Dive Analysis and Calibration Utilities to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data. Miscellaneous functions for handling location data are also provided.
852 Analysis of Ecological and Environmental Data dse Dynamic Systems Estimation (Time Series Package) Tools for multivariate, linear, time-invariant, time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.
853 Analysis of Ecological and Environmental Data DSpat Spatial Modelling for Distance Sampling Data Fits inhomogeneous Poisson process spatial models to line transect sampling data and provides estimate of abundance within a region.
854 Analysis of Ecological and Environmental Data dyn Time Series Regression Time series regression. The dyn class interfaces the ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.
855 Analysis of Ecological and Environmental Data dynatopmodel Implementation of the Dynamic TOPMODEL Hydrological Model A native R implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes preprocessing and utility routines, as well as routines for displaying outputs.
856 Analysis of Ecological and Environmental Data dynlm Dynamic Linear Regression Dynamic linear models and time series regression.
857 Analysis of Ecological and Environmental Data e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
858 Analysis of Ecological and Environmental Data earth Multivariate Adaptive Regression Splines Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines” <doi:10.1214/aos/1176347963>. (The term “MARS” is trademarked and thus not used in the name of the package.)
859 Analysis of Ecological and Environmental Data eco Ecological Inference in 2x2 Tables Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.
860 Analysis of Ecological and Environmental Data ecodist Dissimilarity-Based Functions for Ecological Analysis Dissimilarity-based analysis functions including ordination and Mantel test functions, intended for use with spatial and community data.
861 Analysis of Ecological and Environmental Data EcoHydRology A Community Modeling Foundation for Eco-Hydrology Provides a flexible foundation for scientists, engineers, and policy makers to base teaching exercises as well as for more applied use to model complex eco-hydrological interactions.
862 Analysis of Ecological and Environmental Data EnvStats Package for Environmental Statistics, Including US EPA Guidance Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <http://www.springer.com/book/9781461484554>).
863 Analysis of Ecological and Environmental Data equivalence Provides Tests and Graphics for Assessing Tests of Equivalence Provides statistical tests and graphics for assessing tests of equivalence. Such tests have similarity as the alternative hypothesis instead of the null. Sample data sets are included.
864 Analysis of Ecological and Environmental Data evd Functions for Extreme Value Distributions Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.
865 Analysis of Ecological and Environmental Data evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.
866 Analysis of Ecological and Environmental Data evir Extreme Values in R Functions for extreme value theory, which may be divided into the following groups; exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, gev/gpd distributions.
867 Analysis of Ecological and Environmental Data extRemes Extreme Value Analysis Functions for performing extreme value analysis.
868 Analysis of Ecological and Environmental Data fast Implementation of the Fourier Amplitude Sensitivity Test (FAST) The Fourier Amplitude Sensitivity Test (FAST) is a method to determine global sensitivities of a model on parameter changes with relatively few model runs. This package implements this sensitivity analysis method.
869 Analysis of Ecological and Environmental Data FD Measuring functional diversity (FD) from multiple traits, and other tools for functional ecology FD is a package to compute different multidimensional FD indices. It implements a distance-based framework to measure FD that allows any number and type of functional traits, and can also consider species relative abundances. It also contains other useful tools for functional ecology.
870 Analysis of Ecological and Environmental Data flexmix Flexible Mixture Modeling A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
871 Analysis of Ecological and Environmental Data forecast Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
872 Analysis of Ecological and Environmental Data fso Fuzzy Set Ordination Fuzzy set ordination is a multivariate analysis used in ecology to relate the composition of samples to possible explanatory variables. While differing in theory and method, in practice, the use is similar to ‘constrained ordination.’ The package contains plotting and summary functions as well as the analyses.
873 Analysis of Ecological and Environmental Data gam Generalized Additive Models Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).
874 Analysis of Ecological and Environmental Data gamair Data for ‘GAMs: An Introduction with R’ Data sets and scripts used in the book ‘Generalized Additive Models: An Introduction with R’, Wood (2006,2017) CRC.
875 Analysis of Ecological and Environmental Data hydroGOF Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented to be used during the calibration, validation, and application of hydrological models. Missing values in observed and/or simulated values can be removed before computations. Comments / questions / collaboration of any kind are very welcomed.
876 Analysis of Ecological and Environmental Data HydroMe R Codes for Estimating Water Retention and Infiltration Model Parameters Using Experimental Data This package is version 2 of the HydroMe v.1 package. It estimates the parameters of infiltration and water retention models by a curve-fitting method. The models considered are those commonly used in soil science. It adds new models for the water retention characteristic curve and fixes errors present in HydroMe v.1.
877 Analysis of Ecological and Environmental Data hydroPSO Particle Swarm Optimisation, with Focus on Environmental Models State-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement for the ‘optim’ R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model’s own input and output files, without requiring access to the model’s source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine to different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with “long” execution time. Bug reports/comments/questions are very welcomed (in English, Spanish or Italian). See Zambrano-Bigiarini and Rojas (2013) <doi:10.1016/j.envsoft.2013.01.004> for more details.
878 Analysis of Ecological and Environmental Data hydroTSM Time Series Management, Analysis and Interpolation for Hydrological Modelling S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package is on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcomed, and in particular, datasets that can be included in this package for academic purposes.
879 Analysis of Ecological and Environmental Data Interpol.T Hourly Interpolation of Multiple Temperature Daily Series Hourly interpolation of daily minimum and maximum temperature series. Carries out interpolation on multiple series at once. Requires some hourly series for calibration (alternatively, the default calibration table can be used).
880 Analysis of Ecological and Environmental Data ipred Improved Predictors Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
881 Analysis of Ecological and Environmental Data ismev An Introduction to Statistical Modeling of Extreme Values Functions to support the computations carried out in ‘An Introduction to Statistical Modeling of Extreme Values’ by Stuart Coles. The functions may be divided into the following groups; maxima/minima, order statistics, peaks over thresholds and point processes.
882 Analysis of Ecological and Environmental Data labdsv (core) Ordination and Multivariate Analysis for Ecology A variety of ordination and community analyses useful in analysis of data sets in community ecology. Includes many of the common ordination methods, with graphical routines to facilitate their interpretation, as well as several novel analyses.
883 Analysis of Ecological and Environmental Data latticeDensity Density Estimation and Nonparametric Regression on Irregular Regions Functions that compute the lattice-based density estimator of Barry and McIntyre, which accounts for point processes in two-dimensional regions with irregular boundaries and holes. The package also implements two-dimensional non-parametric regression for similar regions.
884 Analysis of Ecological and Environmental Data lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
885 Analysis of Ecological and Environmental Data maptree Mapping, pruning, and graphing tree models Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.
886 Analysis of Ecological and Environmental Data marked Mark-Recapture Analysis for Survival and Abundance Estimation Functions for fitting various models to capture-recapture data including mixed-effects Cormack-Jolly-Seber (CJS) and multistate models and the multi-variate state model structure for survival estimation and POPAN structured Jolly-Seber models for abundance estimation. There are also Hidden Markov model (HMM) implementations of CJS and multistate models with and without state uncertainty and a simulation capability for HMM models.
887 Analysis of Ecological and Environmental Data MASS (core) Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
888 Analysis of Ecological and Environmental Data mclust Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
889 Analysis of Ecological and Environmental Data mda Mixture and Flexible Discriminant Analysis Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, …
890 Analysis of Ecological and Environmental Data mefa Multivariate Data Handling in Ecology and Biogeography A framework package aimed at providing a standardized computational environment for specialist work via object classes to represent the data coded by samples, taxa and segments (i.e. subpopulations, repeated measures). It supports easy processing of the data along with cross tabulation and relational data tables for samples and taxa. An object of class ‘mefa’ is a project-specific compendium of the data and can be easily used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of ‘mefa’ objects. Reports can be generated in plain text or LaTeX format. A vignette contains worked examples.
891 Analysis of Ecological and Environmental Data metacom Analysis of the ‘Elements of Metacommunity Structure’ Functions to analyze coherence, boundary clumping, and turnover following the pattern-based metacommunity analysis of Leibold and Mikkelson 2002 <doi:10.1034/j.1600-0706.2002.970210.x>. The package also includes functions to visualize ecological networks, and to calculate modularity as a replacement for boundary clumping.
892 Analysis of Ecological and Environmental Data mgcv (core) Mixed GAM Computation Vehicle with Automatic Smoothness Estimation Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
893 Analysis of Ecological and Environmental Data mrds Mark-Recapture Distance Sampling Animal abundance estimation via conventional, multiple covariate and mark-recapture distance sampling (CDS/MCDS/MRDS). Detection function fitting is performed via maximum likelihood. Also included are diagnostics and plotting for fitted detection functions. Abundance estimation is via a Horvitz-Thompson-like estimator.
894 Analysis of Ecological and Environmental Data nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
895 Analysis of Ecological and Environmental Data nsRFA Non-Supervised Regional Frequency Analysis A collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.
896 Analysis of Ecological and Environmental Data oce Analysis of Oceanographic Data Supports the analysis of Oceanographic data, including ‘ADCP’ measurements, measurements made with ‘argo’ floats, ‘CTD’ measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the ‘UNESCO’ or ‘TEOS-10’ equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively in Dan Kelley’s book Oceanographic Analysis with R, published in 2018 by ‘Springer-Verlag’ with ISBN 978-1-4939-8842-6.
897 Analysis of Ecological and Environmental Data openair Tools for the Analysis of Air Pollution Data Tools to analyse, interpret and understand air pollution data. Data are typically hourly time series and both monitoring data and dispersion model output can be analysed. Many functions can also be applied to other data, including meteorological and traffic data.
898 Analysis of Ecological and Environmental Data ouch Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.
899 Analysis of Ecological and Environmental Data party A Laboratory for Recursive Partytioning A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.
900 Analysis of Ecological and Environmental Data pastecs Package for Analysis of Space-Time Ecological Series Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.
901 Analysis of Ecological and Environmental Data pgirmess Spatial Analysis and Data Mining for Field Ecologists Set of tools for reading, writing and transforming spatial and seasonal data in ecology, model selection and specific statistical tests. It includes functions to discretize polylines into regular point intervals, link observations to those points, compute geographical coordinates at regular intervals between waypoints, read subsets of big rasters, compute zonal statistics or table of categories within polygons or circular buffers from raster. The package also provides miscellaneous functions for model selection, spatial statistics, geometries, writing data.frame with Chinese characters, and some other functions for field ecologists.
902 Analysis of Ecological and Environmental Data popbio Construction and Analysis of Matrix Population Models Construct and analyze projection matrix models from a demography study of marked individuals classified by age or stage. The package covers methods described in Matrix Population Models by Caswell (2001) and Quantitative Conservation Biology by Morris and Doak (2002).
903 Analysis of Ecological and Environmental Data prabclus Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data Distance-based parametric bootstrap tests for clustering with spatial neighborhood information; some distance measures; clustering of presence-absence, abundance and multilocus genetic data for species delimitation; nearest-neighbor-based noise detection; genetic distances between communities; tests of whether various distance-based regressions are equal. Try package?prabclus for an overview.
904 Analysis of Ecological and Environmental Data pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
905 Analysis of Ecological and Environmental Data pvclust Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides AU (approximately unbiased) p-value as well as BP (bootstrap probability) value for each cluster in a dendrogram.
906 Analysis of Ecological and Environmental Data qualV Qualitative Validation Methods Qualitative methods for the validation of dynamic models. It contains (i) an orthogonal set of deviance measures for absolute, relative and ordinal scale and (ii) approaches accounting for time shifts. The first approach transforms time to take time delays and speed differences into account. The second divides the time series into interval units according to their main features and finds the longest common subsequence (LCS) using a dynamic programming algorithm.
907 Analysis of Ecological and Environmental Data quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
908 Analysis of Ecological and Environmental Data quantregGrowth Growth Charts via Regression Quantiles Fits non-crossing regression quantiles as a function of linear covariates and multiple smooth terms via B-splines with L1-norm difference penalties. Monotonicity constraints on the fitted curves are allowed. See Muggeo, Sciandra, Tomasello and Calvo (2013) <doi:10.1007/s10651-012-0232-1> and <doi:10.13140/RG.2.2.12924.85122> for some code example.
909 Analysis of Ecological and Environmental Data randomForest Breiman and Cutler’s Random Forests for Classification and Regression Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <doi:10.1023/A:1010933404324>.
910 Analysis of Ecological and Environmental Data Rcapture Loglinear Models for Capture-Recapture Experiments Estimation of abundance and other demographic parameters for closed populations, open populations and the robust design in capture-recapture experiments using loglinear models.
911 Analysis of Ecological and Environmental Data RMark R Code for Mark Analysis An interface to the software package MARK that constructs input files for MARK and extracts the output. MARK was developed by Gary White and is freely available at <http://www.phidot.org/software/mark/downloads/> but is not open source.
912 Analysis of Ecological and Environmental Data RMAWGEN Multi-Site Auto-Regressive Weather GENerator S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector AutoRegressive models (VARs). The weather generator model is then saved as an object and is calibrated by daily instrumental “Gaussianized” time series through the ‘vars’ package tools. Once this model is obtained, it can be used for weather generation and adapted to work with several climatic monthly time series.
913 Analysis of Ecological and Environmental Data rpart Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
914 Analysis of Ecological and Environmental Data rtop Interpolation of Data with Variable Spatial Support Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.
915 Analysis of Ecological and Environmental Data seacarb Seawater Carbonate Chemistry Calculates parameters of the seawater carbonate system and assists the design of ocean acidification perturbation experiments.
916 Analysis of Ecological and Environmental Data seas Seasonal Analysis and Graphics, Especially for Climatology Capable of deriving seasonal statistics, such as “normals”, and analysis of seasonal data, such as departures. This package also has graphics capabilities for representing seasonal data, including boxplots for seasonal parameters, and bars for summed normals. There are many specific functions related to climatology, including precipitation normals, temperature normals, cumulative precipitation departures and precipitation interarrivals. However, this package is designed to represent any time-varying parameter with a discernible seasonal signal, such as found in hydrology and ecology.
917 Analysis of Ecological and Environmental Data secr Spatially Explicit Capture-Recapture Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.
918 Analysis of Ecological and Environmental Data segmented Regression Models with Break-Points / Change-Points Estimation Given a regression model, segmented ‘updates’ it by adding one or more segmented (i.e., piece-wise linear) relationships. Several variables with multiple breakpoints are allowed. The estimation method is discussed in Muggeo (2003, <doi:10.1002/sim.1545>) and illustrated in Muggeo (2008, <https://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf>). An approach for hypothesis testing is presented in Muggeo (2016, <doi:10.1080/00949655.2016.1149855>), and interval estimation for the breakpoint is discussed in Muggeo (2017, <doi:10.1111/anzs.12200>).
919 Analysis of Ecological and Environmental Data sensitivity Global Sensitivity Analysis of Model Outputs A collection of functions for factor screening, global sensitivity analysis and reliability sensitivity analysis. Most of the functions have to be applied to models with scalar output, but several functions support multi-dimensional outputs.
920 Analysis of Ecological and Environmental Data simba A Collection of functions for similarity analysis of vegetation data Besides functions for the calculation of similarity and multiple-plot similarity measures with binary data (for instance presence/absence species data), the package contains some simple wrapper functions for reshaping species lists into matrices and vice versa, some other functions for further processing of similarity data (Mantel-like permutation procedures), and other useful tools for vegetation analysis.
921 Analysis of Ecological and Environmental Data simecol Simulation of Ecological (and Other) Dynamic Systems An object oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and re-usability of code.
922 Analysis of Ecological and Environmental Data siplab Spatial Individual-Plant Modelling A platform for experimenting with spatially explicit individual-based vegetation models.
923 Analysis of Ecological and Environmental Data soiltexture Functions for Soil Texture Plot, Classification and Transformation “The Soil Texture Wizard” is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams, texture ternary plots) and to classify and transform soil texture data. These functions allow plotting of virtually any soil texture triangle (classification) in any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().
924 Analysis of Ecological and Environmental Data SpatialExtremes Modelling Spatial Extremes Tools for the statistical modelling of spatial extremes using max-stable processes, copula or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with the extreme value theory) are available, such as the use of (spatial) copula and Bayesian hierarchical models assuming the so-called conditional assumptions. The latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) <doi:10.1214/11-STS376>, Padoan et al. (2010) <doi:10.1198/jasa.2009.tm08577>, Dombry et al. (2013) <doi:10.1093/biomet/ass067>.
925 Analysis of Ecological and Environmental Data StreamMetabolism Calculate Single Station Metabolism from Diurnal Oxygen Curves Provides functions to calculate Gross Primary Productivity, Net Ecosystem Production, and Ecosystem Respiration from single-station diurnal oxygen curves.
926 Analysis of Ecological and Environmental Data strucchange Testing, Monitoring, and Dating Structural Changes Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
927 Analysis of Ecological and Environmental Data surveillance Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hohle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hohle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.
929 Analysis of Ecological and Environmental Data tiger TIme series of Grouped ERrors Temporally resolved groups of typical differences (errors) between two time series are determined and visualized.
930 Analysis of Ecological and Environmental Data topmodel Implementation of the Hydrological Model TOPMODEL in R Set of hydrological functions including an R implementation of the hydrological model TOPMODEL, based on the 1995 FORTRAN version by Keith Beven. As of version 0.7.0, the package is in maintenance mode.
930 Analysis of Ecological and Environmental Data tseries Time Series Analysis and Computational Finance Time series analysis and computational finance.
931 Analysis of Ecological and Environmental Data unmarked Models for Data from Unmarked Animals Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates.
932 Analysis of Ecological and Environmental Data untb Ecological Drift under the UNTB Hubbell’s Unified Neutral Theory of Biodiversity.
933 Analysis of Ecological and Environmental Data vegan (core) Community Ecology Package Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
934 Analysis of Ecological and Environmental Data vegetarian Jost Diversity Measures for Community Data This package computes diversity for community data sets using the methods outlined by Jost (2006, 2007). While there are differing opinions on the ideal way to calculate diversity (e.g. Magurran 2004), this method offers the advantage of providing diversity numbers equivalents, independent alpha and beta diversities, and the ability to incorporate ‘order’ (q) as a continuous measure of the importance of rare species in the metrics. The functions provided in this package largely correspond with the equations offered by Jost in the cited papers. The package computes alpha diversities, beta diversities, gamma diversities, and similarity indices. Confidence intervals for diversity measures are calculated using a bootstrap method described by Chao et al. (2008). For datasets with many samples (sites, plots), sim.table creates tables of all pairwise comparisons possible, and for grouped samples sim.groups calculates pairwise combinations of within- and between-group comparisons.
935 Analysis of Ecological and Environmental Data VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes fit constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
936 Analysis of Ecological and Environmental Data wasim Visualisation and analysis of output files of the hydrological model WASIM Helpful tools for data processing and visualisation of results of the hydrological model WASIM-ETH.
937 Analysis of Ecological and Environmental Data zoo S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
938 Design of Experiments (DoE) & Analysis of Experimental Data acebayes Optimal Bayesian Experimental Design using the ACE Algorithm Optimal Bayesian experimental design using the approximate coordinate exchange (ACE) algorithm.
939 Design of Experiments (DoE) & Analysis of Experimental Data agricolae (core) Statistical Procedures for Agricultural Research The original idea was presented in the thesis “A statistical analysis tool for agricultural research”, submitted for the degree of Master of Science at the National Engineering University (UNI), Lima, Peru. Some experimental data for the examples come from the CIP and other research. Agricolae offers extensive functionality on experimental design, especially for agricultural and plant-breeding experiments, which can also be useful for other purposes. It supports planning of lattice, alpha, cyclic, complete block, Latin square, Graeco-Latin square, augmented block, factorial, split-plot and strip-plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures, several non-parametric comparison tests, biodiversity indexes and consensus clustering.
940 Design of Experiments (DoE) & Analysis of Experimental Data agridat Agricultural Datasets Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.
941 Design of Experiments (DoE) & Analysis of Experimental Data AlgDesign (core) Algorithmic Experimental Design Algorithmic experimental designs. Calculates exact and approximate theory experimental designs for D, A, and I criteria. Very large designs may be created. Experimental designs may be blocked or blocked designs created from a candidate list, using several criteria. The blocking can be done when whole and within plot factors interact.
942 Design of Experiments (DoE) & Analysis of Experimental Data ALTopt Optimal Experimental Designs for Accelerated Life Testing Creates the optimal (D, U and I) designs for accelerated life testing with right censoring or interval censoring. It uses a generalized linear model (GLM) approach to derive the asymptotic variance-covariance matrix of regression coefficients. The failure time distribution is assumed to follow a Weibull distribution with a known shape parameter, and log-linear link functions are used to model the relationship between failure time parameters and stress variables. The acceleration model may have multiple stress factors, although most ALTs involve two or fewer stress factors. The ALTopt package also provides several plotting functions including contour plot, Fraction of Use Space (FUS) plot and Variance Dispersion graphs of Use Space (VDUS) plot.
943 Design of Experiments (DoE) & Analysis of Experimental Data asd Simulations for Adaptive Seamless Designs Runs simulations for adaptive seamless designs, with and without early outcomes, for treatment-selection and subpopulation-type designs.
944 Design of Experiments (DoE) & Analysis of Experimental Data BatchExperiments Statistical Experiments on Batch Computing Clusters Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.
945 Design of Experiments (DoE) & Analysis of Experimental Data BayesMAMS Designing Bayesian Multi-Arm Multi-Stage Studies Calculating Bayesian sample sizes for multi-arm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.
946 Design of Experiments (DoE) & Analysis of Experimental Data bcrm Bayesian Continual Reassessment Method for Phase I Dose-Escalation Trials Implements a wide variety of one- and two-parameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics. See Sweeting et al. (2013): <doi:10.18637/jss.v054.i13>.
947 Design of Experiments (DoE) & Analysis of Experimental Data BHH2 Useful Functions for Box, Hunter and Hunter II Functions and data sets reproducing some examples in Box, Hunter and Hunter II. Useful for statistical design of experiments, especially factorial experiments.
948 Design of Experiments (DoE) & Analysis of Experimental Data binseqtest Exact Binary Sequential Designs and Analysis For a series of binary responses, creates stopping boundaries with exact results after stopping, allowing updating for missing assessments.
949 Design of Experiments (DoE) & Analysis of Experimental Data bioOED Sensitivity Analysis and Optimum Experiment Design for Microbial Inactivation Extends the bioinactivation package with functions for Sensitivity Analysis and Optimum Experiment Design.
950 Design of Experiments (DoE) & Analysis of Experimental Data blocksdesign Nested and Crossed Block Designs for Factorial, Fractional Factorial and Unstructured Treatment Sets Constructs D-optimal or near D-optimal nested and crossed block designs for unstructured or general factorial treatment designs. The treatment design, if required, is found from a model matrix design formula and can be added sequentially. The block design is found from a defined set of block factors and is conditional on the defined treatment design. The block factors are added in sequence and each added block factor is optimized conditional on all previously added block factors. The block design can have repeated nesting down to any required depth of nesting, with either simple nested blocks or a crossed blocks design at each level of nesting. Outputs include a table showing the allocation of treatments to blocks and tables showing the achieved D-efficiency factors for each block and treatment design.
951 Design of Experiments (DoE) & Analysis of Experimental Data blockTools Block, Assign, and Diagnose Potential Interference in Randomized Experiments Blocks units into experimental blocks, with one unit per treatment condition, by creating a measure of multivariate distance between all possible pairs of units. Maximum, minimum, or an allowable range of differences between units on one variable can be set. Randomly assign units to treatment conditions. Diagnose potential interference between units assigned to different treatment conditions. Write outputs to .tex and .csv files.
952 Design of Experiments (DoE) & Analysis of Experimental Data BOIN Bayesian Optimal INterval (BOIN) Design for Single-Agent and Drug- Combination Phase I Clinical Trials The Bayesian optimal interval (BOIN) design is a novel phase I clinical trial design for finding the maximum tolerated dose (MTD). It can be used to design both single-agent and drug-combination trials. The BOIN design is motivated by the top priority and concern of clinicians when testing a new drug, which is to effectively treat patients and minimize the chance of exposing them to subtherapeutic or overly toxic doses. The prominent advantage of the BOIN design is that it achieves simplicity and superior performance at the same time. The BOIN design is algorithm-based and can be implemented in a simple way similar to the traditional 3+3 design. The BOIN design yields an average performance that is comparable to that of the continual reassessment method (CRM, one of the best model-based designs) in terms of selecting the MTD, but has a substantially lower risk of assigning patients to subtherapeutic or overly toxic doses.
953 Design of Experiments (DoE) & Analysis of Experimental Data BsMD Bayes Screening and Model Discrimination Bayes screening and model discrimination follow-up designs.
954 Design of Experiments (DoE) & Analysis of Experimental Data choiceDes Design Functions for Choice Studies Design functions for DCMs and other types of choice studies (including MaxDiff and other tradeoffs).
955 Design of Experiments (DoE) & Analysis of Experimental Data CombinS Construction Methods of some Series of PBIB Designs Series of partially balanced incomplete block designs (PBIB) based on the combinatory method (S) introduced in Imane Rezgui et al. (2014) <doi:10.3844/jmssp.2014.45.48>; it also gives their associated U-type designs.
956 Design of Experiments (DoE) & Analysis of Experimental Data conf.design (core) Construction of factorial designs This small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.
957 Design of Experiments (DoE) & Analysis of Experimental Data crmPack Object-Oriented Implementation of CRM Designs Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules.
958 Design of Experiments (DoE) & Analysis of Experimental Data crossdes (core) Construction of Crossover Designs Contains functions for the construction of carryover-balanced crossover designs. In addition, it contains functions to check given designs for balance.
959 Design of Experiments (DoE) & Analysis of Experimental Data Crossover Analysis and Search of Crossover Designs Package Crossover provides different crossover designs from combinatorial or search algorithms as well as from literature and a GUI to access them.
960 Design of Experiments (DoE) & Analysis of Experimental Data dae Functions Useful in the Design and ANOVA of Experiments The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. A vignette called ‘DesignNotes’ describes how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the ‘Error’ function has been used in the call to ‘aov’. The package ‘dae’ can also be installed from <http://chris.brien.name/rpackages/>.
961 Design of Experiments (DoE) & Analysis of Experimental Data daewr Design and Analysis of Experiments with R Contains data frames and functions used in the book “Design and Analysis of Experiments with R”.
962 Design of Experiments (DoE) & Analysis of Experimental Data designGG Computational tool for designing genetical genomics experiments The package provides R scripts for designing genetical genomics experiments.
963 Design of Experiments (DoE) & Analysis of Experimental Data designGLMM Finding Optimal Block Designs for a Generalised Linear Mixed Model Use simulated annealing to find optimal designs for Poisson regression models with blocks.
964 Design of Experiments (DoE) & Analysis of Experimental Data designmatch Matched Samples that are Balanced and Representative by Design Includes functions for the construction of matched samples that are balanced and representative by design. Among others, these functions can be used for matching in observational studies with treated and control units, with cases and controls, in related settings with instrumental variables, and in discontinuity designs. Also, they can be used for the design of randomized experiments, for example, for matching before randomization. By default, ‘designmatch’ uses the ‘GLPK’ optimization solver, but its performance is greatly enhanced by the ‘Gurobi’ optimization solver and its associated R interface. For their installation, please follow the instructions at <http://user.gurobi.com/download/gurobi-optimizer> and <http://www.gurobi.com/documentation/7.0/refman/r_api_overview.html>. We have also included directions in the gurobi_installation file in the inst folder.
965 Design of Experiments (DoE) & Analysis of Experimental Data desirability Function Optimization and Ranking via Desirability Functions S3 classes for multivariate optimization using the desirability function by Derringer and Suich (1980).
966 Design of Experiments (DoE) & Analysis of Experimental Data desplot Plotting Field Plans for Agricultural Experiments A function for plotting maps of agricultural field experiments that are laid out in grids.
967 Design of Experiments (DoE) & Analysis of Experimental Data dfcomb Phase I/II Adaptive Dose-Finding Design for Combination Studies Phase I/II adaptive dose-finding design for combination studies where toxicity rates are supposed to increase with both agents.
968 Design of Experiments (DoE) & Analysis of Experimental Data dfcrm Dose-Finding by the Continual Reassessment Method Provides functions to run the CRM and TITE-CRM in phase I trials and calibration tools for trial planning purposes.
969 Design of Experiments (DoE) & Analysis of Experimental Data dfmta Phase I/II Adaptive Dose-Finding Design for MTA Phase I/II adaptive dose-finding design for single-agent Molecularly Targeted Agent (MTA), according to the paper “Phase I/II Dose-Finding Design for Molecularly Targeted Agent: Plateau Determination using Adaptive Randomization”, Riviere Marie-Karelle et al. (2016) <doi:10.1177/0962280216631763>.
970 Design of Experiments (DoE) & Analysis of Experimental Data dfpk Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials Statistical methods involving PK measures are provided for the dose allocation process during Phase I clinical trials. These methods, proposed by Ursino et al. (2017) <doi:10.1002/bimj.201600084>, incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models and hierarchical models. This package provides functions to generate data from several scenarios and functions to run simulations whose objective is to determine the maximum tolerated dose (MTD).
971 Design of Experiments (DoE) & Analysis of Experimental Data DiceDesign Designs of Computer Experiments Space-Filling Designs and Uniformity Criteria.
972 Design of Experiments (DoE) & Analysis of Experimental Data DiceEval Construction and Evaluation of Metamodels Estimation, validation and prediction of models of different types: linear models, additive models, MARS, PolyMARS and Kriging.
973 Design of Experiments (DoE) & Analysis of Experimental Data DiceKriging Kriging Methods for Computer Experiments Estimation, validation and prediction of kriging models. Important functions: km, print.km, plot.km, predict.km.
974 Design of Experiments (DoE) & Analysis of Experimental Data DiceView Plot Methods for Computer Experiments Design and Surrogate View 2D/3D sections or contours of computer experiments designs, surrogates or test functions.
975 Design of Experiments (DoE) & Analysis of Experimental Data docopulae Optimal Designs for Copula Models A direct approach to optimal designs for copula models based on the Fisher information. Provides flexible functions for building joint PDFs, evaluating the Fisher information and finding optimal designs. It includes an extensible solution to summation and integration called ‘nint’, functions for transforming, plotting and comparing designs, as well as a set of tools for common low-level tasks.
976 Design of Experiments (DoE) & Analysis of Experimental Data DoE.base (core) Full Factorials, Orthogonal Arrays and Base Utilities for DoE Packages Creates full factorial experimental designs and designs based on orthogonal arrays for (industrial) experiments. Provides diverse quality criteria. Provides utility functions for the class design, which is also used by other packages for designed experiments.
977 Design of Experiments (DoE) & Analysis of Experimental Data DoE.MIParray Creation of Arrays by Mixed Integer Programming ‘CRAN’ packages ‘DoE.base’ and ‘Rmosek’ and non-‘CRAN’ package ‘gurobi’ are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to larger run size. Optimization requires the availability of at least one of the commercial products ‘Gurobi’ or ‘Mosek’ (free academic licenses available for both). For installing ‘Gurobi’ and its R package ‘gurobi’, follow instructions at <http://www.gurobi.com/downloads/gurobi-optimizer> and <http://www.gurobi.com/documentation/7.5/refman/r_api_overview.html> (or higher version). For installing ‘Mosek’ and its R package ‘Rmosek’, follow instructions at <https://www.mosek.com/downloads/> and <http://docs.mosek.com/8.1/rmosek/install-interface.html>, or use the functionality in the stump CRAN R package ‘Rmosek’.
978 Design of Experiments (DoE) & Analysis of Experimental Data DoE.wrapper (core) Wrapper Package for Design of Experiments Functionality Various kinds of designs for (industrial) experiments can be created. The package uses, and sometimes enhances, design generation routines from other packages. So far, response surface designs from package rsm, latin hypercube samples from packages lhs and DiceDesign, and D-optimal designs from package AlgDesign have been implemented.
979 Design of Experiments (DoE) & Analysis of Experimental Data DoseFinding Planning and Analyzing Dose Finding Experiments The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting non-linear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs and an implementation of the MCPMod methodology.
980 Design of Experiments (DoE) & Analysis of Experimental Data dynaTree Dynamic Trees for Learning and Design Inference by sequential Monte Carlo for dynamic tree regression and classification models with hooks provided for sequential design and optimization, fully online learning with drift, variable selection, and sensitivity analysis of inputs. Illustrative examples from the original dynamic trees paper are facilitated by demos in the package; see demo(package=“dynaTree”).
981 Design of Experiments (DoE) & Analysis of Experimental Data easypower Sample Size Estimation for Experimental Designs Power analysis is used in the estimation of sample sizes for experimental designs. Most programs and R packages will only output the highest recommended sample size to the user. Often the user input can be complicated and computing multiple power analyses for different treatment comparisons can be time consuming. This package simplifies the user input and allows the user to view all of the sample size recommendations or just the ones they want to see. The calculations of the recommended sample sizes are based on the ‘pwr’ package.
982 Design of Experiments (DoE) & Analysis of Experimental Data edesign Maximum Entropy Sampling An implementation of maximum entropy sampling for spatial data is provided. An exact branch-and-bound algorithm as well as greedy and dual greedy heuristics are included.
983 Design of Experiments (DoE) & Analysis of Experimental Data EngrExpt Data sets from “Introductory Statistics for Engineering Experimentation” Datasets from Nelson, Coffin and Copeland “Introductory Statistics for Engineering Experimentation” (Elsevier, 2003) with sample code.
984 Design of Experiments (DoE) & Analysis of Experimental Data experiment R Package for Designing and Analyzing Randomized Experiments Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.
985 Design of Experiments (DoE) & Analysis of Experimental Data ez Easy Analysis and Visualization of Factorial Experiments Facilitates easy analysis of factorial experiments, including purely within-Ss designs (a.k.a. “repeated measures”), purely between-Ss designs, and mixed within-and-between-Ss designs. The functions in this package aim to provide simple, intuitive and consistent specification of data analysis and visualization. Visualization functions also include design visualization for pre-analysis data auditing, and correlation matrix visualization. Finally, this package includes functions for non-parametric analysis, including permutation tests and bootstrap resampling. The bootstrap function obtains predictions either by cell means or by more advanced/powerful mixed effects models, yielding predictions and confidence intervals that may be easily visualized at any level of the experiment’s design.
986 Design of Experiments (DoE) & Analysis of Experimental Data FMC Factorial Experiments with Minimum Level Changes Generate cost effective minimally changed run sequences for symmetrical as well as asymmetrical factorial designs.
987 Design of Experiments (DoE) & Analysis of Experimental Data FrF2 (core) Fractional Factorial Designs with 2-Level Factors Regular and non-regular Fractional Factorial 2-level designs can be created. Furthermore, analysis tools for Fractional Factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
988 Design of Experiments (DoE) & Analysis of Experimental Data FrF2.catlg128 Catalogues of resolution IV 128 run 2-level fractional factorials up to 33 factors that do have 5-letter words This package provides catalogues of resolution IV regular fractional factorial designs in 128 runs for up to 33 2-level factors. The catalogues are complete, excluding resolution IV designs without 5-letter words, because these do not add value for a search for clear designs. The previous package version 1.0 with complete catalogues up to 24 runs (24 runs and a namespace added later) can be downloaded from the author's website.
989 Design of Experiments (DoE) & Analysis of Experimental Data GAD GAD: Analysis of variance from general principles This package analyses complex ANOVA models with any combination of orthogonal/nested and fixed/random factors, as described by Underwood (1997). There are two restrictions: (i) data must be balanced; (ii) fixed nested factors are not allowed. Homogeneity of variances is checked using Cochran’s C test and ‘a posteriori’ comparisons of means are done using the Student-Newman-Keuls (SNK) procedure.
990 Design of Experiments (DoE) & Analysis of Experimental Data geospt Geostatistical Analysis and Design of Optimal Spatial Sampling Networks Estimation of the variogram through trimmed mean, radial basis functions (optimization, prediction and cross-validation), summary statistics from cross-validation, pocket plot, and design of optimal sampling networks through sequential and simultaneous points methods.
991 Design of Experiments (DoE) & Analysis of Experimental Data granova Graphical Analysis of Variance This small collection of functions provides what we call elemental graphics for display of anova results. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular anova methods. The two main functions are granova.1w (a graphic for one way anova) and granova.2w (a corresponding graphic for two way anova). These functions were written to display data for any number of groups, regardless of their sizes (however, very large data sets or numbers of groups can be problematic). For these two functions a specialized approach is used to construct data-based contrast vectors for which anova data are displayed. The result is that the graphics use straight lines, and when appropriate flat surfaces, to facilitate clear interpretations while being faithful to the standard effect tests in anova. The graphic results are complementary to standard summary tables for these two basic kinds of analysis of variance; numerical summary results of analyses are also provided as side effects. Two additional functions are granova.ds (for comparing two dependent samples), and granova.contr (which provides graphic displays for a priori contrasts). All functions provide relevant numerical results to supplement the graphic displays of anova data. The graphics based on these functions should be especially helpful for learning how the methods have been applied to answer the question(s) posed. This means they can be particularly helpful for students and non-statistician analysts. But these methods should be quite generally helpful for work-a-day applications of all kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of non-linear transformations of data. In the case of granova.1w and granova.ds especially, several arguments are provided to facilitate flexibility in the construction of graphics that accommodate diverse features of data, according to their corresponding display requirements. See the help files for individual functions.
992 Design of Experiments (DoE) & Analysis of Experimental Data GroupSeq A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach by Lan-DeMets with various alpha spending functions being available to choose among.
993 Design of Experiments (DoE) & Analysis of Experimental Data gsbDesign Group Sequential Bayes Design Group Sequential Operating Characteristics for Clinical, Bayesian two-arm Trials with known Sigma and Normal Endpoints.
994 Design of Experiments (DoE) & Analysis of Experimental Data gsDesign Group Sequential Design Derives group sequential designs and describes their properties.
995 Design of Experiments (DoE) & Analysis of Experimental Data gset Group Sequential Design in Equivalence Studies Calculates equivalence and futility boundaries based on the exact bivariate t test statistics for group sequential designs in studies with equivalence hypotheses.
996 Design of Experiments (DoE) & Analysis of Experimental Data hiPOD hierarchical Pooled Optimal Design Based on hierarchical modeling, this package provides a few practical functions to find and present the optimal designs for a pooled NGS design.
997 Design of Experiments (DoE) & Analysis of Experimental Data ibd Incomplete Block Designs A collection of several utility functions related to binary incomplete block designs. The package contains functions to generate A- and D-efficient binary incomplete block designs with given numbers of treatments, number of blocks and block size, and a function to generate an incomplete block design with a specified concurrence matrix. There are functions to generate balanced treatment incomplete block designs and incomplete block designs for test-versus-control treatment comparisons with a specified concurrence matrix. The package also allows performing analysis of variance of data and computing estimated marginal means of factors from experiments using a connected incomplete block design. Tests of hypotheses of treatment contrasts in the incomplete block design setup are supported.
998 Design of Experiments (DoE) & Analysis of Experimental Data ICAOD Optimal Designs for Nonlinear Models Finds optimal designs for nonlinear models using a metaheuristic algorithm called imperialist competitive algorithm ICA. See, for details, Masoudi et al. (2017) <doi:10.1016/j.csda.2016.06.014> and Masoudi et al. (2019) <doi:10.1080/10618600.2019.1601097>.
999 Design of Experiments (DoE) & Analysis of Experimental Data idefix Efficient Designs for Discrete Choice Experiments Generates efficient designs for discrete choice experiments based on the multinomial logit model, and individually adapted designs for the mixed multinomial logit model. The generated designs can be presented on screen and choice data can be gathered using a shiny application. Crabbe M, Akinc D and Vandebroek M (2014) <doi:10.1016/j.trb.2013.11.008>.
1000 Design of Experiments (DoE) & Analysis of Experimental Data JMdesign Joint Modeling of Longitudinal and Survival Data - Power Calculation Performs power calculations for joint modeling of longitudinal and survival data with k-th order trajectories when the variance-covariance matrix, Sigma_theta, is unknown.
1001 Design of Experiments (DoE) & Analysis of Experimental Data LDOD Finding Locally D-optimal Designs for Some Nonlinear and Generalized Linear Models Provides functions for finding locally D-optimal designs for Logistic, Negative Binomial, Poisson, Michaelis-Menten, Exponential, Log-Linear, Emax, Richards, Weibull and Inverse Quadratic regression models, as well as functions for auto-constructing the Fisher information matrix and Frechet derivative based on some input variables, without user intervention.
1002 Design of Experiments (DoE) & Analysis of Experimental Data lhs Latin Hypercube Samples Provides a number of methods for creating and augmenting Latin Hypercube Samples.
1003 Design of Experiments (DoE) & Analysis of Experimental Data MAMS Designing Multi-Arm Multi-Stage Studies Designing multi-arm multi-stage studies with (asymptotically) normal endpoints and known variance.
1004 Design of Experiments (DoE) & Analysis of Experimental Data MaxPro Maximum Projection Designs Generate maximum projection (MaxPro) designs for quantitative and/or qualitative factors. Details of the MaxPro criterion can be found in: (1) Joseph, Gul, and Ba. (2015) “Maximum Projection Designs for Computer Experiments”, Biometrika, 102, 371-380, and (2) Joseph, Gul, and Ba. (2018) “Designing Computer Experiments with Multiple Types of Factors: The MaxPro Approach”, Journal of Quality Technology, to appear.
1005 Design of Experiments (DoE) & Analysis of Experimental Data MBHdesign Spatial Designs for Ecological and Environmental Surveys Provides spatially balanced designs from a set of (contiguous) potential sampling locations in a study region. Accommodates, without detrimental effects on spatial balance, sites that the researcher wishes to include in the survey for reasons other than the current randomisation (legacy sites).
1006 Design of Experiments (DoE) & Analysis of Experimental Data minimalRSD Minimally Changed CCD and BBD Generate central composite designs (CCD) with full as well as fractional factorial points (half replicate) and Box-Behnken designs (BBD) with minimally changed run sequence.
1007 Design of Experiments (DoE) & Analysis of Experimental Data minimaxdesign Minimax and Minimax Projection Designs Provides two main functions, minimax() and miniMaxPro(), for computing minimax and minimax projection designs using the minimax clustering algorithm in Mak and Joseph (2018) <doi:10.1080/10618600.2017.1302881>. Current design region options include the unit hypercube (“hypercube”), the unit simplex (“simplex”), the unit ball (“ball”), as well as user-defined constraints on the unit hypercube (“custom”). Minimax designs can also be computed on user-provided images using the function minimax.map(). Design quality can be assessed using the function mMdist(), which computes the minimax (fill) distance of a design.
1008 Design of Experiments (DoE) & Analysis of Experimental Data mixexp Design and Analysis of Mixture Experiments Functions for creating designs for mixture experiments, making ternary contour plots, and making mixture effect plots.
1009 Design of Experiments (DoE) & Analysis of Experimental Data mkssd Efficient multi-level k-circulant supersaturated designs mkssd generates efficient balanced non-aliased multi-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design whose chi-square efficiency exceeds a user-specified efficiency level (mef). It also displays the progress of generation of an efficient multi-level k-circulant design through a progress bar; a progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.
1010 Design of Experiments (DoE) & Analysis of Experimental Data mxkssd Efficient mixed-level k-circulant supersaturated designs mxkssd generates efficient balanced mixed-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design whose EfNOD efficiency exceeds a user-specified efficiency level (mef). It also displays the progress of generation of an efficient mixed-level k-circulant design through a progress bar; a progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.
1011 Design of Experiments (DoE) & Analysis of Experimental Data OBsMD Objective Bayesian Model Discrimination in Follow-Up Designs Implements the objective Bayesian methodology proposed by Consonni and Deldossi to choose the optimal experiment that best discriminates between competing models. G. Consonni, L. Deldossi (2014) Objective Bayesian Model Discrimination in Follow-up Experimental Designs, Test. <doi:10.1007/s11749-015-0461-3>.
1012 Design of Experiments (DoE) & Analysis of Experimental Data odr Optimal Design and Statistical Power of Multilevel Randomized Trials Calculate the optimal sample allocation that minimizes the variance of treatment effect in multilevel randomized trials under fixed budget and cost structure, perform power analyses with and without accommodating costs and budget. The references for proposed methods are: (1) Shen, Z. (in progress). Using optimal sample allocation to improve statistical precision and design efficiency for multilevel randomized trials. (unpublished doctoral dissertation). University of Cincinnati, Cincinnati, OH. (2) Shen, Z., & Kelcey, B. (revise & resubmit). Optimal sample allocation accounts for the full variation of sampling costs in cluster-randomized trials. Journal of Educational and Behavioral Statistics. (3) Shen, Z., & Kelcey, B. (2018, April). Optimal design of cluster randomized trials under condition- and unit-specific cost structures. Roundtable discussion presented at American Educational Research Association (AERA) annual conference. (4) Champely., S. (2018). pwr: Basic functions for power analysis (Version 1.2-2) [Software]. Available from <https://CRAN.R-project.org/package=pwr>.
1013 Design of Experiments (DoE) & Analysis of Experimental Data OPDOE Optimal Design of Experiments Several functions related to experimental design are implemented here; see “Optimal Experimental Design with R” by Rasch D. et al. (ISBN 9781439816974).
1014 Design of Experiments (DoE) & Analysis of Experimental Data optbdmaeAT Optimal Block Designs for Two-Colour cDNA Microarray Experiments Computes A-, MV-, D- and E-optimal or near-optimal block designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all possible elementary treatment contrasts. The algorithms used in this package are based on the treatment exchange and array exchange algorithms of Debusho, Gemechu and Haines (2016, unpublished). The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.
1015 Design of Experiments (DoE) & Analysis of Experimental Data optDesignSlopeInt Optimal Designs for Estimating the Slope Divided by the Intercept Compute optimal experimental designs that measure the slope divided by the intercept.
1016 Design of Experiments (DoE) & Analysis of Experimental Data OptGS Near-Optimal and Balanced Group-Sequential Designs for Clinical Trials with Continuous Outcomes Functions to find near-optimal multi-stage designs for continuous outcomes.
1017 Design of Experiments (DoE) & Analysis of Experimental Data OptimalDesign Algorithms for D-, A-, and IV-Optimal Designs Algorithms for D-, A- and IV-optimal designs of experiments. Some of the functions in this package require the ‘gurobi’ software and its accompanying R package. For their installation, please follow the instructions at <www.gurobi.com> and the file gurobi_inst.txt, respectively.
1018 Design of Experiments (DoE) & Analysis of Experimental Data OptimaRegion Confidence Regions for Optima Computes confidence regions on the location of response surface optima.
1019 Design of Experiments (DoE) & Analysis of Experimental Data OptInterim Optimal Two and Three Stage Designs for Single-Arm and Two-Arm Randomized Controlled Trials with a Long-Term Binary Endpoint Optimal two and three stage designs monitoring time-to-event endpoints at a specified timepoint.
1020 Design of Experiments (DoE) & Analysis of Experimental Data optrcdmaeAT Optimal Row-Column Designs for Two-Colour cDNA Microarray Experiments Computes A-, MV-, D- and E-optimal or near-optimal row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all pairwise treatment contrasts. The algorithms used in this package are based on the array exchange and treatment exchange algorithms adopted from Debusho, Gemechu and Haines (2016, unpublished) algorithms after adjusting for the row-column designs setup. The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.
1021 Design of Experiments (DoE) & Analysis of Experimental Data osDesign Design and analysis of observational studies The osDesign package supports the planning of an observational study. Currently, functionality is focused on the two-phase and case-control designs. Functions in this package provide Monte Carlo based evaluation of operating characteristics, such as power, for estimators of the components of a logistic regression model.
1022 Design of Experiments (DoE) & Analysis of Experimental Data PBIBD Partially Balanced Incomplete Block Designs PBIB designs are an important type of incomplete block design with a wide range of applications, for example in agricultural experiments, plant breeding, and sample surveys. This package constructs various series of PBIB designs and assists in checking all the necessary conditions of PBIB designs and the association scheme on which these designs are based. It also assists in calculating the efficiencies of PBIB designs with any number of associate classes. The package also constructs Youden-m square designs, which are row-column designs for the two-way elimination of heterogeneity; the incomplete columns of these Youden-m square designs constitute PBIB designs. With the present functionality, the package will help researchers to construct PBIB designs, to check whether their PBIB designs and association schemes satisfy the various necessary conditions for existence, to calculate the efficiencies of PBIB designs based on any association scheme, and to construct Youden-m square designs for the two-way elimination of heterogeneity. R. C. Bose and K. R. Nair (1939) <http://www.jstor.org/stable/40383923>.
1023 Design of Experiments (DoE) & Analysis of Experimental Data PGM2 Nested Resolvable Designs and their Associated Uniform Designs Construction method for nested resolvable designs from a projective geometry defined on a Galois field of order 2. The obtained resolvable designs are used to build uniform designs. The presented results are based on <https://eudml.org/doc/219563> and A. Boudraa et al. (see references).
1024 Design of Experiments (DoE) & Analysis of Experimental Data ph2bayes Bayesian Single-Arm Phase II Designs An implementation of Bayesian single-arm phase II design methods for binary outcome based on posterior probability (Thall and Simon (1994) <doi:10.2307/2533377>) and predictive probability (Lee and Liu (2008) <doi:10.1177/1740774508089279>).
1025 Design of Experiments (DoE) & Analysis of Experimental Data ph2bye Phase II Clinical Trial Design Using Bayesian Methods Calculate the Bayesian posterior/predictive probability and determine the sample size and stopping boundaries for single-arm Phase II design.
1026 Design of Experiments (DoE) & Analysis of Experimental Data pid Process Improvement using Data A collection of scripts and data files for the statistics text: “Process Improvement using Data” <https://learnche.org/pid> and the online course “Experimentation for Improvement” found on Coursera. The package contains code for designed experiments, data sets and other convenience functions used in the book.
1027 Design of Experiments (DoE) & Analysis of Experimental Data pipe.design Dual-Agent Dose Escalation for Phase I Trials using the PIPE Design Implements the Product of Independent beta Probabilities dose Escalation (PIPE) design for dual-agent Phase I trials as described in Mander AP, Sweeting MJ (2015) <doi:10.1002/sim.6434>.
1028 Design of Experiments (DoE) & Analysis of Experimental Data plgp Particle Learning of Gaussian Processes Sequential Monte Carlo inference for fully Bayesian Gaussian process (GP) regression and classification models by particle learning (PL). The sequential nature of inference and the active learning (AL) hooks provided facilitate thrifty sequential design (by entropy) and optimization (by improvement) for classification and regression models, respectively. This package essentially provides a generic PL interface, and functions (arguments to the interface) which implement the GP models and AL heuristics. Functions for a special, linked, regression/classification GP model and an integrated expected conditional improvement (IECI) statistic are provided for optimization in the presence of unknown constraints. Separable and isotropic Gaussian, and single-index correlation functions are supported. See the examples section of ?plgp and demo(package=“plgp”) for an index of demos.
1029 Design of Experiments (DoE) & Analysis of Experimental Data PopED Population (and Individual) Optimal Experimental Design Optimal experimental designs for both population and individual studies based on nonlinear mixed-effect models. Often this is based on a computation of the Fisher Information Matrix. This package was developed for pharmacometric problems, and examples and predefined models are available for these types of systems. The methods are described in Nyberg et al. (2012) <doi:10.1016/j.cmpb.2012.05.005>, and Foracchia et al. (2004) <doi:10.1016/S0169-2607(03)00073-7>.
1030 Design of Experiments (DoE) & Analysis of Experimental Data powerAnalysis Power Analysis in Experimental Design Basic functions for power analysis and effect size calculation.
1031 Design of Experiments (DoE) & Analysis of Experimental Data powerbydesign Power Estimates for ANOVA Designs Functions for bootstrapping the power of ANOVA designs based on estimated means and standard deviations of the conditions. Please refer to the documentation of the boot.power.anova() function for further details.
1032 Design of Experiments (DoE) & Analysis of Experimental Data powerGWASinteraction Power Calculations for GxE and GxG Interactions for GWAS Analytical power calculations for GxE and GxG interactions for case-control studies of candidate genes and genome-wide association studies (GWAS). This includes power calculation for four two-step screening and testing procedures. It can also calculate power for GxE and GxG without any screening.
1033 Design of Experiments (DoE) & Analysis of Experimental Data PwrGSD Power in a Group Sequential Design Tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries, for deriving power of a sequential design at a specified alternative, template for evaluating the performance of candidate plans at a set of time varying alternatives. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.
1034 Design of Experiments (DoE) & Analysis of Experimental Data qtlDesign Design of QTL experiments Tools for the design of QTL experiments
1035 Design of Experiments (DoE) & Analysis of Experimental Data qualityTools Statistical Methods for Quality Science Contains methods associated with the Define, Measure, Analyze, Improve and Control (i.e. DMAIC) cycle of the Six Sigma quality management methodology. It covers distribution fitting, normal and non-normal process capability indices, techniques for measurement systems analysis, especially gage capability indices and Gage Repeatability and Reproducibility (i.e. Gage RR) studies, factorial and fractional factorial designs, as well as response surface methods including the use of desirability functions. Improvement via Six Sigma is a project-based strategy that covers five phases: Define - Pareto chart; Measure - probability and quantile-quantile plots, process capability indices for various distributions, and Gage RR; Analyze - Pareto chart, multi-vari chart, dot plot; Improve - full and fractional factorial, response surface and mixture designs, as well as the desirability approach for simultaneous optimization of more than one response variable, plus normal, Pareto and Lenth plots of effects as well as interaction plots; Control - quality control charts, which can be found in the ‘qcc’ package. The focus is on teaching the statistical methodology used in the quality sciences.
1036 Design of Experiments (DoE) & Analysis of Experimental Data RcmdrPlugin.DoE R Commander Plugin for (industrial) Design of Experiments The package provides a platform-independent GUI for design of experiments. It is implemented as a plugin to the R-Commander, which is a more general graphical user interface for statistics in R based on tcl/tk. DoE functionality can be accessed through the menu Design that is added to the R-Commander menus.
1037 Design of Experiments (DoE) & Analysis of Experimental Data rodd Optimal Discriminating Designs A collection of functions for numerical construction of optimal discriminating designs. At the current moment T-optimal designs (which maximize the lower bound for the power of F-test for regression model discrimination), KL-optimal designs (for lognormal errors) and their robust analogues can be calculated with the package.
1038 Design of Experiments (DoE) & Analysis of Experimental Data RPPairwiseDesign Resolvable partially pairwise balanced design and Space-filling design via association scheme Using some association schemes to obtain a new series of resolvable partially pairwise balanced designs (RPPBD) and space-filling designs.
1039 Design of Experiments (DoE) & Analysis of Experimental Data rsm (core) Response-Surface Analysis Provides functions to generate response-surface designs, fit first- and second-order response-surface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, C-F J and Hamada, M (2009) “Experiments: Planning, Analysis, and Parameter Design Optimization” ISBN 978-0-471-69946-0.
1040 Design of Experiments (DoE) & Analysis of Experimental Data rsurface Design of Rotatable Central Composite Experiments and Response Surface Analysis Produces tables with the level of replication (number of replicates) and the experimental uncoded values of the quantitative factors to be used for rotatable Central Composite Design (CCD) experimentation according to Mead et al. (2012) <doi:10.1017/CBO9781139020879>, along with a 2-D contour plot of the corresponding variance of the predicted response (design_ccd()), and analyzes CCD data with response surface methodology (ccd_analysis()). A rotatable CCD provides values of the variance of the predicted response that are concentrically distributed around the average treatment combination used in the experimentation, which, with uniform precision (implied by the use of several replicates at the average treatment combination), greatly improves the search for an optimum response. These properties of a rotatable CCD represent undeniable advantages over the classical factorial design, as discussed by Panneton et al. (1999) <doi:10.13031/2013.13267> and Mead et al. (2012) <doi:10.1017/CBO9781139020879.018> among others.
1041 Design of Experiments (DoE) & Analysis of Experimental Data SensoMineR Sensory Data Analysis Statistical Methods to Analyse Sensory Data. SensoMineR: A package for sensory data analysis. S. Le and F. Husson (2008) <doi:10.1111/j.1745-459X.2007.00137.x>.
1042 Design of Experiments (DoE) & Analysis of Experimental Data seqDesign Simulation and Group Sequential Monitoring of Randomized Two-Stage Treatment Efficacy Trials with Time-to-Event Endpoints A modification of the preventive vaccine efficacy trial design of Gilbert, Grove et al. (2011, Statistical Communications in Infectious Diseases) is implemented, with application generally to individual-randomized clinical trials with multiple active treatment groups and a shared control group, and a study endpoint that is a time-to-event endpoint subject to right-censoring. The design accounts for the issues that the efficacy of the treatment/vaccine groups may take time to accrue while the multiple treatment administrations/vaccinations are given; there is interest in assessing the durability of treatment efficacy over time; and group sequential monitoring of each treatment group for potential harm, non-efficacy/efficacy futility, and high efficacy is warranted. The design divides the trial into two stages of time periods, where each treatment is first evaluated for efficacy in the first stage of follow-up, and, if and only if it shows significant treatment efficacy in stage one, it is evaluated for longer-term durability of efficacy in stage two. The package produces plots and tables describing operating characteristics of a specified design including an unconditional power for intention-to-treat and per-protocol/as-treated analyses; trial duration; probabilities of the different possible trial monitoring outcomes (e.g., stopping early for non-efficacy); unconditional power for comparing treatment efficacies; and distributions of numbers of endpoint events occurring after the treatments/vaccinations are given, useful as input parameters for the design of studies of the association of biomarkers with a clinical outcome (surrogate endpoint problem). The code can be used for a single active treatment versus control design and for a single-stage design.
1043 Design of Experiments (DoE) & Analysis of Experimental Data sFFLHD Sequential Full Factorial-Based Latin Hypercube Design Gives design points from a sequential full factorial-based Latin hypercube design, as described in Duan, Ankenman, Sanchez, and Sanchez (2015, Technometrics, <doi:10.1080/00401706.2015.1108233>).
1044 Design of Experiments (DoE) & Analysis of Experimental Data simrel Simulation of Multivariate Linear Model Data Simulating multivariate linear model data is useful in research and education, whether for comparison or for creating data with specific properties. This package lets the user simulate linear model data with a wide range of properties using a few tuning parameters. The package also contains functions to create plots for the simulation objects, and a Shiny app as an RStudio gadget. It can be a handy tool for model comparison, testing, and many other purposes.
1045 Design of Experiments (DoE) & Analysis of Experimental Data skpr (core) Design of Experiments Suite: Generate and Evaluate Optimal Designs Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of split/split-split/…/N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible.
1046 Design of Experiments (DoE) & Analysis of Experimental Data SLHD Maximin-Distance (Sliced) Latin Hypercube Designs Generates optimal Latin Hypercube Designs (LHDs) for computer experiments with quantitative factors and optimal Sliced Latin Hypercube Designs (SLHDs) for computer experiments with both quantitative and qualitative factors. Details of the algorithm can be found in Ba, S., Brenneman, W. A. and Myers, W. R. (2015), “Optimal Sliced Latin Hypercube Designs,” Technometrics. The most important function in this package is “maximinSLHD”.
1047 Design of Experiments (DoE) & Analysis of Experimental Data soptdmaeA Sequential Optimal Designs for Two-Colour cDNA Microarray Experiments Computes sequential A-, MV-, D- and E-optimal or near-optimal block and row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all possible elementary treatment contrasts. The package also provides an optional method of using the graphical user interface (GUI) R package ‘tcltk’ to ensure that it is user friendly.
1048 Design of Experiments (DoE) & Analysis of Experimental Data sp23design Design and Simulation of seamless Phase II-III Clinical Trials Provides methods for generating, exploring and executing seamless Phase II-III designs of Lai, Lavori and Shih using generalized likelihood ratio statistics. Includes pdf and source files that describe the entire R implementation with the relevant mathematical details.
1049 Design of Experiments (DoE) & Analysis of Experimental Data ssize.fdr Sample Size Calculations for Microarray Experiments This package contains a set of functions that calculate appropriate sample sizes for one-sample t-tests, two-sample t-tests, and F-tests for microarray experiments based on desired power while controlling for false discovery rates. For all tests, the standard deviations (variances) among genes can be assumed fixed or random. This is also true for effect sizes among genes in one-sample and two-sample experiments. Functions also output a chart of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes.
1050 Design of Experiments (DoE) & Analysis of Experimental Data ssizeRNA Sample Size Calculation for RNA-Seq Experimental Design We propose a procedure for sample size calculation while controlling false discovery rate for RNA-seq experimental design. Our procedure depends on the Voom method proposed for RNA-seq data analysis by Law et al. (2014) <doi:10.1186/gb-2014-15-2-r29> and the sample size calculation method proposed for microarray experiments by Liu and Hwang (2007) <doi:10.1093/bioinformatics/btl664>. We develop a set of functions that calculates appropriate sample sizes for two-sample t-test for RNA-seq experiments with fixed or varied set of parameters. The outputs also contain a plot of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes. To install this package, please use ‘source(“http://bioconductor.org/biocLite.R”); biocLite(“ssizeRNA”)’. For R version 3.5 or greater, please use ‘if(!requireNamespace(“BiocManager”, quietly = TRUE)){install.packages(“BiocManager”)}; BiocManager::install(“ssizeRNA”)’.
1051 Design of Experiments (DoE) & Analysis of Experimental Data support.CEs Basic Functions for Supporting an Implementation of Choice Experiments Provides seven basic functions that support an implementation of choice experiments.
1052 Design of Experiments (DoE) & Analysis of Experimental Data TEQR Target Equivalence Range Design The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity based cumulative cohort design with added safety rules. The ACT (activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of this design is that dose is escalated based on lack of activity rather than on lack of toxicity and is de-escalated only if an unacceptable level of toxicity is experienced.
1053 Design of Experiments (DoE) & Analysis of Experimental Data tgp Bayesian Treed Gaussian Process Models Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.
1054 Design of Experiments (DoE) & Analysis of Experimental Data ThreeArmedTrials Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active, and a placebo control. Method for the following distributions are implemented: Poisson (Mielke and Munk (2009) <arXiv:0912.4169>), negative binomial (Muetze et al. (2016) <doi:10.1002/sim.6738>), normal (Pigeot et al. (2003) <doi:10.1002/sim.1450>; Hasler et al. (2009) <doi:10.1002/sim.3052>), binary (Friede and Kieser (2007) <doi:10.1002/sim.2543>), nonparametric (Muetze et al. (2017) <doi:10.1002/sim.7176>), exponential (Mielke and Munk (2009) <arXiv:0912.4169>).
1055 Design of Experiments (DoE) & Analysis of Experimental Data toxtestD Experimental design for binary toxicity tests Calculates sample size and dose allocation for binary toxicity tests, using the Fish Embryo Toxicity Test as example. An optimal test design is obtained by running (i) spoD (calculate the number of individuals to test under control conditions), (ii) setD (estimate the minimal sample size per treatment given the user's precision requirements) and (iii) doseD (construct an individual dose scheme).
1056 Design of Experiments (DoE) & Analysis of Experimental Data unrepx Analysis and Graphics for Unreplicated Experiments Provides half-normal plots, reference plots, and Pareto plots of effects from an unreplicated experiment, along with various pseudo-standard-error measures, simulated reference distributions, and other tools. Many of these methods are described in Daniel C. (1959) <doi:10.1080/00401706.1959.10489866> and/or Lenth R.V. (1989) <doi:10.1080/00401706.1989.10488595>, but some new approaches are added and integrated in one package.
1057 Design of Experiments (DoE) & Analysis of Experimental Data vdg Variance Dispersion Graphs and Fraction of Design Space Plots Facilities for constructing variance dispersion graphs, fraction- of-design-space plots and similar graphics for exploring the properties of experimental designs. The design region is explored via random sampling, which allows for more flexibility than traditional variance dispersion graphs. A formula interface is leveraged to provide access to complex model formulae. Graphics can be constructed simultaneously for multiple experimental designs and/or multiple model formulae. Instead of using pointwise optimization to find the minimum and maximum scaled prediction variance curves, which can be inaccurate and time consuming, this package uses quantile regression as an alternative.
1058 Design of Experiments (DoE) & Analysis of Experimental Data Vdgraph Variance dispersion graphs and Fraction of design space plots for response surface designs Uses a modification of the published FORTRAN code in “A Computer Program for Generating Variance Dispersion Graphs” by G. Vining, Journal of Quality Technology, Vol. 25 No. 1 January 1993, to produce variance dispersion graphs. Also produces fraction of design space plots, and contains data frames for several minimal run response surface designs.
1059 Design of Experiments (DoE) & Analysis of Experimental Data VdgRsm Plots of Scaled Prediction Variances for Response Surface Designs Functions for creating variance dispersion graphs, fraction of design space plots, and contour plots of scaled prediction variances for second-order response surface designs in spherical and cuboidal regions. Also, some standard response surface designs can be generated.
1060 Design of Experiments (DoE) & Analysis of Experimental Data VNM Finding Multiple-Objective Optimal Designs for the 4-Parameter Logistic Model Provides tools for finding multiple-objective optimal designs for estimating the shape of dose-response, the ED50 (the dose producing an effect midway between the expected responses at the extreme doses) and the MED (the minimum effective dose level) for the 2-, 3-, and 4-parameter logistic models, and for evaluating their efficiencies for the three objectives. The acronym VNM stands for V-algorithm using the Newton-Raphson method to search for multiple-objective optimal designs.
1061 Extreme Value Analysis copula Multivariate Dependence with Copulas Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
1062 Extreme Value Analysis evd (core) Functions for Extreme Value Distributions Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.
1063 Extreme Value Analysis evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.
1064 Extreme Value Analysis evir (core) Extreme Values in R Functions for extreme value theory, which may be divided into the following groups; exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, gev/gpd distributions.
1065 Extreme Value Analysis evmix Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation The usual distribution functions, maximum likelihood inference and model diagnostics for univariate stationary extreme value mixture models are provided. Kernel density estimation including various boundary corrected kernel density estimation methods and a wide choice of kernels, with cross-validation likelihood based bandwidth estimator. Reasonable consistency with the base functions in the ‘evd’ package is provided, so that users can safely interchange most code.
1066 Extreme Value Analysis extremefit Estimation of Extreme Conditional Quantiles and Probabilities Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.
1067 Extreme Value Analysis extRemes Extreme Value Analysis Functions for performing extreme value analysis.
1068 Extreme Value Analysis extremeStat Extreme Value Statistics and Quantile Estimation Code to fit, plot and compare several (extreme value) distribution functions. Can also compute (truncated) distribution quantile estimates and draw a plot with return periods on a linear scale.
1069 Extreme Value Analysis fExtremes Rmetrics - Modelling Extreme Events in Finance Provides functions for analysing and modelling extreme events in financial time Series. The topics include: (i) data pre-processing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.
1070 Extreme Value Analysis in2extRemes Into the extRemes Package Includes a graphical user interface (GUI) to some of the functions in the package extRemes version >= 2.0.
1071 Extreme Value Analysis ismev An Introduction to Statistical Modeling of Extreme Values Functions to support the computations carried out in ‘An Introduction to Statistical Modeling of Extreme Values’ by Stuart Coles. The functions may be divided into the following groups; maxima/minima, order statistics, peaks over thresholds and point processes.
1072 Extreme Value Analysis lmom L-Moments Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
1073 Extreme Value Analysis lmomco L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances-covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with “L” (LMs), “TL” (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L,TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3-p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3-p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.
1074 Extreme Value Analysis lmomRFA Regional Frequency Analysis using L-Moments Functions for regional frequency analysis using the methods of J. R. M. Hosking and J. R. Wallis (1997), “Regional frequency analysis: an approach based on L-moments”.
1075 Extreme Value Analysis mev Multivariate Extreme Value Distributions Exact simulation from max-stable processes, R-Pareto processes for various parametric models. Threshold selection methods. Multivariate extreme diagnostics. Estimation and likelihoods for univariate extremes.
1076 Extreme Value Analysis POT Generalized Pareto Distribution and Peaks Over Threshold Some functions useful to perform a Peak Over Threshold analysis in univariate and bivariate cases, see Beirlant et al. (2004) <doi:10.1002/0470012382>. A user’s guide is available.
1077 Extreme Value Analysis ptsuite Tail Index Estimation for Power Law Distributions Various estimation methods for the shape parameter of Pareto distributed data. This package contains functions for various estimation methods such as maximum likelihood (Newman, 2005)<doi:10.1016/j.cities.2012.03.001>, Hill’s estimator (Hill, 1975)<doi:10.1214/aos/1176343247>, least squares (Zaher et al., 2014)<doi:10.9734/BJMCS/2014/10890>, method of moments (Rytgaard, 1990)<doi:10.2143/AST.20.2.2005443>, percentiles (Bhatti et al., 2018)<doi:10.1371/journal.pone.0196456>, and weighted least squares (Nair et al., 2019) to estimate the shape parameter of Pareto distributed data. It also provides both a heuristic method (Hubert et al., 2013)<doi:10.1016/j.csda.2012.07.011> and a goodness of fit test (Gulati and Shapiro, 2008)<doi:10.1007/978-0-8176-4619-6> for testing for Pareto data as well as a method for generating Pareto distributed data.
1078 Extreme Value Analysis QRM Provides R-Language Code to Examine Quantitative Risk Management Concepts Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.
1079 Extreme Value Analysis ReIns Functions from “Reinsurance: Actuarial and Statistical Aspects” Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels <http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470772689.html>.
1080 Extreme Value Analysis Renext Renewal Method for Extreme Values Extrapolation Peaks Over Threshold (POT) or ‘methode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a Maximum-Likelihood framework.
1081 Extreme Value Analysis revdbayes Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. Also provided are d,p,q,r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.
1082 Extreme Value Analysis RTDE Robust Tail Dependence Estimation Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: ‘Robust and bias-corrected estimation of the coefficient of tail dependence’ and ‘Robust and bias-corrected estimation of probabilities of extreme failure sets’. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).
1083 Extreme Value Analysis SpatialExtremes Modelling Spatial Extremes Tools for the statistical modelling of spatial extremes using max-stable processes, copula or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with the extreme value theory) are available, such as the use of (spatial) copula and Bayesian hierarchical models assuming the so-called conditional assumptions. The latter approach is handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) <doi:10.1214/11-STS376>, Padoan et al. (2010) <doi:10.1198/jasa.2009.tm08577>, Dombry et al. (2013) <doi:10.1093/biomet/ass067>.
1084 Extreme Value Analysis texmex Statistical Modelling of Extreme Values Statistical extreme value modelling of threshold excesses, maxima and multivariate extremes. Univariate models for threshold excesses and maxima are the Generalised Pareto, and Generalised Extreme Value model respectively. These models may be fitted by using maximum (optionally penalised-)likelihood, or Bayesian estimation, and both classes of models may be fitted with covariates in any/all model parameters. Model diagnostics support the fitting process. Graphical output for visualising fitted models and return level estimates is provided. For serially dependent sequences, the intervals declustering algorithm of Ferro and Segers (2003) <doi:10.1111/1467-9868.00401> is provided, with diagnostic support to aid selection of threshold and declustering horizon. Multivariate modelling is performed via the conditional approach of Heffernan and Tawn (2004) <doi:10.1111/j.1467-9868.2004.02050.x>, with graphical tools for threshold selection and to diagnose estimation convergence.
1085 Extreme Value Analysis threshr Threshold Selection and Uncertainty for Extreme Value Analysis Provides functions for the selection of thresholds for use in extreme value models, based mainly on the methodology in Northrop, Attalides and Jonathan (2017) <doi:10.1111/rssc.12159>. It also performs predictive inferences about future extreme values, based either on a single threshold or on a weighted average of inferences from multiple thresholds, using the ‘revdbayes’ package <https://cran.r-project.org/package=revdbayes>. At the moment only the case where the data can be treated as independent identically distributed observations is considered.
1086 Extreme Value Analysis VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
1087 Empirical Finance actuar Actuarial Functions and Heavy Tailed Distributions Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.
1088 Empirical Finance AmericanCallOpt This package includes pricing functions for selected American call options with underlying assets that generate payouts This package includes a set of pricing functions for American call options. The following cases are covered: Pricing of an American call using the standard binomial approximation; Hedge parameters for an American call with a standard binomial tree; Binomial pricing of an American call with continuous payout from the underlying asset; Binomial pricing of an American call with an underlying stock that pays proportional dividends in discrete time; Pricing of an American call on futures using a binomial approximation; Pricing of a currency futures American call using a binomial approximation; Pricing of a perpetual American call. The user should kindly notice that this material is for educational purposes only. The codes are not optimized for computational efficiency as they are meant to represent standard cases of analytical and numerical solution.
1089 Empirical Finance backtest Exploring Portfolio-Based Conjectures About Financial Instruments The backtest package provides facilities for exploring portfolio-based conjectures about financial instruments (stocks, bonds, swaps, options, et cetera).
1090 Empirical Finance bayesGARCH Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations Provides the bayesGARCH() function which performs the Bayesian estimation of the GARCH(1,1) model with Student’s t innovations as described in Ardia (2008) <doi:10.1007/978-3-540-78657-3>.
1091 Empirical Finance BCC1997 Calculation of Option Prices Based on a Universal Solution Calculates the prices of European options based on the universal solution provided by Bakshi, Cao and Chen (1997) <doi:10.1111/j.1540-6261.1997.tb02749.x>. This solution considers stochastic volatility, stochastic interest and random jumps. Please cite their work if this package is used.
1092 Empirical Finance BenfordTests Statistical Tests for Evaluating Conformity to Benford’s Law Several specialized statistical tests and support functions for determining if numerical data could conform to Benford’s law.
1093 Empirical Finance betategarch Simulation, Estimation and Forecasting of Beta-Skew-t-EGARCH Models Simulation, estimation and forecasting of first-order Beta-Skew-t-EGARCH models with leverage (one-component, two-component, skewed versions).
1094 Empirical Finance bizdays Business Days Calculations and Utilities Business days calculations based on a list of holidays and nonworking weekdays. Quite useful for fixed income and derivatives pricing.
1095 Empirical Finance BLModel Black-Litterman Posterior Distribution Posterior distribution in the Black-Litterman model is computed from a prior distribution given in the form of a time series of asset returns and a continuous distribution of views provided by the user as an external function.
1096 Empirical Finance BurStFin Burns Statistics Financial A suite of functions for finance, including the estimation of variance matrices via a statistical factor model or Ledoit-Wolf shrinkage.
1097 Empirical Finance BurStMisc Burns Statistics Miscellaneous Script search, corner, genetic optimization, permutation tests, write expect test.
1098 Empirical Finance CADFtest A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, then an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).
1099 Empirical Finance car Companion to Applied Regression Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.
1100 Empirical Finance ChainLadder Statistical Methods and Models for Claims Reserving in General Insurance Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.
1101 Empirical Finance copula Multivariate Dependence with Copulas Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
1102 Empirical Finance CreditMetrics Functions for calculating the CreditMetrics risk model A set of functions for computing the CreditMetrics risk model.
1103 Empirical Finance credule Credit Default Swap Functions It provides functions to bootstrap Credit Curves from market quotes (Credit Default Swap - CDS - spreads) and price Credit Default Swaps - CDS.
1104 Empirical Finance crp.CSFP CreditRisk+ Portfolio Model Modelling credit risks based on the concept of “CreditRisk+”, First Boston Financial Products, 1997 and “CreditRisk+ in the Banking Industry”, Gundlach & Lehrbass, Springer, 2003.
1105 Empirical Finance crseEventStudy A Robust and Powerful Test of Abnormal Stock Returns in Long-Horizon Event Studies Based on Dutta et al. (2018) <doi:10.1016/j.jempfin.2018.02.004>, this package provides their standardized test for abnormal returns in long-horizon event studies. The methods used improve the major weaknesses of size, power, and robustness of long-run statistical tests described in Kothari/Warner (2007) <doi:10.1016/B978-0-444-53265-7.50015-9>. Abnormal returns are weighted by their statistical precision (i.e., standard deviation), resulting in abnormal standardized returns. This procedure efficiently captures the heteroskedasticity problem. Clustering techniques following Cameron et al. (2011) <doi:10.1198/jbes.2010.07136> are adopted for computing cross-sectional correlation robust standard errors. The statistical tests in this package therefore account for potential biases arising from returns’ cross-sectional correlation, autocorrelation, and volatility clustering without power loss.
1106 Empirical Finance cvar Compute Expected Shortfall and Value at Risk for Continuous Distributions Compute expected shortfall (ES) and Value at Risk (VaR) from a quantile function, distribution function, random number generator or probability density function. ES is also known as Conditional Value at Risk (CVaR). Virtually any continuous distribution can be specified. The functions are vectorized over the arguments. The computations are done directly from the definitions, see e.g. Acerbi and Tasche (2002) <doi:10.1111/1468-0300.00091>. Some support for GARCH models is provided, as well.
1107 Empirical Finance data.table Extension of ‘data.frame’ Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
1108 Empirical Finance derivmkts Functions and R Code to Accompany Derivatives Markets A set of pricing and expository functions that should be useful in teaching a course on financial derivatives.
1109 Empirical Finance dlm Bayesian and Likelihood Analysis of Dynamic Linear Models Provides routines for Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
1110 Empirical Finance Dowd Functions Ported from ‘MMR2’ Toolbox Offered in Kevin Dowd’s Book Measuring Market Risk ‘Kevin Dowd’s’ book Measuring Market Risk is a widely read book in the area of risk measurement by students and practitioners alike. As he claims, ‘MATLAB’ indeed might have been the most suitable language when he originally wrote the functions, but, with the growing popularity of R, that is no longer entirely valid. As ‘Dowd’s’ code was not intended to be error free and was mainly for reference, some functions in this package have inherited those errors. An attempt will be made in future releases to identify and correct them. ‘Dowd’s’ original code can be downloaded from www.kevindowd.org/measuring-market-risk/. It should be noted that ‘Dowd’ offers both ‘MMR2’ and ‘MMR1’ toolboxes. Only ‘MMR2’ was ported to R. ‘MMR2’ is a more recent version of the ‘MMR1’ toolbox and the two have mostly similar functions. The toolbox mainly contains different parametric and non-parametric methods for measurement of market risk as well as backtesting risk measurement methods.
1111 Empirical Finance DriftBurstHypothesis Calculates the Test-Statistic for the Drift Burst Hypothesis Calculates the T-Statistic for the drift burst hypothesis from the working paper Christensen, Oomen and Reno (2018) <doi:10.2139/ssrn.2842535>. The authors’ MATLAB code is available upon request, see: <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2842535>.
1112 Empirical Finance dse Dynamic Systems Estimation (Time Series Package) Tools for multivariate, linear, time-invariant, time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.
1113 Empirical Finance DtD Distance to Default Provides fast methods to work with Merton’s distance to default model introduced in Merton (1974) <doi:10.1111/j.1540-6261.1974.tb03058.x>. The methods includes simulation and estimation of the parameters.
1114 Empirical Finance dyn Time Series Regression Time series regression. The dyn class interfaces ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series including specifications that may contain lags, diffs and missing values.
1115 Empirical Finance dynlm Dynamic Linear Regression Dynamic linear models and time series regression.
1116 Empirical Finance ESG ESG - A package for asset projection The package presents a “Scenarios” class containing general parameters, risk parameters and projection results. Risk parameters are gathered together into a ParamsScenarios sub-object. The general process for using this package is to set all needed parameters in a Scenarios object, use the customPathsGeneration method to proceed to the projection, then use xxx_PriceDistribution() methods to get asset prices.
1117 Empirical Finance estudy2 An Implementation of Parametric and Nonparametric Event Study An implementation of the most commonly used event study methodology, including both parametric and nonparametric tests. It contains a variety of aspects of rate of return estimation (the core calculation is done in C++), as well as three classical market models for event studies: mean adjusted returns, market adjusted returns and single-index market models. There are 6 parametric and 6 nonparametric tests provided, which examine cross-sectional daily abnormal return (see the documentation of the functions for more information). Parametric tests include tests proposed by Brown and Warner (1980) <doi:10.1016/0304-405X(80)90002-1>, Brown and Warner (1985) <doi:10.1016/0304-405X(85)90042-X>, Boehmer et al. (1991) <doi:10.1016/0304-405X(91)90032-F>, Patell (1976) <doi:10.2307/2490543>, and Lamb (1995) <doi:10.2307/253695>. Nonparametric tests covered in estudy2 are tests described in Corrado and Zivney (1992) <doi:10.2307/2331331>, McConnell and Muscarella (1985) <doi:10.1016/0304-405X(85)90006-6>, Boehmer et al. (1991) <doi:10.1016/0304-405X(91)90032-F>, Cowan (1992) <doi:10.1007/BF00939016>, Corrado (1989) <doi:10.1016/0304-405X(89)90064-0>, Campbell and Wasley (1993) <doi:10.1016/0304-405X(93)90025-7>, Savickas (2003) <doi:10.1111/1475-6803.00052>, Kolari and Pynnonen (2010) <doi:10.1093/rfs/hhq072>. Furthermore, tests for the cumulative abnormal returns proposed by Brown and Warner (1985) <doi:10.1016/0304-405X(85)90042-X> and Lamb (1995) <doi:10.2307/253695> are included.
1118 Empirical Finance factorstochvol Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models with interweaving <doi:10.1080/10618600.2017.1322091>. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix <doi:10.1016/j.jeconom.2018.11.007>.
1119 Empirical Finance fame Interface for FAME Time Series Database Read and write FAME databases.
1120 Empirical Finance fAssets (core) Rmetrics - Analysing and Modelling Financial Assets Provides a collection of functions to manage, to investigate and to analyze data sets of financial assets from different points of view.
1121 Empirical Finance FatTailsR Kiener Distributions and Fat Tails in Finance Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate distribution parameters, quantiles, value-at-risk and expected shortfall with high accuracy. Includes power hyperbolas and power hyperbolic functions.
1122 Empirical Finance fBasics (core) Rmetrics - Markets and Basic Statistics Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.
1123 Empirical Finance fBonds (core) Rmetrics - Pricing and Evaluating Bonds It implements the Nelson-Siegel and the Nelson-Siegel-Svensson term structures.
1124 Empirical Finance fCopulae (core) Rmetrics - Bivariate Dependence Structures with Copulae Provides a collection of functions to manage, to investigate and to analyze bivariate financial returns by Copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.
1125 Empirical Finance fExoticOptions (core) Rmetrics - Pricing and Evaluating Exotic Options Provides a collection of functions to evaluate barrier options, Asian options, binary options, currency translated options, lookback options, multiple asset options and multiple exercise options.
1126 Empirical Finance fExtremes (core) Rmetrics - Modelling Extreme Events in Finance Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data pre-processing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.
1127 Empirical Finance fgac Generalized Archimedean Copula Bi-variate data fitting is done by two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm was implemented considering seven families of copulas (Generalized Archimedean Copulas), the best fitting can be obtained looking all copula’s options (totally positive of order 2 and stochastically increasing models).
1128 Empirical Finance fGarch (core) Rmetrics - Autoregressive Conditional Heteroskedastic Modelling Provides a collection of functions to analyze and model heteroskedastic behavior in financial time series models.
1129 Empirical Finance fImport (core) Rmetrics - Importing Economic and Financial Data Provides a collection of utility functions to download and manage data sets from the Internet or from other sources.
1130 Empirical Finance FinancialMath Financial Mathematics for Actuaries Contains financial math functions and introductory derivative functions included in the Society of Actuaries and Casualty Actuarial Society ‘Financial Mathematics’ exam, and some topics in the ‘Models for Financial Economics’ exam.
1131 Empirical Finance FinAsym Classifies implicit trading activity from market quotes and computes the probability of informed trading This package accomplishes two tasks: a) it classifies implicit trading activity from quotes in OTC markets using the algorithm of Lee and Ready (1991); b) based on information for trade initiation, the package computes the probability of informed trading of Easley and O’Hara (1987).
1132 Empirical Finance finreportr Financial Data from U.S. Securities and Exchange Commission Download and display company financial data from the U.S. Securities and Exchange Commission’s EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See <https://www.sec.gov/edgar/searchedgar/companysearch.html> for more information.
1133 Empirical Finance fmdates Financial Market Date Calculations Implements common date calculations relevant for specifying the economic nature of financial market contracts that are typically defined by International Swap Dealer Association (ISDA, <http://www2.isda.org>) legal documentation. This includes methods to check whether dates are business days in certain locales, functions to adjust and shift dates and time length (or day counter) calculations.
1134 Empirical Finance fMultivar (core) Rmetrics - Analysing and Modeling Multivariate Financial Return Distributions Provides a collection of functions to manage, to investigate and to analyze bivariate and multivariate data sets of financial returns.
1135 Empirical Finance fNonlinear (core) Rmetrics - Nonlinear and Chaotic Time Series Modelling Provides a collection of functions for testing various aspects of univariate time series including independence and neglected nonlinearities. Further provides functions to investigate the chaotic behavior of time series processes and to simulate different types of chaotic time series maps.
1136 Empirical Finance fOptions (core) Rmetrics - Pricing and Evaluating Basic Options Provides a collection of functions to valuate basic options. This includes the generalized Black-Scholes option, options on futures and options on commodity futures.
1137 Empirical Finance forecast Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
1138 Empirical Finance fPortfolio (core) Rmetrics - Portfolio Selection and Optimization Provides a collection of functions to optimize portfolios and to analyze them from different points of view.
1139 Empirical Finance fracdiff Fractionally differenced ARIMA aka ARFIMA(p,d,q) models Maximum likelihood estimation of the parameters of a fractionally differenced ARIMA(p,d,q) model (Haslett and Raftery, Appl.Statistics, 1989).
1140 Empirical Finance fractal A Fractal Time Series Modeling and Analysis Package Stochastic fractal and deterministic chaotic time series analysis.
1141 Empirical Finance FRAPO Financial Risk Modelling and Portfolio Optimisation with R Accompanying package of the book ‘Financial Risk Modelling and Portfolio Optimisation with R’, second edition. The data sets used in the book are contained in this package.
1142 Empirical Finance fRegression (core) Rmetrics - Regression Based Decision and Prediction A collection of functions for linear and non-linear regression modelling. It implements a wrapper for several regression models available in the base and contributed packages of R.
1143 Empirical Finance frmqa The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.
1144 Empirical Finance fTrading (core) Rmetrics - Trading and Rebalancing Financial Instruments A collection of functions for trading and rebalancing financial instruments. It implements various technical indicators to analyse time series such as moving averages or stochastic oscillators.
1145 Empirical Finance GCPM Generalized Credit Portfolio Model Analyze the default risk of credit portfolios. Commonly known models, like CreditRisk+ or the CreditMetrics model are implemented in their very basic settings. The portfolio loss distribution can be achieved either by simulation or analytically in case of the classic CreditRisk+ model. Models are only implemented to respect losses caused by defaults, i.e. migration risk is not included. The package structure is kept flexible especially with respect to distributional assumptions in order to quantify the sensitivity of risk figures with respect to several assumptions. Therefore the package can be used to determine the credit risk of a given portfolio as well as to quantify model sensitivities.
1146 Empirical Finance GetHFData Download and Aggregate High Frequency Trading Data from Bovespa Downloads and aggregates high frequency trading data for Brazilian instruments directly from Bovespa ftp site <ftp://ftp.bmf.com.br/MarketData/>.
1147 Empirical Finance gets General-to-Specific (GETS) Modelling and Indicator Saturation Methods Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.
1148 Empirical Finance GetTDData Get Data for Brazilian Bonds (Tesouro Direto) Downloads and aggregates data for Brazilian government issued bonds directly from the website of Tesouro Direto <http://www.tesouro.fazenda.gov.br/tesouro-direto-balanco-e-estatisticas>.
1149 Empirical Finance GEVStableGarch ARMA-GARCH/APARCH Models with GEV and Stable Distributions Package for simulation and estimation of ARMA-GARCH/APARCH models with GEV and stable distributions.
1150 Empirical Finance ghyp A Package on Generalized Hyperbolic Distribution and Its Special Cases Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). Especially, it contains fitting procedures, an AIC-based model selection routine, and functions for the computation of density, quantile, probability, random variates, expected shortfall and some portfolio optimization and plotting routines as well as the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.
1151 Empirical Finance gmm Generalized Method of Moments and Generalized Empirical Likelihood It is a complete suite to estimate models based on moment conditions. It includes the two step Generalized method of moments (Hansen 1982; <doi:10.2307/1912775>), the iterated GMM and continuous updated estimator (Hansen, Eaton and Yaron 1996; <doi:10.2307/1392442>) and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).
1152 Empirical Finance gogarch Generalized Orthogonal GARCH (GO-GARCH) models Implementation of the GO-GARCH model class
1153 Empirical Finance GUIDE GUI for DErivatives in R A nice GUI for financial DErivatives in R.
1154 Empirical Finance highfrequency Tools for Highfrequency Data Analysis Provide functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity.
1155 Empirical Finance IBrokers R API to Interactive Brokers Trader Workstation Provides native R access to Interactive Brokers Trader Workstation API.
1156 Empirical Finance InfoTrad Calculates the Probability of Informed Trading (PIN) Estimates the probability of informed trading (PIN) initially introduced by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x>. The contribution of the package is that it uses likelihood factorizations of Easley et al. (2010) <doi:10.1017/S0022109010000074> (EHO factorization) and Lin and Ke (2011) <doi:10.1016/j.finmar.2011.03.001> (LK factorization). Moreover, the package uses different estimation algorithms. Specifically, the grid-search algorithm proposed by Yan and Zhang (2012) <doi:10.1016/j.jbankfin.2011.08.003>, the hierarchical agglomerative clustering approach proposed by Gan et al. (2015) <doi:10.1080/14697688.2015.1023336> and later extended by Ersan and Alici (2016) <doi:10.1016/j.intfin.2016.04.001>.
1157 Empirical Finance lgarch Simulation and Estimation of Log-GARCH Models Simulation and estimation of univariate and multivariate log-GARCH models. The main functions of the package are: lgarchSim(), mlgarchSim(), lgarch() and mlgarch(). The first two functions simulate from a univariate and a multivariate log-GARCH model, respectively, whereas the latter two estimate a univariate and multivariate log-GARCH model, respectively.
1158 Empirical Finance lifecontingencies Financial and Actuarial Mathematics for Life Contingencies Classes and methods that allow the user to manage life tables and actuarial tables (including multiple-decrement tables). Moreover, functions to easily perform demographic, financial and actuarial calculations on life contingency insurances are contained therein.
1159 Empirical Finance lmtest Testing Linear Regression Models A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.
1160 Empirical Finance longmemo Statistics for Long-Memory Processes (Book Jan Beran), and Related Functionality Datasets and Functionality from ‘Jan Beran’ (1994). Statistics for Long-Memory Processes; Chapman & Hall. Estimation of Hurst (and more) parameters for fractional Gaussian noise, ‘fARIMA’ and ‘FEXP’ models.
1161 Empirical Finance LSMonteCarlo American options pricing with Least Squares Monte Carlo method The package compiles functions for calculating prices of American put options with Least Squares Monte Carlo method. The option types are plain vanilla American put, Asian American put, and Quanto American put. The pricing algorithms include variance reduction techniques such as Antithetic Variates and Control Variates. Additional functions are given to derive “price surfaces” at different volatilities and strikes, create 3-D plots, quickly generate Geometric Brownian motion, and calculate prices of European options with Black & Scholes analytical solution.
1162 Empirical Finance markovchain Easy Handling Discrete Time Markov Chains Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition, functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of their structural properties) analysis are provided.
1163 Empirical Finance MarkowitzR Statistical Significance of the Markowitz Portfolio A collection of tools for analyzing significance of Markowitz portfolios.
1164 Empirical Finance matchingMarkets Analysis of Stable Matchings Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
1165 Empirical Finance MSGARCH Markov-Switching GARCH Models Fit (by Maximum Likelihood or MCMC/Bayesian), simulate, and forecast various Markov-Switching GARCH models as described in Ardia et al. (2017) <https://ssrn.com/abstract=2845809>.
1166 Empirical Finance mvtnorm Multivariate Normal and t Distributions Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
1167 Empirical Finance NetworkRiskMeasures Risk Measures for (Financial) Networks Implements some risk measures for (financial) networks, such as DebtRank, Impact Susceptibility, Impact Diffusion and Impact Fluidity.
1168 Empirical Finance nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
1169 Empirical Finance NMOF Numerical Methods and Optimization in Finance Functions, examples and data from the book “Numerical Methods and Optimization in Finance” by M. Gilli, D. Maringer and E. Schumann (2011), ISBN 978-0123756626. The package provides implementations of several optimisation heuristics, such as Differential Evolution, Genetic Algorithms and Threshold Accepting. There are also functions for the valuation of financial instruments, such as bonds and options, and functions that help with stochastic simulations.
1170 Empirical Finance obAnalytics Limit Order Book Analytics Data processing, visualisation and analysis of Limit Order Book event data.
1171 Empirical Finance OptHedging Estimation of value and hedging strategy of call and put options Estimation of value and hedging strategy of call and put options, based on optimal hedging and the Monte Carlo method, from Chapter 3 of ‘Statistical Methods for Financial Engineering’, by Bruno Remillard, CRC Press (2013).
1172 Empirical Finance OptionPricing Option Pricing with Efficient Simulation Algorithms Efficient Monte Carlo Algorithms for the price and the sensitivities of Asian and European Options under Geometric Brownian Motion.
1173 Empirical Finance pa Performance Attribution for Equity Portfolios A package that provides tools for conducting performance attribution for equity portfolios. The package uses two methods: the Brinson method and a regression-based analysis.
1174 Empirical Finance parma Portfolio Allocation and Risk Management Applications Provision of a set of models and methods for use in the allocation and management of capital in financial portfolios.
1175 Empirical Finance pbo Probability of Backtest Overfitting Following the method of Bailey et al., computes for a collection of candidate models the probability of backtest overfitting, the performance degradation and probability of loss, and the stochastic dominance.
1176 Empirical Finance PeerPerformance Luck-Corrected Peer Performance Analysis in R Provides functions to perform the peer performance analysis of funds’ returns as described in Ardia and Boudt (2018) <doi:10.1016/j.jbankfin.2017.10.014>.
1177 Empirical Finance PerformanceAnalytics (core) Econometric Tools for Performance and Risk Analysis Collection of econometric functions for performance and risk analysis. In addition to standard risk and performance metrics, this package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.
1178 Empirical Finance pinbasic Fast and Stable Estimation of the Probability of Informed Trading (PIN) Utilities for fast and stable estimation of the probability of informed trading (PIN) in the model introduced by Easley et al. (2002) <doi:10.1111/1540-6261.00493> are implemented. Since the basic model developed by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x> is nested in the former due to equating the intensity of uninformed buys and sells, functions can also be applied to this simpler model structure, if needed. State-of-the-art factorization of the model likelihood function as well as most recent algorithms for generating initial values for optimization routines are implemented. In total, two likelihood factorizations and three methodologies for starting values are included. Furthermore, functions for simulating datasets of daily aggregated buys and sells, calculating confidence intervals for the probability of informed trading and posterior probabilities of trading days’ conditions are available.
1179 Empirical Finance portfolio Analysing equity portfolios Classes for analysing and implementing equity portfolios.
1180 Empirical Finance PortfolioEffectHFT High Frequency Portfolio Analytics by PortfolioEffect R interface to PortfolioEffect cloud service for backtesting high frequency trading (HFT) strategies, intraday portfolio analysis and optimization. Includes auto-calibrating model pipeline for market microstructure noise, risk factors, price jumps/outliers, tail risk (high-order moments) and price fractality (long memory). Constructed portfolios could use client-side market data or access HF intraday price history for all major US Equities. See <https://www.portfolioeffect.com/> for more information on the PortfolioEffect high frequency portfolio analytics platform.
1181 Empirical Finance PortfolioOptim Small/Large Sample Portfolio Optimization Two functions for financial portfolio optimization by linear programming are provided. One function implements Benders decomposition algorithm and can be used for very large data sets. The other, applicable for moderate sample sizes, finds optimal portfolio which has the smallest distance to a given benchmark portfolio.
1182 Empirical Finance portfolioSim Framework for simulating equity portfolio strategies Classes that serve as a framework for designing equity portfolio simulations.
1183 Empirical Finance PortRisk Portfolio Risk Analysis Risk Attribution of a portfolio with Volatility Risk Analysis.
1184 Empirical Finance quantmod Quantitative Financial Modelling Framework Specify, build, trade, and analyse quantitative financial trading strategies.
1185 Empirical Finance QuantTools Enhanced Quantitative Trading Modelling Download and organize historical market data from multiple sources like Yahoo (<https://finance.yahoo.com>), Google (<https://www.google.com/finance>), Finam (<https://www.finam.ru/profile/moex-akcii/sberbank/export/>), MOEX (<https://www.moex.com/en/derivatives/contracts.aspx>) and IQFeed (<https://www.iqfeed.net/symbolguide/index.cfm?symbolguide=lookup>). Code your trading algorithms in modern C++11 with a powerful event-driven tick processing API, including trading costs and exchange communication latency, and transform detailed data seamlessly into R. In just a few lines of code you will be able to visualize every step of your trading model, from tick data to multi-dimensional heat maps.
1186 Empirical Finance ragtop Pricing Equity Derivatives with Extensions of Black-Scholes Algorithms to price American and European equity options, convertible bonds and a variety of other financial derivatives. It uses an extension of the usual Black-Scholes model in which jump to default may occur at a probability specified by a power-law link between stock price and hazard rate as found in the paper by Takahashi, Kobayashi, and Nakagawa (2001) <doi:10.3905/jfi.2001.319302>. We use ideas and techniques from Andersen and Buffum (2002) <doi:10.2139/ssrn.355308> and Linetsky (2006) <doi:10.1111/j.1467-9965.2006.00271.x>.
1187 Empirical Finance Rbitcoin R & bitcoin integration Utilities related to Bitcoin. Unified markets API interface (bitstamp, kraken, btce, bitmarket). Both public and private API calls. Integration of data structures for all markets. Support SSL. Read Rbitcoin documentation (command: ?btc) for more information.
1188 Empirical Finance Rblpapi R Interface to ‘Bloomberg’ An R Interface to ‘Bloomberg’ is provided via the ‘Blp API’.
1189 Empirical Finance Rcmdr R Commander A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
1190 Empirical Finance RcppQuantuccia R Bindings to the ‘Quantuccia’ Header-Only Essentials of ‘QuantLib’ ‘QuantLib’ bindings are provided for R using ‘Rcpp’ and the header-only ‘Quantuccia’ variant (put together by Peter Caspers) offering an essential subset of ‘QuantLib’. See the included file ‘AUTHORS’ for a full list of contributors to both ‘QuantLib’ and ‘Quantuccia’.
1191 Empirical Finance reinsureR Reinsurance Treaties Application Application of reinsurance treaties to claims portfolios. The package creates a class Claims whose objective is to store claims and premiums, on which different treaties can be applied. A statistical analysis can then be applied to measure the impact of reinsurance, producing a table or graphical output. This package can be used for estimating the impact of reinsurance on several portfolios or for pricing treaties through statistical analysis. Documentation for the implemented methods can be found in “Reinsurance: Actuarial and Statistical Aspects” by Hansjoerg Albrecher, Jan Beirlant, Jozef L. Teugels (2017, ISBN: 978-0-470-77268-3) and “REINSURANCE: A Basic Guide to Facultative and Treaty Reinsurance” by Munich Re (2010) <https://www.munichre.com/site/mram/get/documents_E96160999/mram/assetpool.mr_america/PDFs/3_Publications/reinsurance_basic_guide.pdf>.
1192 Empirical Finance restimizeapi Functions for Working with the ‘www.estimize.com’ Web Services Provides the user with functions to develop their trading strategy, uncover actionable trading ideas, and monitor consensus shifts with crowdsourced earnings and economic estimate data directly from <www.estimize.com>. Further information regarding the web services this package invokes can be found at <www.estimize.com/api>.
1193 Empirical Finance Risk Computes 26 Financial Risk Measures for Any Continuous Distribution Computes 26 financial risk measures for any continuous distribution. The 26 financial risk measures include value at risk, expected shortfall due to Artzner et al. (1999) <doi:10.1007/s10957-011-9968-2>, tail conditional median due to Kou et al. (2013) <doi:10.1287/moor.1120.0577>, expectiles due to Newey and Powell (1987) <doi:10.2307/1911031>, beyond value at risk due to Longin (2001) <doi:10.3905/jod.2001.319161>, expected proportional shortfall due to Belzunce et al. (2012) <doi:10.1016/j.insmatheco.2012.05.003>, elementary risk measure due to Ahmadi-Javid (2012) <doi:10.1007/s10957-011-9968-2>, omega due to Shadwick and Keating (2002), sortino ratio due to Rollinger and Hoffman (2013), kappa due to Kaplan and Knowles (2004), Wang (1998)’s <doi:10.1080/10920277.1998.10595708> risk measures, Stone (1973)’s <doi:10.2307/2978638> risk measures, Luce (1980)’s <doi:10.1007/BF00135033> risk measures, Sarin (1987)’s <doi:10.1007/BF00126387> risk measures, Bronshtein and Kurelenkova (2009)’s risk measures.
1194 Empirical Finance riskParityPortfolio Design of Risk Parity Portfolios Fast design of risk parity portfolios for financial investment. The goal of the risk parity portfolio formulation is to equalize or distribute the risk contributions of the different assets, which is missing if we simply consider the overall volatility of the portfolio as in the mean-variance Markowitz portfolio. In addition to the vanilla formulation, where the risk contributions are perfectly equalized subject to no shortselling and budget constraints, many other formulations are considered that allow for box constraints and shortselling, as well as the inclusion of additional objectives like the expected return and overall variance. See vignette for a detailed documentation and comparison, with several illustrative examples. The package is based on the papers: Y. Feng, and D. P. Palomar (2015). SCRIP: Successive Convex Optimization Methods for Risk Parity Portfolio Design. IEEE Trans. on Signal Processing, vol. 63, no. 19, pp. 5285-5300. <doi:10.1109/TSP.2015.2452219>. F. Spinu (2013), An Algorithm for Computing Risk Parity Weights. <doi:10.2139/ssrn.2297383>. T. Griveau-Billion, J. Richard, and T. Roncalli (2013). A fast algorithm for computing High-dimensional risk parity portfolios. <arXiv:1311.4057>.
1195 Empirical Finance RiskPortfolios Computation of Risk-Based Portfolios Collection of functions designed to compute risk-based portfolios as described in Ardia et al. (2017) <doi:10.1007/s10479-017-2474-7> and Ardia et al. (2017) <doi:10.21105/joss.00171>.
1196 Empirical Finance riskSimul Risk Quantification for Stock Portfolios under the T-Copula Model Implements efficient simulation procedures to estimate tail loss probabilities and conditional excess for a stock portfolio. The log-returns are assumed to follow a t-copula model with generalized hyperbolic or t marginals.
1197 Empirical Finance RM2006 RiskMetrics 2006 Methodology Estimation of the conditional covariance matrix using the RiskMetrics 2006 methodology of Zumbach (2007) <doi:10.2139/ssrn.1420185>.
1198 Empirical Finance rmgarch Multivariate GARCH Models Feasible multivariate GARCH models including DCC, GO-GARCH and Copula-GARCH. See Boudt, Galanos, Payseur and Zivot (2019) for a review of multivariate GARCH models <doi:10.1016/bs.host.2019.01.001>.
1199 Empirical Finance RND Risk Neutral Density Extraction Package Extract the implied risk neutral density from options using various methods.
1200 Empirical Finance rpatrec Recognising Visual Charting Patterns in Time Series Data Generating visual charting patterns and noise, smoothing to find a signal in noisy time series and enabling users to apply their findings to real life data.
1201 Empirical Finance RQuantLib R Interface to the ‘QuantLib’ Library The ‘RQuantLib’ package makes parts of ‘QuantLib’ accessible from R. The ‘QuantLib’ project aims to provide a comprehensive software framework for quantitative finance. The goal is to provide a standard open source library for quantitative analysis, modeling, trading, and risk management of financial assets.
1202 Empirical Finance rugarch (core) Univariate GARCH Models ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.
1203 Empirical Finance rwt Rice Wavelet Toolbox wrapper Provides a set of functions for performing digital signal processing.
1204 Empirical Finance sandwich Robust Covariance Matrix Estimators Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
1205 Empirical Finance sde Simulation and Inference for Stochastic Differential Equations Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 978-0-387-75838-1, Springer, NY.
1206 Empirical Finance SharpeR Statistical Significance of the Sharpe Ratio A collection of tools for analyzing significance of assets, funds, and trading strategies, based on the Sharpe ratio and overfit of the same. Provides density, distribution, quantile and random generation of the Sharpe ratio distribution based on normal returns, as well as the optimal Sharpe ratio over multiple assets. Computes confidence intervals on the Sharpe and provides a test of equality of Sharpe ratios based on the Delta method.
1207 Empirical Finance sharpeRratio Moment-Free Estimation of Sharpe Ratios An efficient moment-free estimator of the Sharpe ratio, or signal-to-noise ratio, for heavy-tailed data (see <https://arxiv.org/abs/1505.01333>).
1208 Empirical Finance Sim.DiffProc Simulation of Diffusion Processes Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Ito and the Stratonovich form. Statistical analysis of SDEs with parallel Monte Carlo and moment-equation methods. Enables researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulating the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
1209 Empirical Finance SmithWilsonYieldCurve Smith-Wilson Yield Curve Construction Constructs a yield curve by the Smith-Wilson method from a table of LIBOR and SWAP rates.
1210 Empirical Finance stochvol Efficient Bayesian Inference for Stochastic Volatility (SV) Models Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods. Methodological details are given in Kastner and Fruhwirth-Schnatter (2014) <doi:10.1016/j.csda.2013.01.002>; the most common use cases are described in Kastner (2016) <doi:10.18637/jss.v069.i05>. Also incorporates SV with leverage.
1211 Empirical Finance strucchange Testing, Monitoring, and Dating Structural Changes Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
1212 Empirical Finance TAQMNGR Manage Tick-by-Tick Transaction Data Manager of tick-by-tick transaction data that performs ‘cleaning’, ‘aggregation’ and ‘import’ in an efficient and fast way. The package engine, written in C++, exploits the ‘zlib’ and ‘gzstream’ libraries to handle gzipped data without need to uncompress them. ‘Cleaning’ and ‘aggregation’ are performed according to Brownlees and Gallo (2006) <doi:10.1016/j.csda.2006.09.030>. Currently, TAQMNGR processes raw data from WRDS (Wharton Research Data Service, <https://wrds-web.wharton.upenn.edu/wrds/>).
1213 Empirical Finance tawny Clean Covariance Matrices Using Random Matrix Theory and Shrinkage Estimators for Portfolio Optimization Portfolio optimization typically requires an estimate of a covariance matrix of asset returns. There are many approaches for constructing such a covariance matrix, some using the sample covariance matrix as a starting point. This package provides implementations for two such methods: random matrix theory and shrinkage estimation. Each method attempts to clean or remove noise related to the sampling process from the sample covariance matrix.
1214 Empirical Finance TFX R API to TrueFX(tm) Connects R to TrueFX(tm) for free streaming real-time and historical tick-by-tick market data for dealable interbank foreign exchange rates with millisecond detail.
1215 Empirical Finance tidyquant Tidy Quantitative Financial Analysis Bringing financial analysis to the ‘tidyverse’. The ‘tidyquant’ package provides a convenient wrapper to various ‘xts’, ‘zoo’, ‘quantmod’, ‘TTR’ and ‘PerformanceAnalytics’ package functions and returns the objects in the tidy ‘tibble’ format. The main advantage is being able to use quantitative functions with the ‘tidyverse’ functions including ‘purrr’, ‘dplyr’, ‘tidyr’, ‘ggplot2’, ‘lubridate’, etc. See the ‘tidyquant’ website for more information, documentation and examples.
1216 Empirical Finance timeDate (core) Rmetrics - Chronological and Calendar Objects The ‘timeDate’ class fulfils the conventions of the ISO 8601 standard as well as of the ANSI C and POSIX standards. Beyond these standards it provides the “Financial Center” concept, which allows one to handle data records collected in different time zones and mix them so that the time stamps are always proper with respect to your personal financial center, or alternatively to the GMT reference time. It can thus also handle time stamps from historical data records from the same time zone, even if the financial centers changed daylight saving time at different calendar dates.
1217 Empirical Finance timeSeries (core) Rmetrics - Financial Time Series Objects Provides a class and various tools for financial time series. This includes basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.
1218 Empirical Finance timsac Time Series Analysis and Control Package Functions for statistical analysis, prediction and control of time series based mainly on Akaike and Nakagawa (1988) <ISBN 978-90-277-2786-2>.
1219 Empirical Finance tis Time Indexes and Time Indexed Series Functions and S3 classes for time indexes and time indexed series, which are compatible with FAME frequencies.
1220 Empirical Finance TSdbi Time Series Database Interface Provides a common interface to time series databases. The objective is to define a standard interface so users can retrieve time series data from various sources with a simple, common set of commands, and so programs can be written to be portable with respect to the data source. The SQL implementations also provide a database table design, so users needing to set up a time series database have a reasonably complete way to do this easily. The interface provides for a variety of options with respect to the representation of time series in R. The interface, and the SQL implementations, also handle vintages of time series data (sometimes called editions or real-time data). There is also a (not yet well tested) mechanism to handle multilingual data documentation. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.
1221 Empirical Finance tsDyn Nonlinear Time Series Models with Regime Switching Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).
1222 Empirical Finance tseries (core) Time Series Analysis and Computational Finance Time series analysis and computational finance.
1223 Empirical Finance tseriesChaos Analysis of Nonlinear Time Series Routines for the analysis of nonlinear time series. This work is largely inspired by the TISEAN project, by Rainer Hegger, Holger Kantz and Thomas Schreiber: <http://www.mpipks-dresden.mpg.de/~tisean/>.
1224 Empirical Finance tsfa Time Series Factor Analysis Extraction of Factors from Multivariate Time Series. See ?00tsfa-Intro for more details.
1225 Empirical Finance TTR Technical Trading Rules Functions and data to construct technical trading rules with R.
1226 Empirical Finance tvm Time Value of Money Functions Functions for managing cashflows and interest rate curves.
1227 Empirical Finance urca (core) Unit Root and Cointegration Tests for Time Series Data Unit root and cointegration tests encountered in applied econometric analysis are implemented.
1228 Empirical Finance vars VAR Modelling Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.
1229 Empirical Finance VarSwapPrice Pricing a variance swap on an equity index Computes a portfolio of European options that replicates the cost of capturing the realised variance of an equity index.
1230 Empirical Finance vrtest Variance Ratio tests and other tests for Martingale Difference Hypothesis A collection of statistical tests for the martingale difference hypothesis.
1231 Empirical Finance wavelets Functions for Computing Wavelet Filters, Wavelet Transforms and Multiresolution Analyses Contains functions for computing and plotting discrete wavelet transforms (DWT) and maximal overlap discrete wavelet transforms (MODWT), as well as their inverses. Additionally, it contains functionality for computing and plotting wavelet transform filters that are used in the above decompositions as well as multiresolution analyses.
1232 Empirical Finance waveslim Basic Wavelet Routines for One-, Two- And Three-Dimensional Signal Processing Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.
1233 Empirical Finance wavethresh Wavelets Statistics and Transforms Performs 1, 2 and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, density estimation.
1234 Empirical Finance XBRL Extraction of Business Financial Information from ‘XBRL’ Documents Functions to extract business financial information from an Extensible Business Reporting Language (‘XBRL’) instance file and the associated collection of files that defines its ‘Discoverable’ Taxonomy Set (‘DTS’).
1235 Empirical Finance xts (core) eXtensible Time Series Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
1236 Empirical Finance ycinterextra Yield curve or zero-coupon prices interpolation and extrapolation Yield curve or zero-coupon prices interpolation and extrapolation using the Nelson-Siegel, Svensson, Smith-Wilson models, and Hermite cubic splines.
1237 Empirical Finance YieldCurve Modelling and estimation of the yield curve Modelling the yield curve with some parametric models. The models implemented are: Nelson-Siegel, Diebold-Li and Svensson. The package also includes the data of the term structure of interest rate of Federal Reserve Bank and European Central Bank.
1238 Empirical Finance Zelig Everyone’s Statistical Software A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.
1239 Empirical Finance zoo (core) S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
1240 Functional Data Analysis classiFunc Classification of Functional Data Efficient implementation of k-nearest neighbor estimation and kernel estimation for functional data classification.
1241 Functional Data Analysis covsep Tests for Determining if the Covariance Structure of 2-Dimensional Data is Separable Functions for testing if the covariance structure of 2-dimensional data (e.g. samples of surfaces X_i = X_i(s,t)) is separable, i.e. if covariance(X) = C_1 x C_2. A complete description of the implemented tests can be found in the paper Aston, John A. D.; Pigoli, Davide; Tavakoli, Shahin. Tests for separability in nonparametric covariance operators of random surfaces. Ann. Statist. 45 (2017), no. 4, 1431-1461. <doi:10.1214/16-AOS1495> <https://projecteuclid.org/euclid.aos/1498636862> <arXiv:1505.02023>.
1242 Functional Data Analysis dbstats Distance-Based Statistics Prediction methods where explanatory information is coded as a matrix of distances between individuals. Distances can either be directly input as a distances matrix, a squared distances matrix, an inner-products matrix or computed from observed predictors.
1243 Functional Data Analysis ddalpha Depth-Based Classification and Calculation of Data Depth Contains procedures for depth-based supervised learning, which are entirely non-parametric, in particular the DDalpha-procedure (Lange, Mosler and Mozharovskyi, 2014 <doi:10.1007/s00362-012-0488-4>). The training data sample is transformed by a statistical depth function to a compact low-dimensional space, where the final classification is done. It also offers an extension to functional data and routines for calculating certain notions of statistical depth functions. 50 multivariate and 5 functional classification problems are included.
1244 Functional Data Analysis denseFLMM Functional Linear Mixed Models for Densely Sampled Data Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis.
1245 Functional Data Analysis fda (core) Functional Data Analysis These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer. They were ported from earlier versions in Matlab and S-PLUS. An introduction appears in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009) Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files working many examples including all but one of the 76 figures in this latter book. Matlab versions of the code and sample analyses are no longer distributed through CRAN, as they were when the book was published. For those, ftp from <http://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/>. There you find a set of .zip files containing the functions and sample analyses, as well as two .txt files giving instructions for installation and some additional information. The changes from Version 2.4.1 are fixes of bugs in density.fd and removal of functions create.polynomial.basis, polynompen, and polynomial. These were deleted because the monomial basis does the same thing and because there were errors in the code.
1246 Functional Data Analysis fda.usc (core) Functional Data Analysis and Utilities for Statistical Computing Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.
1247 Functional Data Analysis fdadensity Functional Data Analysis for Density Functions by Transformation to a Hilbert Space An implementation of the methodology described in Petersen and Mueller (2016) <doi:10.1214/15-AOS1363> for the functional data analysis of samples of density functions. Densities are first transformed to their corresponding log quantile densities, followed by ordinary Functional Principal Components Analysis (FPCA). Transformation modes of variation yield improved interpretation of the variability in the data as compared to FPCA on the densities themselves. The standard fraction of variance explained (FVE) criterion commonly used for functional data is adapted to the transformation setting, also allowing for an alternative quantification of variability for density data through the Wasserstein metric of optimal transport.
1248 Functional Data Analysis fdakma Functional Data Analysis: K-Mean Alignment It simultaneously performs clustering and alignment of a multidimensional or unidimensional functional dataset by means of k-mean alignment.
1249 Functional Data Analysis fdapace (core) Functional Data Analysis and Empirical Dynamics A versatile package that provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm. This core algorithm yields covariance and mean functions, eigenfunctions and principal component (scores), for both functional data and derivatives, for both dense (functional) and sparse (longitudinal) sampling designs. For sparse designs, it provides fitted continuous trajectories with confidence bands, even for subjects with very few longitudinal observations. PACE is a viable and flexible alternative to random effects modeling of longitudinal data. There is also a Matlab version (PACE) that contains some methods not available on fdapace and vice versa. Please cite our package if you use it (You may run the command citation(“fdapace”) to get the citation format and bibtex entry). References: Wang, J.L., Chiou, J., Muller, H.G. (2016) <doi:10.1146/annurev-statistics-041715-033624>; Chen, K., Zhang, X., Petersen, A., Muller, H.G. (2017) <doi:10.1007/s12561-015-9137-5>.
1250 Functional Data Analysis fdasrvf (core) Elastic Functional Data Analysis Performs alignment, PCA, and modeling of multidimensional and unidimensional functions using the square-root velocity framework (Srivastava et al., 2011 <arXiv:1103.3817> and Tucker et al., 2014 <doi:10.1016/j.csda.2012.12.001>). This framework allows for elastic analysis of functional data through phase and amplitude separation.
1251 Functional Data Analysis fdatest Interval Testing Procedure for Functional Data Implementation of the Interval Testing Procedure for functional data in different frameworks (i.e., one or two-population frameworks, functional linear models) by means of different basis expansions (i.e., B-spline, Fourier, and phase-amplitude Fourier). The current version of the package requires functional data evaluated on a uniform grid; it automatically projects each function on a chosen functional basis; it performs the entire family of multivariate tests; and, finally, it provides the matrix of the p-values of the previous tests and the vector of the corrected p-values. The functional basis, the coupled or uncoupled scenario, and the kind of test can be chosen by the user. The package provides also a plotting function creating a graphical output of the procedure: the p-value heat-map, the plot of the corrected p-values, and the plot of the functional data.
1252 Functional Data Analysis FDboost (core) Boosting Functional Regression Models Regression models for functional data, i.e., scalar-on-function, function-on-scalar and function-on-function regression models, are fitted by a component-wise gradient boosting algorithm.
1253 Functional Data Analysis fdcov Analysis of Covariance Operators Provides a variety of tools for the analysis of covariance operators including k-sample tests for equality and classification and clustering methods found in the works of Cabassi et al (2017) <doi:10.1214/17-EJS1347>, Kashlak et al (2017) <arXiv:1604.06310>, Pigoli et al (2014) <doi:10.1093/biomet/asu008>, and Panaretos et al (2010) <doi:10.1198/jasa.2010.tm09239>.
1254 Functional Data Analysis fds (core) Functional Data Sets Functional data sets.
1255 Functional Data Analysis flars Functional LARS Variable selection algorithm for functional linear regression with scalar response variable and mixed scalar/functional predictors.
1256 Functional Data Analysis fpca Restricted MLE for Functional Principal Components Analysis A geometric approach to MLE for functional principal components.
1257 Functional Data Analysis freqdom Frequency Domain Based Analysis: Dynamic PCA Implementation of dynamic principal component analysis (DPCA), simulation of VAR and VMA processes and frequency domain tools. These frequency domain methods for dimensionality reduction of multivariate time series were introduced by David Brillinger in his book Time Series (1974). We follow implementation guidelines as described in Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Component <doi:10.1111/rssb.12076>.
1258 Functional Data Analysis freqdom.fda Functional Time Series: Dynamic Functional Principal Components Implementations of functional dynamic principal components analysis. Related graphic tools and frequency domain methods. These methods directly use the multivariate dynamic principal components implementation, following the guidelines from Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Component <doi:10.1111/rssb.12076>.
1259 Functional Data Analysis ftsa (core) Functional Time Series Analysis Functions for visualizing, modeling, forecasting and hypothesis testing of functional time series.
1260 Functional Data Analysis ftsspec Spectral Density Estimation and Comparison for Functional Time Series Functions for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length.
1261 Functional Data Analysis funData An S4 Class for Functional Data S4 classes for univariate and multivariate functional data with utility functions.
1262 Functional Data Analysis funFEM Clustering in the Discriminative Functional Subspace The funFEM algorithm (Bouveyron et al., 2014) allows clustering of functional data by modeling the curves within a common and discriminative functional subspace.
1263 Functional Data Analysis funHDDC Univariate and Multivariate Model-Based Clustering in Group-Specific Functional Subspaces The funHDDC algorithm allows clustering of univariate functional data (Bouveyron and Jacques, 2011, <doi:10.1007/s11634-011-0095-6>) or multivariate data (Schmutz et al., 2018) by modeling each group within a specific functional subspace.
1264 Functional Data Analysis funLBM Model-Based Co-Clustering of Functional Data The funLBM algorithm simultaneously clusters the rows and the columns of a data matrix where each entry of the matrix is a function or a time series.
1265 Functional Data Analysis geofd Spatial Prediction for Function Value Data Kriging based methods are used for predicting functional data (curves) with spatial dependence.
1266 Functional Data Analysis GPFDA Apply Gaussian Process in Functional Data Analysis Use functional regression as the mean structure and Gaussian Process as the covariance structure.
1267 Functional Data Analysis growfunctions Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data Estimates a collection of time-indexed functions under either of Gaussian process (GP) or intrinsic Gaussian Markov random field (iGMRF) prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same covariance or precision parameters. The GP and iGMRF formulations both support any number of additive covariance or precision terms, respectively, expressing either or both of multiple trend and seasonality.
1268 Functional Data Analysis pcdpca Dynamic Principal Components for Periodically Correlated Functional Time Series Method extends multivariate and functional dynamic principal components to periodically correlated multivariate time series. This package allows you to compute true dynamic principal components in the presence of periodicity. We follow implementation guidelines as described in Kidzinski, Kokoszka and Jouzdani (2017), in Principal component analysis of periodically correlated functional time series <arXiv:1612.00040>.
1269 Functional Data Analysis rainbow Bagplots, Boxplots and Rainbow Plots for Functional Data Visualizing functional data and identifying functional outliers.
1270 Functional Data Analysis refund (core) Regression with Functional Data Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.
1271 Functional Data Analysis refund.shiny Interactive Plotting for Functional Data Analyses Interactive plotting for functional data analyses.
1272 Functional Data Analysis RFgroove Importance Measure and Selection for Groups of Variables with Random Forests Variable selection tools for groups of variables and functional data based on a new grouped variable importance with random forests.
1273 Functional Data Analysis roahd Robust Analysis of High Dimensional Data A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.
1274 Functional Data Analysis SCBmeanfd Simultaneous Confidence Bands for the Mean of Functional Data Statistical methods for estimating and inferring the mean of functional data. The methods include simultaneous confidence bands, local polynomial fitting, bandwidth selection by plug-in and cross-validation, goodness-of-fit tests for parametric models, equality tests for two-sample problems, and plotting functions.
1275 Functional Data Analysis sparseFLMM Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data Estimation of functional linear mixed models for irregularly or sparsely sampled data based on functional principal component analysis.
1276 Functional Data Analysis splinetree Longitudinal Regression Trees and Forests Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.
1277 Functional Data Analysis switchnpreg Switching nonparametric regression models for a single curve and functional data Functions for estimating the parameters from the latent state process and the functions corresponding to the J states as proposed by De Souza and Heckman (2013).
1278 Functional Data Analysis warpMix Mixed Effects Modeling with Warping for Functional Data Using B-Spline Mixed effects modeling with warping for functional data using B-splines. Warping coefficients are considered as random effects, and warping functions are general functions, with parameters representing the projection onto a B-spline basis of a part of the warping functions. Warped data are modelled by a linear mixed effect functional model; the noise is Gaussian and independent from the warping functions.
1279 Statistical Genetics adegenet Exploratory Analysis of Genetic and Genomic Data Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure (‘genind’ class), alleles counts by populations (‘genpop’), and genome-wide SNP data (‘genlight’). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.
1280 Statistical Genetics ape Analyses of Phylogenetics and Evolution Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
1281 Statistical Genetics Biodem Biodemography Functions The Biodem package provides a number of functions for Biodemographic analysis.
1282 Statistical Genetics bqtl Bayesian QTL Mapping Toolkit QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.
1283 Statistical Genetics dlmap Detection Localization Mapping for QTL QTL mapping in a mixed model framework with separate detection and localization stages. The first stage detects the number of QTL on each chromosome based on the genetic variation due to grouped markers on the chromosome; the second stage uses this information to determine the most likely QTL positions. The mixed model can accommodate general fixed and random effects, including spatial effects in field trials and pedigree effects. Applicable to backcrosses, doubled haploids, recombinant inbred lines, F2 intercrosses, and association mapping populations.
1284 Statistical Genetics gap (core) Genetic Analysis Package It is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates.
1285 Statistical Genetics genetics (core) Population Genetics Classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, …
1286 Statistical Genetics hapassoc Inference of Trait Associations with SNP Haplotypes and Other Attributes using the EM Algorithm The following R functions are used for inference of trait associations with haplotypes and other covariates in generalized linear models. The functions are developed primarily for data collected in cohort or cross-sectional studies. They can accommodate uncertain haplotype phase and handle missing genotypes at some SNPs.
1287 Statistical Genetics haplo.stats (core) Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous Routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The main functions are: haplo.em(), haplo.glm(), haplo.score(), and haplo.power(); all of which have detailed examples in the vignette.
1288 Statistical Genetics HardyWeinberg Statistical Tests and Graphics for Hardy-Weinberg Equilibrium Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) <doi:10.1126/science.28.706.49> for bi- and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi- and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots.
1289 Statistical Genetics hierfstat Estimation and Tests of Hierarchical F-Statistics Allows the estimation of hierarchical F-statistics from haploid or diploid genetic data with any numbers of levels in the hierarchy, following the algorithm of Yang (Evolution, 1998, 52(4):950-956; <doi:10.2307/2411227>). Functions are also given to test via randomisations the significance of each F and variance components, using the likelihood-ratio statistics G.
1290 Statistical Genetics hwde Models and Tests for Departure from Hardy-Weinberg Equilibrium and Independence Between Loci Fits models for genotypic disequilibria, as described in Huttley and Wilson (2000), Weir (1996) and Weir and Wilson (1986). Contrast terms are available that account for first order interactions between loci. Also implements, for a single locus in a single population, a conditional exact test for Hardy-Weinberg equilibrium.
1291 Statistical Genetics ibdreg Regression Methods for IBD Linkage With Covariates A method to test genetic linkage with covariates by regression methods with response IBD sharing for relative pairs. Account for correlations of IBD statistics and covariates for relative pairs within the same pedigree.
1292 Statistical Genetics LDheatmap Graphical Display of Pairwise Linkage Disequilibria Between SNPs Produces a graphical display, as a heat map, of measures of pairwise linkage disequilibria between SNPs. Users may optionally include the physical locations or genetic map distances of each SNP on the plot. Users should note that the imported package ‘snpStats’ and the suggested packages ‘rtracklayer’, ‘GenomicRanges’, ‘GenomeInfoDb’ and ‘IRanges’ are all BioConductor packages <https://bioconductor.org>.
1293 Statistical Genetics ouch Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.
1294 Statistical Genetics pbatR Pedigree/Family-Based Genetic Association Tests Analysis and Power This R package provides power calculations via internal simulation methods. The package also provides a frontend to the now abandoned PBAT program (developed by Christoph Lange), and reads in the corresponding output and displays results and figures when appropriate. The license of this R package itself is GPL. However, to have the program interact with the PBAT program for some functionality of the R package, users must additionally obtain the PBAT program from Christoph Lange, and accept his license. Both the data analysis and power calculations have command line and graphical interfaces using tcltk.
1295 Statistical Genetics phangorn Phylogenetic Reconstruction and Analysis Package contains methods for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation. Allows comparison of trees and model selection, and offers visualizations for trees and split networks.
1296 Statistical Genetics qtl Tools for Analyzing QTL Experiments Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits.
1297 Statistical Genetics rmetasim An Individual-Based Population Genetic Simulation Environment An interface between R and the metasim simulation engine. The simulation environment is documented in: “Strand, A. (2002) <doi:10.1046/j.1471-8286.2002.00208.x> Metasim 1.0: an individual-based environment for simulating population genetics of complex population dynamics. Mol. Ecol. Notes.” Please see the vignettes CreatingLandscapes and Simulating to get some ideas on how to use the package. See the rmetasim vignette to get an overview and to see important changes to the code in the most recent version.
1298 Statistical Genetics seqinr Biological Sequences Retrieval and Analysis Exploratory data analysis and data visualization for biological sequence (DNA and protein) data. Seqinr includes utilities for sequence data management under the ACNUC system described in Gouy, M. et al. (1984) Nucleic Acids Res. 12:121-127 <doi:10.1093/nar/12.1Part1.121>.
1299 Statistical Genetics snp.plotter snp.plotter Creates plots of p-values using single SNP and/or haplotype data. Main features of the package include options to display a linkage disequilibrium (LD) plot and the ability to plot multiple datasets simultaneously. Plots can be created using global and/or individual haplotype p-values along with single SNP p-values. Images are created as either PDF/EPS files.
1300 Statistical Genetics SNPmaxsel Maximally selected statistics for SNP data This package implements asymptotic methods related to maximally selected statistics, with applications to SNP data.
1301 Statistical Genetics stepwise Stepwise detection of recombination breakpoints A stepwise approach to identifying recombination breakpoints in a sequence alignment.
1302 Statistical Genetics tdthap TDT Tests for Extended Haplotypes Functions and examples are provided for Transmission/disequilibrium tests for extended marker haplotypes, as in Clayton, D. and Jones, H. (1999) “Transmission/disequilibrium tests for extended marker haplotypes”. Amer. J. Hum. Genet., 65:1161-1169, <doi:10.1086/302566>.
1303 Statistical Genetics untb Ecological Drift under the UNTB Hubbell’s Unified Neutral Theory of Biodiversity.
1304 Statistical Genetics wgaim Whole Genome Average Interval Mapping for QTL Detection and Estimation using ASReml-R A computationally efficient whole genome approach to detecting and estimating significant QTL in linkage maps using the flexible linear mixed modelling functionality of ASReml-R.
1305 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ade4 Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
1306 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization animation A Gallery of Animations in Statistics and Utilities to Create Animations Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, non-parametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, ‘GIF’, HTML pages, ‘PDF’ and videos. ‘PDF’ animations can be inserted into ‘Sweave’ / ‘knitr’ easily.
1307 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ape Analyses of Phylogenetics and Evolution Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
1308 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization aplpack Another Plot Package: ‘Bagplots’, ‘Iconplots’, ‘Summaryplots’, Slider Functions and Others Some functions for drawing some special plots: The function ‘bagplot’ plots a bagplot, ‘faces’ plots chernoff faces, ‘iconplot’ plots a representation of a frequency table or a data matrix, ‘plothulls’ plots hulls of a bivariate data set, ‘plotsummary’ plots a graphical summary of a data set, ‘puticon’ adds icons to a plot, ‘skyline.hist’ combines several histograms of a one dimensional data set in one plot, ‘slider’ functions supports some interactive graphics, ‘spin3R’ helps an inspection of a 3-dim point cloud, ‘stem.leaf’ plots a stem and leaf plot, ‘stem.leaf.backback’ plots back-to-back versions of stem and leaf plot.
1309 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ash David Scott’s ASH Routines David Scott’s ASH routines ported from S-PLUS to R.
1310 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization biclust BiCluster Algorithms The main function biclust() provides several algorithms to find biclusters in two-dimensional data: Cheng and Church (2000, ISBN:1-57735-115-0), spectral (2003) <doi:10.1101/gr.648603>, plaid model (2005) <doi:10.1016/j.csda.2004.02.003>, xmotifs (2003) <doi:10.1142/9789812776303_0008> and bimax (2006) <doi:10.1093/bioinformatics/btl060>. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.
1311 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization Cairo R Graphics Device using Cairo Graphics Library for Creating High-Quality Bitmap (PNG, JPEG, TIFF), Vector (PDF, SVG, PostScript) and Display (X11 and Win32) Output R graphics device using cairographics library that can be used to create high-quality vector (PDF, PostScript and SVG) and bitmap output (PNG,JPEG,TIFF), and high-quality rendering in displays (X11 and Win32). Since it uses the same back-end for all output, copying across formats is WYSIWYG. Files are created without the dependence on X11 or other external programs. This device supports alpha channel (semi-transparent drawing) and resulting images can contain transparent and semi-transparent regions. It is ideal for use in server environments (file output) and as a replacement for other devices that don’t have Cairo’s capabilities such as alpha support or anti-aliasing. Backends are modular such that any subset of backends is supported.
1312 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization cairoDevice Embeddable Cairo Graphics Device Driver This device uses Cairo and GTK to draw to the screen, file (png, svg, pdf, and ps) or memory (arbitrary GdkDrawable or Cairo context). The screen device may be embedded into RGtk2 interfaces and supports all interactive features of other graphics devices, including getGraphicsEvent().
1313 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization cba Clustering for Business Analytics Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.
1314 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization colorspace A Toolbox for Manipulating and Assessing Colors and Palettes Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided along with corresponding ggplot2 color scales. Color palette choice is aided by an interactive app (with either a Tcl/Tk or a shiny GUI) and shiny apps with an HCL color picker and a color vision deficiency emulator. Plotting functions for displaying and assessing palettes include color swatches, visualizations of the HCL space, and trajectories in HCL and/or RGB spectrum. Color manipulation functions include: desaturation, lightening/darkening, mixing, and simulation of color vision deficiencies (deutanomaly, protanomaly, tritanomaly).
1315 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization diagram Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams Visualises simple graphs (networks) based on a transition matrix, utilities to plot flow diagrams, visualising webs, electrical networks, etc. Support for the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book “Solving Differential Equations in R” by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).
1316 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization dichromat Color Schemes for Dichromats Collapse red-green or green-blue distinctions to simulate the effects of different types of color-blindness.
1317 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization gclus Clustering Graphics Orders panels in scatterplot matrices and parallel coordinate displays by some merit index. Package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.
1318 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ggplot2 (core) Create Elegant Data Visualisations Using the Grammar of Graphics A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
1319 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization gplots Various R Programming Tools for Plotting Data Various R programming tools for plotting data, including: - calculating and plotting locally smoothed summary function as (‘bandplot’, ‘wapply’), - enhanced versions of standard plots (‘barplot2’, ‘boxplot2’, ‘heatmap.2’, ‘smartlegend’), - manipulating colors (‘col2hex’, ‘colorpanel’, ‘redgreen’, ‘greenred’, ‘bluered’, ‘redblue’, ‘rich.colors’), - calculating and plotting two-dimensional data summaries (‘ci2d’, ‘hist2d’), - enhanced regression diagnostic plots (‘lmplot2’, ‘residplot’), - formula-enabled interface to ‘stats::lowess’ function (‘lowess’), - displaying textual data in plots (‘textplot’, ‘sinkplot’), - plotting a matrix where each cell contains a dot whose size reflects the relative magnitude of the elements (‘balloonplot’), - plotting “Venn” diagrams (‘venn’), - displaying Open-Office style plots (‘ooplot’), - plotting multiple data on same region, with separate axes (‘overplot’), - plotting means and confidence intervals (‘plotCI’, ‘plotmeans’), - spacing points in an x-y plot so they don’t overlap (‘space’).
1320 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization gridBase Integration of base and grid graphics Integration of base and grid graphics
1321 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization hexbin Hexagonal Binning Routines Binning and plotting functions for hexagonal bins.
1322 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization IDPmisc ‘Utilities of Institute of Data Analyses and Process Design (www.zhaw.ch/idp)’ Different high-level graphics functions for displaying large datasets, displaying circular data in a very flexible way, finding local maxima, brewing color ramps, drawing nice arrows, zooming 2D-plots, creating figures with differently colored margin and plot region. In addition, the package contains auxiliary functions for data manipulation like omitting observations with irregular values or selecting data by logical vectors, which include NAs. Other functions are especially useful in spectroscopy and analyses of environmental data: robust baseline fitting, finding peaks in spectra, converting humidity measures.
1323 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization igraph Network Analysis and Visualization Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
1324 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization iplots iPlots - interactive graphics for R Interactive plots for R.
1325 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization JavaGD Java Graphics Device Graphics device routing all graphics commands to a Java program. The actual functionality of the JavaGD depends on the Java-side implementation. Simple AWT and Swing implementations are included.
1326 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization klaR Classification and Visualization Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to ‘svmlight’ and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.
1327 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization lattice (core) Trellis Graphics for R A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
1328 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization latticeExtra Extra Graphical Utilities Based on Lattice Building on the infrastructure provided by the lattice package, this package provides several new high-level functions and methods, as well as additional utilities such as panel and axis annotation functions.
1329 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization misc3d Miscellaneous 3D Plots A collection of miscellaneous 3d plots, including isosurfaces.
1330 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization onion Octonions and Quaternions Quaternions and Octonions are four- and eight- dimensional extensions of the complex numbers. They are normed division algebras over the real numbers and find applications in spatial rotations (quaternions) and string theory and relativity (octonions). The quaternions are noncommutative and the octonions nonassociative. See RKS Hankin 2006, Rnews Volume 6/2: 49-51, and the package vignette, for more details.
1331 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization plotrix (core) Various Plotting Functions Lots of plots, various labeling, axis and color scaling functions.
1332 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RColorBrewer (core) ColorBrewer Palettes Provides color schemes for maps (and other graphics) designed by Cynthia Brewer as described at http://colorbrewer2.org
1333 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization rggobi Interface Between R and ‘GGobi’ A command-line interface to ‘GGobi’, an interactive and dynamic graphics package. ‘Rggobi’ complements the graphical user interface of ‘GGobi’ providing a way to fluidly transition between analysis and exploration, as well as automating common tasks.
1334 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization rgl (core) 3D Visualization Using OpenGL Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.
1335 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RGraphics Data and Functions from the Book R Graphics, Second Edition Data and Functions from the book R Graphics, Second Edition. There is a function to produce each figure in the book, plus several functions, classes, and methods defined in Chapter 8.
1336 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RGtk2 R Bindings for Gtk 2.8.0 and Above Facilities in the R language for programming graphical interfaces using Gtk, the Gimp Tool Kit.
1337 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RSvgDevice An R SVG graphics device A graphics device for R that uses the w3.org xml standard for Scalable Vector Graphics.
1338 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RSVGTipsDevice An R SVG Graphics Device with Dynamic Tips and Hyperlinks A graphics device for R that uses the w3.org xml standard for Scalable Vector Graphics. This version supports tooltips with 1 to 3 lines, hyperlinks, and line styles.
1339 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization scagnostics Compute scagnostics - scatterplot diagnostics Calculates graph theoretic scagnostics. Scagnostics describe various measures of interest for pairs of variables, based on their appearance on a scatterplot. They are a useful tool for discovering interesting or unusual scatterplots from a scatterplot matrix, without having to look at every individual plot.
1340 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization scatterplot3d 3D Scatter Plot Plots a three dimensional (3D) point cloud.
1341 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization seriation Infrastructure for Ordering Objects Using Seriation Infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).
1342 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization tkrplot TK Rplot Simple mechanism for placing R graphics in a Tk widget.
1343 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization vcd (core) Visualizing Categorical Data Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).
1344 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization vioplot Violin Plot A violin plot is a combination of a box plot and a kernel density plot. This package allows extensive customisation of violin plots.
1345 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization xgobi Interface to the XGobi and XGvis programs for graphical data analysis Interface to the XGobi and XGvis programs for graphical data analysis.
1346 High-Performance and Parallel Computing with R aprof Amdahl’s Profiler, Directed Optimization Made Easy Assists the evaluation of whether and where to focus code optimization, using Amdahl’s law and visual aids based on line profiling. Amdahl’s profiler organizes profiling output files (including memory profiling) in a visually appealing way. It is meant to help to balance development vs. execution time by helping to identify the most promising sections of code to optimize and projecting potential gains. The package is an addition to R’s standard profiling tools and is not a wrapper for them.
1347 High-Performance and Parallel Computing with R batch Batching Routines in Parallel and Passing Command-Line Arguments to R Functions to allow you to easily pass command-line arguments into R, and functions to aid in submitting your R code in parallel on a cluster and joining the results afterward (e.g. multiple parameter values for simulations running in parallel, splitting up a permutation test in parallel, etc.). See ‘parseCommandArgs(…)’ for the main example of how to use this package.
1348 High-Performance and Parallel Computing with R BatchExperiments Statistical Experiments on Batch Computing Clusters Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.
1349 High-Performance and Parallel Computing with R BatchJobs Batch Computing with R Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.
1350 High-Performance and Parallel Computing with R batchtools Tools for Computation on Batch Systems As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high performance computing systems managed by schedulers ‘IBM Spectrum LSF’ (<https://www.ibm.com/us-en/marketplace/hpc-workload-management>), ‘OpenLava’ (<http://www.openlava.org/>), ‘Univa Grid Engine’/‘Oracle Grid Engine’ (<http://www.univa.com/>), ‘Slurm’ (<http://slurm.schedmd.com/>), ‘TORQUE/PBS’ (<http://www.adaptivecomputing.com/products/open-source/torque/>), or ‘Docker Swarm’ (<https://docs.docker.com/swarm/>). Multicore and socket modes allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.
1351 High-Performance and Parallel Computing with R bcp Bayesian Analysis of Change Point Problems Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.
1352 High-Performance and Parallel Computing with R BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Mohammadi and Wit (2019) <doi:10.18637/jss.v089.i03>.
1353 High-Performance and Parallel Computing with R biglars Scalable Least-Angle Regression and Lasso Least-angle regression, lasso and stepwise regression for numeric datasets in which the number of observations is greater than the number of predictors. The functions can be used with the ff library to accommodate datasets that are too large to be held in memory.
1354 High-Performance and Parallel Computing with R biglm bounded memory linear and generalized linear models Regression for data too large to fit in memory
1355 High-Performance and Parallel Computing with R bigmemory Manage Massive Matrices with Shared Memory and Memory-Mapped Files Create, store, access, and manipulate massive matrices. Matrices are allocated to shared memory and may use memory-mapped files. Packages ‘biganalytics’, ‘bigtabulate’, ‘synchronicity’, and ‘bigalgebra’ provide advanced functionality.
1356 High-Performance and Parallel Computing with R bigstatsr Statistical Tools for Filebacked Big Matrices Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.
1357 High-Performance and Parallel Computing with R bnlearn Bayesian Network Structure Learning, Parameter Learning and Inference Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <http://www.bnlearn.com>.
1358 High-Performance and Parallel Computing with R caret Classification and Regression Training Misc functions for training and plotting classification and regression models.
1359 High-Performance and Parallel Computing with R clustermq Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM, PBS/Torque) Evaluate arbitrary function calls using workers on HPC schedulers in a single line of code. All processing is done on the network without accessing the file system. Remote schedulers are supported via SSH.
1360 High-Performance and Parallel Computing with R data.table Extension of ‘data.frame’ Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
1361 High-Performance and Parallel Computing with R dclone Data Cloning and MCMC Tools for Maximum Likelihood Methods Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.
1362 High-Performance and Parallel Computing with R disk.frame Larger-than-RAM Disk-Based Data Manipulation Framework A disk-based data manipulation tool for working with larger-than-RAM datasets. Aims to lower the barrier-to-entry for manipulating large datasets by adhering closely to popular and familiar data manipulation paradigms like dplyr verbs and data.table syntax.
1363 High-Performance and Parallel Computing with R doFuture A Universal Foreach Parallel Adapter using the Future API of the ‘future’ Package Provides a ‘%dopar%’ adapter such that any type of futures can be used as backends for the ‘foreach’ framework.
1364 High-Performance and Parallel Computing with R doMC Foreach Parallel Adaptor for ‘parallel’ Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.
1365 High-Performance and Parallel Computing with R doMPI Foreach Parallel Adaptor for the Rmpi Package Provides a parallel backend for the %dopar% function using the Rmpi package.
1366 High-Performance and Parallel Computing with R doRedis Foreach parallel adapter for the rredis package A Redis parallel backend for the %dopar% function
1367 High-Performance and Parallel Computing with R doRNG Generic Reproducible Parallel Backend for ‘foreach’ Loops Provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L’Ecuyer’s combined multiple-recursive generator [L’Ecuyer (1999), <doi:10.1287/opre.47.1.159>]. It makes it easy to convert standard %dopar% loops into fully reproducible loops, independently of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.
1368 High-Performance and Parallel Computing with R doSNOW Foreach Parallel Adaptor for the ‘snow’ Package Provides a parallel backend for the %dopar% function using the snow package of Tierney, Rossini, Li, and Sevcikova.
1369 High-Performance and Parallel Computing with R dqrng Fast Pseudo Random Number Generators Several fast random number generators are provided as C++ header only libraries: The PCG family by O’Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as Xoroshiro128+ and Xoshiro256+ by Blackman and Vigna (2018 <arXiv:1805.01407>). In addition fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+ and Xoshiro256+ as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011 <doi:10.1145/2063384.2063405>) as provided by the package ‘sitmo’.
1370 High-Performance and Parallel Computing with R drake A Pipeline Toolkit for Reproducible Computation at Scale A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://ropenscilabs.github.io/drake-manual/>.
1371 High-Performance and Parallel Computing with R ff Memory-Efficient Storage of Large Data on Disk and Fast Access Functions The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory - the effective virtual memory consumption per ff object. ff supports R’s standard atomic data types ‘double’, ‘logical’, ‘raw’ and ‘integer’ and non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example ‘quad’ allows efficient storage of genomic data as an ‘A’,‘T’,‘G’,‘C’ factor. The unsigned types support ‘circular’ arithmetic. There is also support for close-to-atomic types ‘factor’, ‘ordered’, ‘POSIXct’, ‘Date’ and custom close-to-atomic types. ff not only has native C-support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays). There is also a ffdf class not unlike data.frames and import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows working with ‘permanent’ files as well as creating/removing ‘temporary’ ff files completely transparent to the user. On certain OS/Filesystem combinations, creating the ff files works without notable delay thanks to using sparse file allocation.
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, ‘logicals’ and non-standard data types get stored native and compact on binary flat files i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package ‘bit’: chunked looping, fast bit operations and coercions between different objects that can store subscript information (‘bit’, ‘bitwhich’, ff ‘boolean’, ri range index, hi hybrid index). This allows working interactively with selections of large datasets and quickly modifying selection criteria. Further high-performance enhancements can be made available upon request.
1372 High-Performance and Parallel Computing with R ffbase Basic Statistical Functions for Package ‘ff’ Extends the out of memory vectors of ‘ff’ with statistical functions and other utilities to ease their usage.
1373 High-Performance and Parallel Computing with R flowr Streamlining Design and Deployment of Complex Workflows This framework allows you to design and implement complex pipelines, and deploy them on your institution’s computing cluster. This has been built keeping in mind the needs of bioinformatics workflows. However, it is easily extendable to any field where a series of steps (shell commands) are to be executed in a (work)flow.
1374 High-Performance and Parallel Computing with R foreach Provides Foreach Looping Construct Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
1375 High-Performance and Parallel Computing with R future Unified Parallel and Distributed Processing in R for Everyone The purpose of this package is to provide a lightweight and unified Future API for sequential and parallel processing of R expressions via futures. The simplest way to evaluate an expression in parallel is to use ‘x %<-% { expression }’ with ‘plan(multiprocess)’. This package implements sequential, multicore, multisession, and cluster futures. With these, R expressions can be evaluated on the local machine, in parallel on a set of local machines, or distributed on a mix of local and remote machines. Extensions to this package implement additional backends for processing futures via compute cluster schedulers etc. Because of its unified API, there is no need to modify any code in order to switch from sequential processing on the local machine to, say, distributed processing on a remote compute cluster. Another strength of this package is that global variables and functions are automatically identified and exported as needed, making it straightforward to tweak existing code to make use of futures.
1376 High-Performance and Parallel Computing with R future.BatchJobs A Future API for Parallel and Distributed Processing using BatchJobs Implementation of the Future API on top of the ‘BatchJobs’ package. This allows you to process futures, as defined by the ‘future’ package, in parallel out of the box, not only on your local machine or ad-hoc cluster of machines, but also via high-performance compute (‘HPC’) job schedulers such as ‘LSF’, ‘OpenLava’, ‘Slurm’, ‘SGE’, and ‘TORQUE’ / ‘PBS’, e.g. ‘y <- future.apply::future_lapply(files, FUN = process)’. NOTE: The ‘BatchJobs’ package is deprecated in favor of the ‘batchtools’ package. Because of this, it is recommended to use the ‘future.batchtools’ package instead of this package.
1377 High-Performance and Parallel Computing with R GAMBoost Generalized linear and additive models by likelihood based boosting This package provides routines for fitting generalized linear and generalized additive models by likelihood based boosting, using penalized B-splines.
1378 High-Performance and Parallel Computing with R gcbd ‘GPU’/CPU Benchmarking in Debian-Based Systems This package benchmarks performance of a few standard linear algebra operations (such as a matrix product and QR, SVD and LU decompositions) across a number of different ‘BLAS’ libraries as well as a ‘GPU’ implementation. To do so, it takes advantage of the ability to ‘plug and play’ different ‘BLAS’ implementations easily on a Debian and/or Ubuntu system. The current version supports - ‘Reference BLAS’ (‘refblas’) which are un-accelerated as a baseline - Atlas which are tuned but typically configured single-threaded - Atlas39 which are tuned and configured for multi-threaded mode - ‘Goto Blas’ which are accelerated and multi-threaded - ‘Intel MKL’ which is a commercial accelerated and multithreaded version. As for ‘GPU’ computing, we use the CRAN package ‘gputools’. For ‘Goto Blas’, the ‘gotoblas2-helper’ script from the ISM in Tokyo can be used. For ‘Intel MKL’ we use the Revolution R packages from Ubuntu 9.10.
1379 High-Performance and Parallel Computing with R gpuR GPU Functions for R Objects Provides GPU enabled functions for R objects in a simple and approachable manner. New gpu* and vcl* classes have been provided to wrap typical R objects (e.g. vector, matrix), in both host and device spaces, to mirror typical R syntax without the need to know OpenCL.
1380 High-Performance and Parallel Computing with R GUIProfiler Graphical User Interface for Rprof() Show graphically the results of profiling R functions by tracking their execution time.
1381 High-Performance and Parallel Computing with R h2o R Interface for ‘H2O’ R interface for ‘H2O’, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).
1382 High-Performance and Parallel Computing with R HadoopStreaming Utilities for using R scripts in Hadoop streaming Provides a framework for writing map/reduce scripts for use in Hadoop Streaming. Also facilitates operating on data in a streaming fashion, without Hadoop.
1383 High-Performance and Parallel Computing with R HistogramTools Utility Functions for R Histograms Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representations of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.
1384 High-Performance and Parallel Computing with R inline Functions to Inline C, C++, Fortran Function Calls from R Functionality to dynamically define R functions and S4 methods with ‘inlined’ C, C++ or Fortran code supporting the .C and .Call calling conventions.
1385 High-Performance and Parallel Computing with R keras R Interface to ‘Keras’ Interface to ‘Keras’ <https://keras.io>, a high-level neural networks ‘API’. ‘Keras’ was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both ‘CPU’ and ‘GPU’ devices.
1386 High-Performance and Parallel Computing with R LaF Fast Access to Large ASCII Files Methods for fast access to large ASCII files. Currently the following file formats are supported: comma separated format (CSV) and fixed width format. It is assumed that the files are too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed as one would an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.
1387 High-Performance and Parallel Computing with R latentnet Latent Position and Cluster Models for Statistical Networks Fit and simulate latent position and cluster models for statistical networks.
1388 High-Performance and Parallel Computing with R lga Tools for linear grouping analysis (LGA) Tools for linear grouping analysis. Three user-level functions: gap, rlga and lga.
1389 High-Performance and Parallel Computing with R Matching Multivariate and Propensity Score Matching with Balance Optimization Provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance has been obtained are also provided.
1390 High-Performance and Parallel Computing with R MonetDB.R Connect MonetDB to R Allows pulling data from MonetDB into R. Includes a DBI implementation and a dplyr backend.
1391 High-Performance and Parallel Computing with R mvnfast Fast Multivariate Normal and Student’s t Methods Provides computationally efficient tools related to the multivariate normal and Student’s t distributions. The main functionalities are: simulating multivariate random vectors, evaluating multivariate normal or Student’s t densities and Mahalanobis distances. These tools are very efficient thanks to the use of C++ code and of the OpenMP API.
1392 High-Performance and Parallel Computing with R nws R functions for NetWorkSpaces and Sleigh Provides coordination and parallel execution facilities, as well as limited cross-language data exchange, using the netWorkSpaces server developed by REvolution Computing.
1393 High-Performance and Parallel Computing with R OpenCL Interface Allowing R to Use OpenCL Provides an interface to OpenCL, allowing R to leverage computing power of GPUs and other HPC accelerator devices.
1394 High-Performance and Parallel Computing with R orloca Operations Research LOCational Analysis Models Objects and methods to handle and solve the min-sum location problem, also known as the Fermat-Weber problem. The min-sum location problem searches for a point such that the weighted sum of the distances to the demand points is minimized. See “The Fermat-Weber location problem revisited” by Brimberg, Mathematical Programming, 1, pg. 71-76, 1995. <doi:10.1007/BF01592245>. General global optimization algorithms are used to solve the problem, along with the ad hoc Weiszfeld method; see “Sur le point pour lequel la Somme des distances de n points donnes est minimum”, by Weiszfeld, Tohoku Mathematical Journal, First Series, 43, pg. 355-386, 1937, or “On the point for which the sum of the distances to n given points is minimum”, by E. Weiszfeld and F. Plastria, Annals of Operations Research, 167, pg. 7-41, 2009. <doi:10.1007/s10479-008-0352-z>.
1395 High-Performance and Parallel Computing with R parSim Parallel Simulation Studies Perform flexible simulation studies using one or multiple computer cores. The package is set up to be usable on high-performance clusters in addition to being run locally, see examples on <https://github.com/SachaEpskamp/parSim>.
1396 High-Performance and Parallel Computing with R partDSA Partitioning Using Deletion, Substitution, and Addition Moves A novel tool for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.
1397 High-Performance and Parallel Computing with R pbapply Adding Progress Bar to ’*apply’ Functions A lightweight package that adds a progress bar to vectorized R functions (’*apply’). The implementation can easily be added to functions where showing the progress is useful (e.g. bootstrap). The type and style of the progress bar (with percentages or remaining time) can be set through options. Supports several parallel processing backends.
1398 High-Performance and Parallel Computing with R pbdBASE Programming with Big Data Base Wrappers for Distributed Matrices An interface to and extensions for the ‘PBLAS’ and ‘ScaLAPACK’ numerical libraries. This enables R to utilize distributed linear algebra for codes written in the ‘SPMD’ fashion. This interface is deliberately low-level and mimics the style of the native libraries it wraps. For a much higher level way of managing distributed matrices, see the ‘pbdDMAT’ package.
1399 High-Performance and Parallel Computing with R pbdDEMO Programming with Big Data Demonstrations and Examples Using ‘pbdR’ Packages A set of demos of ‘pbdR’ packages, together with a useful, unifying vignette.
1400 High-Performance and Parallel Computing with R pbdDMAT ‘pbdR’ Distributed Matrix Methods A set of classes for managing distributed matrices, and a collection of methods for computing linear algebra and statistics. Computation is handled mostly by routines from the ‘pbdBASE’ package, which itself relies on the ‘ScaLAPACK’ and ‘PBLAS’ numerical libraries for distributed computing.
1401 High-Performance and Parallel Computing with R pbdMPI Programming with Big Data Interface to MPI An efficient interface to MPI by utilizing S4 classes and methods with a focus on Single Program/Multiple Data (‘SPMD’) parallel programming style, which is intended for batch parallel execution.
1402 High-Performance and Parallel Computing with R pbdNCDF4 Programming with Big Data Interface to Parallel Unidata NetCDF4 Format Data Files This package adds collective parallel read and write capability to the R package ncdf4 version 1.8. Typical use is as a parallel NetCDF4 file reader in SPMD style programming. Each R process reads and writes its own data in a synchronized collective mode, resulting in faster parallel performance. Performance improvement is conditional on a parallel file system.
1403 High-Performance and Parallel Computing with R pbdPROF Programming with Big Data ― MPI Profiling Tools MPI profiling tools.
1404 High-Performance and Parallel Computing with R pbdSLAP Programming with Big Data Scalable Linear Algebra Packages Utilizing scalable linear algebra packages mainly including ‘BLACS’, ‘PBLAS’, and ‘ScaLAPACK’ in double precision via ‘pbdMPI’ based on ‘ScaLAPACK’ version 2.0.2.
1405 High-Performance and Parallel Computing with R peperr Parallelised Estimation of Prediction Error Designed for prediction error estimation through resampling techniques, possibly accelerated by parallel execution on a compute cluster. Newly developed model fitting routines can be easily incorporated.
1406 High-Performance and Parallel Computing with R permGPU Using GPUs in Statistical Genomics Can be used to carry out permutation resampling inference in the context of RNA microarray studies.
1407 High-Performance and Parallel Computing with R pls Partial Least Squares and Principal Component Regression Multivariate regression methods Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
1408 High-Performance and Parallel Computing with R pmclust Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs ‘pbdMPI’ to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single program multiple data programming model. The code can be executed through ‘pbdMPI’ and ‘MPI’ implementations such as ‘OpenMPI’ and ‘MPICH’. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.
1409 High-Performance and Parallel Computing with R profr An Alternative Display for Profiling Information An alternative data structure and visual rendering for the profiling information generated by Rprof.
1410 High-Performance and Parallel Computing with R proftools Profile Output Processing Tools for R Tools for examining Rprof profile output.
1411 High-Performance and Parallel Computing with R profvis Interactive Visualizations for Profiling R Code Interactive visualizations for profiling R code.
1412 High-Performance and Parallel Computing with R pvclust Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides AU (approximately unbiased) p-value as well as BP (bootstrap probability) value for each cluster in a dendrogram.
1413 High-Performance and Parallel Computing with R qsub Running Commands Remotely on ‘Gridengine’ Clusters Run lapply() calls in parallel by submitting them to ‘gridengine’ clusters using the ‘qsub’ command.
1414 High-Performance and Parallel Computing with R randomForestSRC Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) Fast OpenMP parallel computing of Breiman’s random forests for survival, competing risks, regression and classification based on Ishwaran and Kogalur’s popular random survival forests (RSF) package. Handles missing data and now includes multivariate, unsupervised forests, quantile regression and solutions for class imbalanced data. New fast interface using subsampling and confidence regions for variable importance.
1415 High-Performance and Parallel Computing with R Rborist Extensible, Parallelizable Implementation of the Random Forest Algorithm Scalable implementation of classification and regression forests, as described by Breiman (2001), <doi:10.1023/A:1010933404324>.
1416 High-Performance and Parallel Computing with R Rcpp Seamless R and C++ Integration The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of third-party libraries. Documentation about ‘Rcpp’ is provided by several vignettes included in this package, via the ‘Rcpp Gallery’ site at <http://gallery.rcpp.org>, the paper by Eddelbuettel and Francois (2011, <doi:10.18637/jss.v040.i08>), the book by Eddelbuettel (2013, <doi:10.1007/978-1-4614-6868-4>) and the paper by Eddelbuettel and Balamuta (2018, <doi:10.1080/00031305.2017.1375990>); see ‘citation(“Rcpp”)’ for details.
1417 High-Performance and Parallel Computing with R RcppParallel Parallel Programming Tools for ‘Rcpp’ High level functions for parallel programming with ‘Rcpp’. For example, the ‘parallelFor()’ function can be used to convert the work of a standard serial “for” loop into a parallel one and the ‘parallelReduce()’ function can be used for accumulating aggregate or other values.
1418 High-Performance and Parallel Computing with R Rdsm Threads Environment for R Provides a threads-type programming environment for R. The package gives the R programmer a clearer, more concise shared-memory world view, and in some cases gives superior performance as well. In addition, it enables parallel processing on very large, out-of-core matrices.
1419 High-Performance and Parallel Computing with R reticulate Interface to ‘Python’ Interface to ‘Python’ modules, classes, and functions. When calling into ‘Python’, R data types are automatically converted to their equivalent ‘Python’ types. When values are returned from ‘Python’ to R they are converted back to R types. Compatible with all versions of ‘Python’ >= 2.7.
1420 High-Performance and Parallel Computing with R rgenoud R Version of GENetic Optimization Using Derivatives A genetic algorithm plus derivative optimizer.
1421 High-Performance and Parallel Computing with R Rhpc Permits *apply() Style Dispatch for ‘HPC’ Provides *apply()-style functions using ‘MPI’ for a better ‘HPC’ environment in R. The package supports long vectors and can handle moderately large data.
1422 High-Performance and Parallel Computing with R RhpcBLASctl Control the Number of Threads on ‘BLAS’ Controls the number of threads used by ‘BLAS’ (a.k.a. ‘GotoBLAS’, ‘OpenBLAS’, ‘ACML’, ‘BLIS’ and ‘MKL’), and can also control the number of threads in ‘OpenMP’. Reports the number of logical and physical cores where feasible.
1423 High-Performance and Parallel Computing with R RInside C++ Classes to Embed R in C++ Applications A C++ class providing the R interpreter is offered by this package, making it easier to have “R inside” your C++ application. As R itself is embedded into your application, a shared library build of R is required. This works on Linux, OS X and even on Windows provided you use the same tools used to build R itself. Numerous examples are provided in the eight subdirectories of the examples/ directory of the installed package: standard, ‘mpi’ (for parallel computing), ‘qt’ (showing how to embed ‘RInside’ inside a Qt GUI application), ‘wt’ (showing how to build a “web-application” using the Wt toolkit), ‘armadillo’ (for ‘RInside’ use with ‘RcppArmadillo’) and ‘eigen’ (for ‘RInside’ use with ‘RcppEigen’). The examples use ‘GNUmakefile(s)’ with GNU extensions, so a GNU make is required (and will use the ‘GNUmakefile’ automatically). ‘Doxygen’-generated documentation of the C++ classes is available at the ‘RInside’ website as well.
1424 High-Performance and Parallel Computing with R rJava Low-Level R to Java Interface Low-level interface to Java VM very much like .C/.Call and friends. Allows creation of objects, calling methods and accessing fields.
1425 High-Performance and Parallel Computing with R rlecuyer R Interface to RNG with Multiple Streams Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.
1426 High-Performance and Parallel Computing with R Rmpi (core) Interface (Wrapper) to MPI (Message-Passing Interface) An interface (wrapper) to MPI. It also provides an interactive R manager and worker environment.
1427 High-Performance and Parallel Computing with R RProtoBuf R Interface to the ‘Protocol Buffers’ ‘API’ (Version 2 or 3) Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal ‘RPC’ protocols and file formats. Additional documentation is available in two included vignettes, one of which corresponds to our ‘JSS’ paper (2016, <doi:10.18637/jss.v071.i02>). Either version 2 or 3 of the ‘Protocol Buffers’ ‘API’ is supported.
1428 High-Performance and Parallel Computing with R rredis “Redis” Key/Value Database Client R client interface to the “Redis” key-value database.
1429 High-Performance and Parallel Computing with R rslurm Submit R Calculations to a Slurm Cluster Functions that simplify submitting R scripts to a Slurm workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.
1430 High-Performance and Parallel Computing with R rstream Streams of Random Numbers Unified object oriented interface for multiple independent streams of random numbers from different sources.
1431 High-Performance and Parallel Computing with R Sim.DiffProc Simulation of Diffusion Processes Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both Ito and Stratonovich forms. Supports statistical analysis of SDEs with parallel Monte Carlo and moment-equation methods. Has enabled many researchers in different domains to use these equations to model practical problems in financial and actuarial modelling and other areas of application, e.g., modelling and simulating the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
1432 High-Performance and Parallel Computing with R sitmo Parallel Pseudo Random Number Generator (PPRNG) ‘sitmo’ Header Files Provided within are two high-quality and fast PPRNGs that may be used in an ‘OpenMP’ parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the ‘sitmo’ (C++98 & C++11), ‘threefry’ and ‘vandercorput’ (C++11-only) engines on CRAN by enabling others to link to the header files inside of ‘sitmo’ instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the ‘sitmo’ package and three accompanying vignettes that provide additional information.
1433 High-Performance and Parallel Computing with R snow (core) Simple Network of Workstations Support for simple parallel computing in R.
1434 High-Performance and Parallel Computing with R snowfall Easier cluster computing (based on snow) Usability wrapper around snow for easier development of parallel R programs. This package offers e.g. extended error checks, and additional functions. All functions work in sequential mode, too, if no cluster is present or wished. The package is also designed as a connector to the cluster management tool sfCluster, but can also be used without it.
1435 High-Performance and Parallel Computing with R snowFT Fault Tolerant Simple Network of Workstations Extension of the snow package supporting fault tolerant and reproducible applications, as well as supporting easy-to-use parallel programming - only one function is needed. Dynamic cluster size is also available.
1436 High-Performance and Parallel Computing with R speedglm Fitting Linear and Generalized Linear Models to Large Data Sets Fitting linear models and generalized linear models to large data sets by updating algorithms.
1437 High-Performance and Parallel Computing with R sqldf Manipulate R Data Frames Using SQL The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. ‘RSQLite’, ‘RH2’, ‘RMySQL’ and ‘RPostgreSQL’ backends are supported.
1438 High-Performance and Parallel Computing with R ssgraph Bayesian Graphical Estimation using Spike-and-Slab Priors Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data.
1439 High-Performance and Parallel Computing with R STAR Spike Train Analysis with R Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.
1440 High-Performance and Parallel Computing with R tensorflow R Interface to ‘TensorFlow’ Interface to ‘TensorFlow’ <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
1441 High-Performance and Parallel Computing with R tfestimators Interface to ‘TensorFlow’ Estimators Interface to ‘TensorFlow’ Estimators <https://www.tensorflow.org/programmers_guide/estimators>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
1442 High-Performance and Parallel Computing with R tm Text Mining Package A framework for text mining applications within R.
1443 High-Performance and Parallel Computing with R varSelRF Variable Selection using Random Forests Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly-correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).
1444 High-Performance and Parallel Computing with R xgboost Extreme Gradient Boosting Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.
1445 Hydrological Data and Modeling airGR Suite of GR Hydrological Models for Precipitation-Runoff Modelling Hydrological modelling tools developed at Irstea-Antony (HYCAR Research Unit, France). The package includes several conceptual rainfall-runoff models (GR4H, GR4J, GR5J, GR6J, GR2M, GR1A), a snow accumulation and melt model (CemaNeige) and the associated functions for their calibration and evaluation. Use help(airGR) for package description and references.
1446 Hydrological Data and Modeling airGRteaching Teaching Hydrological Modelling with the GR Rainfall-Runoff Models (‘Shiny’ Interface Included) Add-on package to the ‘airGR’ package that simplifies its use and is aimed at teaching hydrology. The package provides 1) three functions that allow a hydrological modelling exercise to be completed very simply, 2) plotting functions to help students explore observed data and interpret the results of calibration and simulation of the GR (‘Genie rural’) models, and 3) a ‘Shiny’ graphical interface that displays the impact of model parameters on hydrographs and the models’ internal variables.
1447 Hydrological Data and Modeling berryFunctions Function Collection Related to Plotting and Hydrology Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.
1448 Hydrological Data and Modeling bigleaf Physical and Physiological Ecosystem Properties from Eddy Covariance Data Calculation of physical (e.g. aerodynamic conductance, surface temperature), and physiological (e.g. canopy conductance, water-use efficiency) ecosystem properties from eddy covariance data and accompanying meteorological measurements. Calculations assume the land surface to behave like a ‘big-leaf’ and return bulk ecosystem/canopy variables.
1449 Hydrological Data and Modeling biotic Calculation of Freshwater Biotic Indices Calculates a range of UK freshwater invertebrate biotic indices including BMWP, Whalley, WHPT, Habitat-specific BMWP, AWIC, LIFE and PSI.
1450 Hydrological Data and Modeling bomrang Australian Government Bureau of Meteorology (‘BOM’) Data Client Provides functions to interface with Australian Government Bureau of Meteorology (‘BOM’) data, fetching data and returning a tidy data frame of precis forecasts, historical and current weather data from stations, agriculture bulletin data, ‘BOM’ 0900 or 1500 weather bulletins and downloading and importing radar and satellite imagery files. Data (c) Australian Government Bureau of Meteorology Creative Commons (CC) Attribution 3.0 licence or Public Access Licence (PAL) as appropriate. See <http://www.bom.gov.au/other/copyright.shtml> for further details.
1451 Hydrological Data and Modeling boussinesq Analytic Solutions for (ground-water) Boussinesq Equation A collection of R functions implementing published and available analytic solutions for the one-dimensional Boussinesq equation (ground-water). In particular, the function “beq.lin” is the analytic solution of the linearized form of the Boussinesq equation between two different head-based boundary (Dirichlet) conditions; “beq.song” is the non-linear power-series analytic solution of the motion of a wetting front over a dry bedrock (Song et al., 2007; see the complete reference in the function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcomed.
1452 Hydrological Data and Modeling CityWaterBalance Track Flows of Water Through an Urban System Retrieves data and estimates unmeasured flows of water through the urban network. Any city may be modeled with preassembled data, but data for US cities can be gathered via web services using this package and dependencies ‘geoknife’ and ‘dataRetrieval’.
1453 Hydrological Data and Modeling clifro Easily Download and Visualise Climate Data from CliFlo CliFlo is a web portal to the New Zealand National Climate Database and provides public access (via subscription) to around 6,500 various climate stations (see <https://cliflo.niwa.co.nz/> for more information). Collating and manipulating data from CliFlo (hence clifro) and importing into R for further analysis, exploration and visualisation is now straightforward and coherent. The user is required to have an internet connection, and a current CliFlo subscription (free) if data from stations, other than the public Reefton electronic weather station, is sought.
1454 Hydrological Data and Modeling climatol Climate Tools (Series Homogenization and Derived Products) Functions for the quality control, homogenization and missing data infilling of climatological series and to obtain climatological summaries and grids from the results. Also functions to draw wind-roses and Walter&Lieth climate diagrams.
1455 Hydrological Data and Modeling climdex.pcic PCIC Implementation of Climdex Routines PCIC’s implementation of Climdex routines for computation of extreme climate indices.
1456 Hydrological Data and Modeling CoSMoS Complete Stochastic Modelling Solution A single framework, unifying, extending, and improving a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific ‘parent’ Gaussian process Papalexiou (2018) <doi:10.1016/j.advwatres.2018.02.013>.
1457 Hydrological Data and Modeling countyweather Compiles Meteorological Data for U.S. Counties Interacts with NOAA data sources (including the NCDC API at <http://www.ncdc.noaa.gov/cdo-web/webservices/v2> and ISD data) using functions from the ‘rnoaa’ package to obtain and compile weather time series for U.S. counties. This work was supported in part by grants from the National Institute of Environmental Health Sciences (R00ES022631) and the Colorado State University Water Center.
1458 Hydrological Data and Modeling dataRetrieval Retrieval Functions for USGS and EPA Hydrologic and Water Quality Data Collection of functions to help retrieve U.S. Geological Survey (USGS) and U.S. Environmental Protection Agency (EPA) water quality and hydrology data from web services. USGS web services are discovered from National Water Information System (NWIS) <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Both EPA and USGS water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.
1459 Hydrological Data and Modeling driftR Drift Correcting Water Quality Data A tidy implementation of equations that correct for instrumental drift in continuous water quality monitoring data. There are many sources of water quality data including private (ex: YSI instruments) and open source (ex: USGS and NDBC), each of which are susceptible to errors/inaccuracies due to drift. This package allows the user to correct their data using one or two standard reference values in a uniform, reproducible way. The equations implemented are from Hasenmueller (2011) <doi:10.7936/K7N014KS>.
1460 Hydrological Data and Modeling dynatopmodel Implementation of the Dynamic TOPMODEL Hydrological Model A native R implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes preprocessing and utility routines, as well as routines for displaying outputs.
1461 Hydrological Data and Modeling Ecohydmod Ecohydrological Modelling Simulates the soil water balance (soil moisture, evapotranspiration, leakage and runoff), rainfall series by using the marked Poisson process and the vegetation growth through the normalized difference vegetation index (NDVI). Please see Souza et al. (2016) <doi:10.1002/hyp.10953>.
1462 Hydrological Data and Modeling EcoHydRology (core) A Community Modeling Foundation for Eco-Hydrology Provides a flexible foundation for scientists, engineers, and policy makers to base teaching exercises as well as for more applied use to model complex eco-hydrological interactions.
1463 Hydrological Data and Modeling ecoval Procedures for Ecological Assessment of Surface Waters Functions for evaluating and visualizing ecological assessment procedures for surface waters containing physical, chemical and biological assessments in the form of value functions.
1464 Hydrological Data and Modeling EGRET Exploration and Graphics for RivEr Trends Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS). The modeling method is introduced and discussed in Hirsch et al. (2010) <doi:10.1111/j.1752-1688.2010.00482.x>, and expanded in Hirsch and De Cicco (2015) <doi:10.3133/tm4A10>.
1465 Hydrological Data and Modeling EGRETci Exploration and Graphics for RivEr Trends Confidence Intervals Collection of functions to evaluate uncertainty of results from water quality analysis using the Weighted Regressions on Time Discharge and Season (WRTDS) method. This package is an add-on to the EGRET package that performs the WRTDS analysis. The WRTDS modeling method was initially introduced and discussed in Hirsch et al. (2010) <doi:10.1111/j.1752-1688.2010.00482.x>, and expanded in Hirsch and De Cicco (2015) <doi:10.3133/tm4A10>. The paper describing the uncertainty and confidence interval calculations is Hirsch et al. (2015) <doi:10.1016/j.envsoft.2015.07.017>.
1466 Hydrological Data and Modeling Evapotranspiration Modelling Actual, Potential and Reference Crop Evapotranspiration Uses data and constants to calculate potential evapotranspiration (PET) and actual evapotranspiration (AET) from 21 different formulations including Penman, Penman-Monteith FAO 56, Priestley-Taylor and Morton formulations.
1467 Hydrological Data and Modeling FAdist Distributions that are Sometimes Used in Hydrology Probability distributions that are sometimes useful in hydrology.
1468 Hydrological Data and Modeling FedData Functions to Automate Downloading Geospatial Data Available from Several Federated Data Sources Functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package enables extraction from seven datasets: The National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS); The National Hydrography Dataset (USGS); The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; the Global Historical Climatology Network (GHCN), coordinated by National Climatic Data Center at NOAA; the Daymet gridded estimates of daily weather parameters for North America, version 3, available from the Oak Ridge National Laboratory’s Distributed Active Archive Center (DAAC); the International Tree Ring Data Bank; and the National Land Cover Database (NLCD).
1469 Hydrological Data and Modeling FlowScreen Daily Streamflow Trend and Change Point Screening Screens daily streamflow time series for temporal trends and change-points. This package has been primarily developed for assessing the quality of daily streamflow time series. It also contains tools for plotting and calculating many different streamflow metrics. The package can be used to produce summary screening plots showing change-points and significant temporal trends for high flow, low flow, and/or baseflow statistics, or it can be used to perform more detailed hydrological time series analyses. The package was designed for screening daily streamflow time series from the Water Survey of Canada and the United States Geological Survey but will also work with streamflow time series from many other agencies.
1470 Hydrological Data and Modeling geoknife Web-Processing of Large Gridded Datasets Processes gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers.
1471 Hydrological Data and Modeling geotopbricks An R Plug-in for the Distributed Hydrological Model GEOtop Analyzes raster maps and other information as input/output files of the distributed hydrological model GEOtop. Contains functions and methods to import maps and other keywords from the ‘geotop.inpts’ file. Some examples with simulation cases of GEOtop 2.x/3.x are presented in the package. Information about the GEOtop distributed hydrological model and its source code is available at www.geotop.org. Technical details about the model are available in Endrizzi et al., 2014 (<http://www.geosci-model-dev.net/7/2831/2014/gmd-7-2831-2014.html>).
1472 Hydrological Data and Modeling getMet Get Meteorological Data for Hydrologic Models Hydrologic models often require users to collect and format input meteorological data. This package contains functions for sourcing, formatting, and editing meteorological data for hydrologic models.
1473 Hydrological Data and Modeling GSODR Global Surface Summary of the Day (‘GSOD’) Weather Data Client Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day (‘GSOD’) weather data from the USA National Centers for Environmental Information (‘NCEI’). Units are converted from United States Customary System (‘USCS’) units to International System of Units (‘SI’). Stations may be individually checked for the number of missing days defined by the user, where stations with too many missing observations are omitted. Only stations with valid reported latitude and longitude values are permitted in the final data. Additional useful elements, saturation vapour pressure (‘es’), actual vapour pressure (‘ea’) and relative humidity, are calculated from the original data and included in the final data set. The resulting metadata include station identification information, country, state, latitude, longitude, elevation, weather observations and associated flags. For information on the ‘GSOD’ data from ‘NCEI’, please see the ‘GSOD’ ‘readme.txt’ file available from <http://www1.ncdc.noaa.gov/pub/data/gsod/readme.txt>.
1474 Hydrological Data and Modeling GWSDAT GroundWater Spatiotemporal Data Analysis Tool (GWSDAT) Shiny application for the analysis of groundwater monitoring data, designed to work with simple time-series data for solute concentration and ground water elevation, but can also plot non-aqueous phase liquid (NAPL) thickness if required. Also provides the import of a site basemap in GIS shapefile format.
1475 Hydrological Data and Modeling hddtools Hydrological Data Discovery Tools Facilitates discovery and handling of hydrological data, access to catalogues and databases.
1476 Hydrological Data and Modeling humidity Calculate Water Vapor Measures from Temperature and Dew Point Vapor pressure, relative humidity, absolute humidity, specific humidity, and mixing ratio are commonly used water vapor measures in meteorology. This R package provides functions for calculating saturation vapor pressure (hPa), partial water vapor pressure (Pa), relative humidity (%), absolute humidity (kg/m^3), specific humidity (kg/kg), and mixing ratio (kg/kg) from temperature (K) and dew point (K). Conversion functions between humidity measures are also provided.
1477 Hydrological Data and Modeling hydroApps Tools and models for hydrological applications Provides tools and models for hydrological applications, developed for regional analysis in Northwestern Italy.
1478 Hydrological Data and Modeling hydrogeo Groundwater Data Presentation and Interpretation Contains one function for drawing Piper diagrams (also called Piper-Hill diagrams) of water analyses for major ions.
1479 Hydrological Data and Modeling hydroGOF (core) Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented to be used during the calibration, validation, and application of hydrological models. Missing values in observed and/or simulated values can be removed before computations. Comments, questions, and collaboration of any kind are very welcome.
1480 Hydrological Data and Modeling hydrolinks Hydrologic Network Linking Data and Tools Tools to link geographic data with hydrologic networks, including lakes, streams and rivers. Includes automated download of the U.S. National Hydrography Network and other hydrolayers.
1481 Hydrological Data and Modeling HydroMe R codes for estimating water retention and infiltration model parameters using experimental data Version 2 of the HydroMe v.1 package. Estimates the parameters of infiltration and water-retention models by curve fitting. The models considered are those commonly used in soil science. This version adds new models for the water-retention characteristic curve and fixes errors present in HydroMe v.1.
1482 Hydrological Data and Modeling hydroPSO Particle Swarm Optimisation, with Focus on Environmental Models State-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement for the ‘optim’ R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model’s own input and output files, without requiring access to the model’s source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine to different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with “long” execution time. Bug reports/comments/questions are very welcome (in English, Spanish or Italian). See Zambrano-Bigiarini and Rojas (2013) <doi:10.1016/j.envsoft.2013.01.004> for more details.
1483 Hydrological Data and Modeling hydroscoper Interface to the Greek National Data Bank for Hydrometeorological Information R interface to the Greek National Data Bank for Hydrological and Meteorological Information <http://www.hydroscope.gr/>. It covers Hydroscope’s data sources and provides functions to transliterate, translate and download them into tidy dataframes.
1484 Hydrological Data and Modeling hydrostats Hydrologic Indices for Daily Time Series Data Calculates a suite of hydrologic indices for daily time series data that are widely used in hydrology and stream ecology.
1485 Hydrological Data and Modeling hydroTSM (core) Time Series Management, Analysis and Interpolation for Hydrological Modelling S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package is on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bug reports, comments, questions, and collaboration of any kind are very welcome, in particular datasets that can be included in this package for academic purposes.
1486 Hydrological Data and Modeling hyfo Hydrology and Climate Forecasting Focuses on data processing and visualization in hydrology and climate forecasting. Main functions include data extraction, data downscaling, data resampling, gap filling of precipitation data, bias correction of forecasting data, flexible time series plots, and spatial map generation. It is a good pre-processing and post-processing tool for hydrological and hydraulic modellers.
1487 Hydrological Data and Modeling IDF Estimation and Plotting of IDF Curves Intensity-duration-frequency (IDF) curves are a widely used analysis tool in hydrology to assess extreme values of precipitation [e.g. Mailhot et al., 2007, <doi:10.1016/j.jhydrol.2007.09.019>]. The package ‘IDF’ provides a function to read precipitation data from German weather service (DWD) ‘webwerdis’ <http://www.dwd.de/EN/ourservices/webwerdis/webwerdis.html> files and Berlin station data from ‘Stadtmessnetz’ <http://www.geo.fu-berlin.de/en/met/service/stadtmessnetz/index.html> files; additionally, IDF parameters can also be estimated from a given data.frame containing a precipitation time series. The data is aggregated to given duration levels, and yearly intensity maxima are calculated either for the whole year or for given months. From these intensity maxima, IDF parameters are estimated on the basis of a duration-dependent generalised extreme value distribution [Koutsoyiannis et al., 1998, <doi:10.1016/S0022-1694(98)00097-3>]. IDF curves based on these estimated parameters can be plotted.
1488 Hydrological Data and Modeling kitagawa Spectral Response of Water Wells to Harmonic Strain and Pressure Signals Provides tools to calculate the theoretical hydrodynamic response of an aquifer undergoing harmonic straining or pressurization, or analyze measured responses. There are two classes of models here: (1) for sealed wells, based on the model of Kitagawa et al (2011, <doi:10.1029/2010JB007794>), and (2) for open wells, based on the models of Cooper et al (1965, <doi:10.1029/JZ070i016p03915>), Hsieh et al (1987, <doi:10.1029/WR023i010p01824>), Rojstaczer (1988, <doi:10.1029/JB093iB11p13619>), and Liu et al (1989, <doi:10.1029/JB094iB07p09453>). These models treat strain (or aquifer head) as an input to the physical system, and fluid-pressure (or water height) as the output. The applicable frequency band of these models is characteristic of seismic waves, atmospheric pressure fluctuations, and solid earth tides.
1489 Hydrological Data and Modeling kiwisR A Wrapper for Querying KISTERS ‘WISKI’ Databases via the ‘KiWIS’ API A wrapper for querying ‘WISKI’ databases via the ‘KiWIS’ ‘REST’ API. ‘WISKI’ is an ‘SQL’ relational database used for the collection and storage of water data developed by KISTERS and ‘KiWIS’ is a ‘REST’ service that provides access to ‘WISKI’ databases via HTTP requests (<https://water.kisters.de/en/technology-trends/kisters-and-open-data/>). Contains a list of default databases (called ‘hubs’) and also allows users to provide their own ‘KiWIS’ URL. Supports the entire query process, from metadata to specific time series values. All data is returned as tidy tibbles.
1490 Hydrological Data and Modeling kwb.hantush Calculation of Groundwater Mounding Beneath an Infiltration Basin Calculates groundwater mounding beneath an infiltration basin based on the Hantush (1967) equation (<doi:10.1029/WR003i001p00227>). The implementation is verified with an example based on a USGS report (page 25, <https://pubs.usgs.gov/sir/2010/5102/support/sir2010-5102.pdf#page=35>).
1491 Hydrological Data and Modeling lakemorpho Lake Morphometry Metrics Lake morphometry metrics are used by limnologists to understand, among other things, the ecological processes in a lake. Traditionally, these metrics are calculated by hand, with planimeters, and increasingly with commercial GIS products. All of these methods work; however, they are either outdated, difficult to reproduce, or require expensive licenses to use. The ‘lakemorpho’ package provides the tools to calculate a typical suite of these metrics from an input elevation model and lake polygon. The metrics currently supported are: fetch, major axis, minor axis, major/minor axis ratio, maximum length, maximum width, mean width, maximum depth, mean depth, shoreline development, shoreline length, surface area, and volume.
1492 Hydrological Data and Modeling lfstat Calculation of Low Flow Statistics for Daily Stream Flow Data The “Manual on Low-flow Estimation and Prediction”, published by the World Meteorological Organisation (WMO), gives a comprehensive summary on how to analyse stream flow data focusing on low-flows. This package provides functions to compute the described statistics and produces plots similar to the ones in the manual.
1493 Hydrological Data and Modeling LPM Linear Parametric Models Applied to Hydrological Series Applies univariate long-memory models and multivariate short-memory models to hydrological datasets, and estimates intensity-duration-frequency curves from rainfall series.
1494 Hydrological Data and Modeling lulcc Land Use Change Modelling in R Classes and methods for spatially explicit land use change modelling in R.
1495 Hydrological Data and Modeling MBC Multivariate Bias Correction of Climate Model Outputs Calibrate and apply multivariate bias correction algorithms for climate model simulations of multiple climate variables. Three methods described by Cannon (2016) <doi:10.1175/JCLI-D-15-0679.1> and Cannon (2018) <doi:10.1007/s00382-017-3580-6> are implemented: (i) MBC Pearson correlation (MBCp), (ii) MBC rank correlation (MBCr), and (iii) MBC N-dimensional PDF transform (MBCn).
1496 Hydrological Data and Modeling meteo Spatio-Temporal Analysis and Mapping of Meteorological Observations Spatio-temporal geostatistical mapping of meteorological data. Global spatio-temporal models calculated using publicly available data are stored in the package.
1497 Hydrological Data and Modeling meteoland Landscape Meteorology Tools Functions to estimate weather variables at any position of a landscape [De Caceres et al. (2018) <doi:10.1016/j.envsoft.2018.08.003>].
1498 Hydrological Data and Modeling MODISTools Interface to the ‘MODIS Land Products Subsets’ Web Services Programmatic interface to the ‘MODIS Land Products Subsets’ web services (<https://modis.ornl.gov/data/modis_webservice.html>). Allows for easy downloads of ‘MODIS’ time series directly to your R workspace or your computer.
1499 Hydrological Data and Modeling MODIStsp A Tool for Automating Download and Preprocessing of MODIS Land Products Data Allows automating the creation of time series of rasters derived from MODIS Satellite Land Products data. It performs several typical preprocessing steps such as download, mosaicking, reprojection and resize of data acquired on a specified time period. All processing parameters can be set using a user-friendly GUI. Users can select which layers of the original MODIS HDF files they want to process, which additional Quality Indicators should be extracted from aggregated MODIS Quality Assurance layers and, in the case of Surface Reflectance products, which Spectral Indexes should be computed from the original reflectance bands. For each output layer, outputs are saved as single-band raster files corresponding to each available acquisition date. Virtual files allowing access to the entire time series as a single file are also created. Command-line execution exploiting a previously saved processing options file is also possible, allowing time series related to a MODIS product to be updated automatically whenever a new image is available.
1500 Hydrological Data and Modeling musica Multiscale Climate Model Assessment Provides functions allowing for (1) easy aggregation of multivariate time series into custom time scales, (2) comparison of statistical summaries between different data sets at multiple time scales (e.g. observed and bias-corrected data), (3) comparison of relations between variables and/or different data sets at multiple time scales (e.g. correlation of precipitation and temperature in control and scenario simulation) and (4) transformation of time series at custom time scales.
1501 Hydrological Data and Modeling nasapower NASA POWER API Client Client for ‘NASA’ ‘POWER’ global meteorology, surface solar energy and climatology data ‘API’. ‘POWER’ (Prediction Of Worldwide Energy Resource) data are freely available global meteorology and surface solar energy climatology data for download with a resolution of 1/2 by 1/2 arc degree longitude and latitude and are funded through the ‘NASA’ Earth Science Directorate Applied Science Program. For more on the data themselves, a web-based data viewer and web access, please see <https://power.larc.nasa.gov/>.
1502 Hydrological Data and Modeling nhdR Tools for working with the National Hydrography Dataset Tools for working with the National Hydrography Dataset, with functions for querying, downloading, and networking both the NHD <https://www.usgs.gov/core-science-systems/ngp/national-hydrography> and NHDPlus <http://www.horizon-systems.com/nhdplus> datasets.
1503 Hydrological Data and Modeling nsRFA Non-Supervised Regional Frequency Analysis A collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.
1504 Hydrological Data and Modeling openair Tools for the Analysis of Air Pollution Data Tools to analyse, interpret and understand air pollution data. Data are typically hourly time series and both monitoring data and dispersion model output can be analysed. Many functions can also be applied to other data, including meteorological and traffic data.
1505 Hydrological Data and Modeling qmap Statistical Transformations for Post-Processing Climate Model Output Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.
1506 Hydrological Data and Modeling rdwd Select and Download Climate Data from ‘DWD’ (German Weather Service) Handle climate data from the ‘DWD’ (‘Deutscher Wetterdienst’, see <https://www.dwd.de/EN/climate_environment/cdc/cdc.html> for more information). Choose files with ‘selectDWD()’, download and process data sets with ‘dataDWD()’ and ‘readDWD()’.
1507 Hydrological Data and Modeling reservoir Tools for Analysis, Design, and Operation of Water Supply Storages Measure single-storage water supply system performance using resilience, reliability, and vulnerability metrics; assess storage-yield-reliability relationships; determine no-fail storage with sequent peak analysis; optimize release decisions for water supply, hydropower, and multi-objective reservoirs using deterministic and stochastic dynamic programming; generate inflow replicates using parametric and non-parametric models; evaluate inflow persistence using the Hurst coefficient.
1508 Hydrological Data and Modeling RHMS Hydrologic Modelling System for R Users A hydrologic modelling system implemented as an object-oriented tool which enables R users to simulate and analyze hydrologic events. The package proposes functions and methods for construction, simulation, visualization, and calibration of hydrologic systems.
1509 Hydrological Data and Modeling RMAWGEN Multi-Site Auto-Regressive Weather GENerator S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector AutoRegressive models (VARs). The weather generator model is saved as an object and is calibrated by daily instrumental “Gaussianized” time series through the ‘vars’ package tools. Once obtained, the model can be used for weather generation and can be adapted to work with several monthly climatic time series.
1510 Hydrological Data and Modeling RNCEP Obtain, Organize, and Visualize NCEP Weather Data Contains functions to retrieve, organize, and visualize weather data from the NCEP/NCAR Reanalysis (<http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html>) and NCEP/DOE Reanalysis II (<http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.html>) datasets. Data are queried via the Internet and may be obtained for a specified spatial and temporal extent or interpolated to a point in space and time. We also provide functions to visualize these weather data on a map. There are also functions to simulate flight trajectories according to specified behavior using either NCEP wind data or data specified by the user.
1511 Hydrological Data and Modeling rnoaa ‘NOAA’ Weather Data from R Client for many ‘NOAA’ data sources including the ‘NCDC’ climate ‘API’ at <https://www.ncdc.noaa.gov/cdo-web/webservices/v2>, with functions for each of the ‘API’ ‘endpoints’: data, data categories, data sets, data types, locations, location categories, and stations. In addition, we have an interface for ‘NOAA’ sea ice data, the ‘NOAA’ severe weather inventory, ‘NOAA’ Historical Observing ‘Metadata’ Repository (‘HOMR’) data, ‘NOAA’ storm data via ‘IBTrACS’, tornado data via the ‘NOAA’ storm prediction center, and more.
1512 Hydrological Data and Modeling rnrfa UK National River Flow Archive Data from R Utility functions to retrieve data from the UK National River Flow Archive (<http://nrfa.ceh.ac.uk/>). The package contains R wrappers to the UK NRFA data temporary API. There are functions to retrieve stations falling in a bounding box, to generate a map, and to extract time series and general information.
1513 Hydrological Data and Modeling rpdo Pacific Decadal Oscillation Index Data Monthly Pacific Decadal Oscillation (PDO) index values from January 1900 to present.
1514 Hydrological Data and Modeling RSAlgaeR Builds Empirical Remote Sensing Models of Water Quality Variables and Analyzes Long-Term Trends Assists in processing reflectance data, developing empirical models using stepwise regression and a generalized linear modeling approach, cross-validation, and analysis of trends in water quality conditions (specifically chl-a) and climate conditions using the Theil-Sen estimator.
1515 Hydrological Data and Modeling rsoi Import Various Northern and Southern Hemisphere Climate Indices Downloads Southern Oscillation Index, Oceanic Nino Index, North Pacific Gyre Oscillation data, North Atlantic Oscillation and Arctic Oscillation. Data sources are described in the README file.
1516 Hydrological Data and Modeling rtop Interpolation of Data with Variable Spatial Support Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.
1517 Hydrological Data and Modeling rwunderground R Interface to Weather Underground API Tools for getting historical weather information and forecasts from wunderground.com. Historical weather and forecast data includes, but is not limited to, temperature, humidity, windchill, wind speed, dew point, and heat index. Additionally, the weather underground weather API also includes information on sunrise/sunset, tidal conditions, satellite/webcam imagery, weather alerts, hurricane alerts and historical high/low temperatures.
1518 Hydrological Data and Modeling SCI Standardized Climate Indices Such as SPI, SRI or SPEI Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).
1519 Hydrological Data and Modeling smapr Acquisition and Processing of NASA Soil Moisture Active-Passive (SMAP) Data Facilitates programmatic access to NASA Soil Moisture Active Passive (SMAP) data with R. It includes functions to search for, acquire, and extract SMAP data.
1520 Hydrological Data and Modeling soilwater Implementation of Parametric Formulas for Soil Water Retention or Conductivity Curve Implements parametric formulas for the soil water retention or conductivity curve. At the moment, only the Van Genuchten model (for the soil water retention curve) and the Mualem model (for hydraulic conductivity) are implemented. See reference (<http://en.wikipedia.org/wiki/Water_retention_curve>).
1521 Hydrological Data and Modeling somspace Spatial Analysis with Self-Organizing Maps Application of the Self-Organizing Maps technique for spatial classification of time series. The package uses spatial data, point or gridded, to create clusters with similar characteristics. The clusters can be further refined to a smaller number of regions by hierarchical clustering and their spatial dependencies can be presented as complex networks. Thus, meaningful maps can be created, representing the regional heterogeneity of a single variable. More information and an example of implementation can be found in Markonis and Strnad (2019).
1522 Hydrological Data and Modeling SPEI Calculation of the Standardised Precipitation-Evapotranspiration Index A set of functions for computing potential evapotranspiration and several widely used drought indices including the Standardized Precipitation-Evapotranspiration Index (SPEI).
1523 Hydrological Data and Modeling streamDepletr Estimate Streamflow Depletion Due to Groundwater Pumping Implementation of analytical models for estimating streamflow depletion due to groundwater pumping, and other related tools. Functions are broadly split into two groups: (1) analytical streamflow depletion models, which estimate streamflow depletion for a single stream reach resulting from groundwater pumping; and (2) depletion apportionment equations, which distribute estimated streamflow depletion among multiple stream reaches within a stream network. See Zipper et al. (2018) <doi:10.1029/2018WR022707> for more information on depletion apportionment equations and Zipper et al. (2019) <doi:10.31223/osf.io/uqbd7> for more information on analytical depletion functions, which combine analytical models and depletion apportionment equations.
1524 Hydrological Data and Modeling SWATmodel A multi-OS implementation of the TAMU SWAT model The Soil and Water Assessment Tool is a river basin or watershed scale model developed by Dr. Jeff Arnold for the USDA-ARS.
1525 Hydrological Data and Modeling swmmr R Interface for US EPA’s SWMM Functions to connect the widely used Storm Water Management Model (SWMM) of the United States Environmental Protection Agency (US EPA) <https://www.epa.gov/water-research/storm-water-management-model-swmm> to R with currently two main goals: (1) Run a SWMM simulation from R and (2) provide fast access to simulation results, i.e. SWMM’s binary ‘.out’-files. High performance is achieved with help of Rcpp. Additionally, reading SWMM’s ‘.inp’ and ‘.rpt’ files is supported to inspect model structures and to get direct access to simulation summaries.
1526 Hydrological Data and Modeling tidyhydat Extract and Tidy Canadian ‘Hydrometric’ Data Provides functions to access historical and real-time national ‘hydrometric’ data from Water Survey of Canada data sources (<http://dd.weather.gc.ca/hydrometric/csv/> and <http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.
1527 Hydrological Data and Modeling topmodel Implementation of the Hydrological Model TOPMODEL in R Set of hydrological functions including an R implementation of the hydrological model TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. From version 0.7.0, the package is in maintenance mode.
1528 Hydrological Data and Modeling TUWmodel Lumped/Semi-Distributed Hydrological Model for Education Purposes The model, developed at the Vienna University of Technology, is a lumped conceptual rainfall-runoff model, following the structure of the HBV model. The model can also be run in a semi-distributed fashion. The model runs on a daily or shorter time step and consists of a snow routine, a soil moisture routine and a flow routing routine. See Parajka, J., R. Merz, G. Bloeschl (2007) <doi:10.1002/hyp.6253> Uncertainty and multiple objective calibration in regional water balance modelling: case study in 320 Austrian catchments, Hydrological Processes, 21, 435-446.
1529 Hydrological Data and Modeling VICmodel The Variable Infiltration Capacity (VIC) Model The Variable Infiltration Capacity (VIC) model is a macroscale hydrologic model that solves full water and energy balances, originally developed by Xu Liang at the University of Washington (UW). The version of the VIC source code used is 5.0.1, available on <https://github.com/UW-Hydro/VIC/>; see Hamman et al. (2018). Development and maintenance of the current official version of the VIC model is at present led by the UW Hydro (Computational Hydrology) group in the Department of Civil and Environmental Engineering at UW. VIC is a research model and in its various forms it has been applied to most of the major river basins around the world, as well as globally. If you make use of this model, please acknowledge the appropriate references listed in the help page of this package or on the references page <http://vic.readthedocs.io/en/master/Documentation/References/> of the official VIC documentation website. These should include Liang et al. (1994) plus any references relevant to the features you are using. Reference: Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99(D7), 14415-14428, <doi:10.1029/94JD00483>. Hamman et al. (2018), which describes VIC 5.0.1, can also be considered: Hamman, J. J., Nijssen, B., Bohn, T. J., Gergel, D. R., and Mao, Y. (2018), The Variable Infiltration Capacity model version 5 (VIC-5): infrastructure improvements for new applications and reproducibility, Geosci. Model Dev., 11, 3481-3496, <doi:10.5194/gmd-11-3481-2018>.
1530 Hydrological Data and Modeling washdata Urban Water and Sanitation Survey Dataset Urban water and sanitation survey dataset collected by Water and Sanitation for the Urban Poor (WSUP) with technical support from Valid International. These citywide surveys have been collecting data that allow water and sanitation service levels across the entire city to be characterised, while also allowing more detailed data to be collected in areas of the city of particular interest. These surveys are intended to generate useful information for others working in the water and sanitation sector. The current release version includes datasets collected from a survey conducted in Dhaka, Bangladesh in March 2017. This survey in Dhaka is one of a series of surveys to be conducted by WSUP in various cities in which they operate, including Accra, Ghana; Nakuru, Kenya; Antananarivo, Madagascar; Maputo, Mozambique; and Lusaka, Zambia. This package will be updated once the surveys in other cities are completed and datasets have been made available.
1531 Hydrological Data and Modeling wasim Visualisation and analysis of output files of the hydrological model WASIM Helpful tools for data processing and visualisation of results of the hydrological model WASIM-ETH.
1532 Hydrological Data and Modeling water Actual Evapotranspiration with Energy Balance Models Tools and functions to calculate actual Evapotranspiration using surface energy balance models.
1533 Hydrological Data and Modeling waterData Retrieval, Analysis, and Anomaly Calculation of Daily Hydrologic Time Series Data Imports U.S. Geological Survey (USGS) daily hydrologic data from USGS web services (see <https://waterservices.usgs.gov/> for more information), plots the data, addresses some common data problems, and calculates and plots anomalies.
1534 Hydrological Data and Modeling Watersheds Spatial Watershed Aggregation and Spatial Drainage Network Analysis Methods for watershed aggregation and spatial drainage network analysis.
1535 Hydrological Data and Modeling weathercan Download Weather Data from the Environment and Climate Change Canada Website Provides means for downloading historical weather data from the Environment and Climate Change Canada website (<http://climate.weather.gc.ca/historical_data/search_historic_data_e.html>). Data can be downloaded from multiple stations and over large date ranges and automatically processed into a single dataset. Tools are also provided to identify stations either by name or proximity to a location.
1536 Hydrological Data and Modeling worldmet Import Surface Meteorological Data from NOAA Integrated Surface Database (ISD) Functions to import data from more than 30,000 surface meteorological sites around the world managed by the National Oceanic and Atmospheric Administration (NOAA) Integrated Surface Database (ISD, see <https://www.ncdc.noaa.gov/isd>).
1537 Hydrological Data and Modeling wql Exploring Water Quality Monitoring Data Functions to assist in the processing and exploration of data from environmental monitoring programs. The package name stands for “water quality” and reflects the original focus on time series data for physical and chemical properties of water, as well as the biota. Intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should be useful for analysis of similar-frequency time series regardless of the subject matter.
1538 Hydrological Data and Modeling WRSS Water Resources System Simulator Water resources system simulator is a tool for simulation and analysis of large-scale water resources systems. ‘WRSS’ provides functions and methods for construction, simulation and analysis of primary storage and hydropower water resources features (e.g. reservoirs and aquifers) based on Standard Operating Policy (SOP).
1539 Hydrological Data and Modeling WRTDStidal Weighted Regression for Water Quality Evaluation in Tidal Waters An adaptation for estuaries (tidal waters) of weighted regression on time, discharge, and season to evaluate trends in water quality time series.
1540 Machine Learning & Statistical Learning ahaz Regularization for semiparametric additive hazards regression Computationally efficient procedures for regularized estimation with the semiparametric additive hazards regression model.
1541 Machine Learning & Statistical Learning arules Mining Association Rules and Frequent Itemsets Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. See Christian Borgelt (2012) <doi:10.1002/widm.1074>.
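As a hedged sketch of typical ‘arules’ usage (assuming the package is installed; the bundled ‘Groceries’ transaction data and the support/confidence thresholds below are illustrative choices, not recommendations):

```r
library(arules)

data("Groceries")  # example transaction data shipped with 'arules'
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5))
# show the three rules with the highest lift
inspect(head(sort(rules, by = "lift"), 3))
```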
1542 Machine Learning & Statistical Learning BART Bayesian Additive Regression Trees Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.
1543 Machine Learning & Statistical Learning bartMachine Bayesian Additive Regression Trees An advanced implementation of Bayesian Additive Regression Trees with expanded features for data analysis and visualization.
1544 Machine Learning & Statistical Learning BayesTree Bayesian Additive Regression Trees This is an implementation of BART: Bayesian Additive Regression Trees, by Chipman, George, and McCulloch (2010).
1545 Machine Learning & Statistical Learning BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements from the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889> and Mohammadi and Wit (2019) <doi:10.18637/jss.v089.i03>.
1546 Machine Learning & Statistical Learning biglasso Extending Lasso Model Fitting to Big Data Extends lasso and elastic-net model fitting to ultrahigh-dimensional, multi-gigabyte data sets that cannot be loaded into memory. It is much more memory- and computation-efficient than existing lasso-fitting packages such as ‘glmnet’ and ‘ncvreg’, allowing powerful big data analysis even on an ordinary laptop.
1547 Machine Learning & Statistical Learning bmrm Bundle Methods for Regularized Risk Minimization Package Bundle methods for minimization of convex and non-convex risk under L1 or L2 regularization. Implements the algorithm proposed by Teo et al. (JMLR 2010) as well as the extension proposed by Do and Artieres (JMLR 2012). The package comes with a large set of loss functions for machine learning, which makes it powerful for big data analysis. Applications include structured prediction, linear SVM, multi-class SVM, f-beta optimization, ROC optimization, ordinal regression, quantile regression, epsilon-insensitive regression, least mean squares, logistic regression, and least absolute deviation regression (see package examples), all with L1 and L2 regularization.
1548 Machine Learning & Statistical Learning Boruta Wrapper Algorithm for All Relevant Feature Selection An all relevant feature selection wrapper algorithm. It finds relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies (shadows).
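A minimal sketch of running the all-relevant feature selection wrapper (package assumed installed; ‘iris’ is just a stand-in data set):

```r
library(Boruta)

set.seed(1)
res <- Boruta(Species ~ ., data = iris)
print(res)      # confirmed/tentative/rejected attributes
attStats(res)   # importance statistics versus the shadow attributes
```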
1549 Machine Learning & Statistical Learning bst Gradient Boosting Functional gradient descent algorithm for a variety of convex and non-convex loss functions, for both classical and robust regression and classification problems. See Wang (2011) <doi:10.2202/1557-4679.1304>, Wang (2012) <doi:10.3414/ME11-02-0020>, Wang (2018) <doi:10.1080/10618600.2018.1424635>, Wang (2018) <doi:10.1214/18-EJS1404>.
1550 Machine Learning & Statistical Learning C50 C5.0 Decision Trees and Rule-Based Models C5.0 decision trees and rule-based models for pattern recognition that extend the work of Quinlan (1993, ISBN:1-55860-238-0).
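A short illustrative call (assuming ‘C50’ is installed; the ‘iris’ data and the choice of rule-based output are arbitrary):

```r
library(C50)

fit <- C5.0(Species ~ ., data = iris, rules = TRUE)  # rule-based model
summary(fit)                     # prints the induced rules
predict(fit, head(iris))         # class predictions for new rows
```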
1551 Machine Learning & Statistical Learning caret Classification and Regression Training Misc functions for training and plotting classification and regression models.
1552 Machine Learning & Statistical Learning CORElearn Classification, Regression and Feature Evaluation A suite of machine learning algorithms written in C++ with the R interface contains several learning techniques for classification and regression. Predictive models include e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the ‘ExplainPrediction’ package. This package is especially strong in feature evaluation where it contains several variants of Relief algorithm and many impurity based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization is used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
1553 Machine Learning & Statistical Learning CoxBoost Cox models by likelihood based boosting for a single survival endpoint or competing risks This package provides routines for fitting Cox models by likelihood-based boosting for a single endpoint or in the presence of competing risks.
1554 Machine Learning & Statistical Learning Cubist Rule- And Instance-Based Regression Modeling Regression modeling using rules with added instance-based corrections.
1555 Machine Learning & Statistical Learning deepnet deep learning toolkit in R Implements several deep learning architectures and neural network algorithms, including backpropagation (BP), restricted Boltzmann machines (RBM), deep belief networks (DBN), and deep autoencoders.
1556 Machine Learning & Statistical Learning e1071 (core) Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
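For example, the support vector machine interface can be sketched as follows (package assumed installed; the kernel and cost settings are illustrative defaults):

```r
library(e1071)

m <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)
table(predicted = predict(m, iris), actual = iris$Species)
```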
1557 Machine Learning & Statistical Learning earth Multivariate Adaptive Regression Splines Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines” <doi:10.1214/aos/1176347963>. (The term “MARS” is trademarked and thus not used in the name of the package.)
1558 Machine Learning & Statistical Learning effects Effect Displays for Linear, Generalized Linear, and Other Models Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.
1559 Machine Learning & Statistical Learning elasticnet Elastic-Net for Sparse Estimation and Sparse PCA Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.
1560 Machine Learning & Statistical Learning ElemStatLearn Data Sets, Functions and Examples from the Book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman Useful when reading the book mentioned above, referred to in the documentation as ‘the book’.
1561 Machine Learning & Statistical Learning evclass Evidential Distance-Based Classification Different evidential distance-based classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule and the evidential neural network.
1562 Machine Learning & Statistical Learning evtree Evolutionary Learning of Globally Optimal Trees Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. The ‘evtree’ package implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. CPU and memory-intensive tasks are fully computed in C++ while the ‘partykit’ package is leveraged to represent the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions.
1563 Machine Learning & Statistical Learning frbs Fuzzy Rule-Based Systems for Classification and Regression Tasks An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows the construction of an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named ‘frbsPMML’, which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language providing a standard for describing models produced by data mining and machine learning algorithms. Therefore, we are able to export and import an FRBS model to/from ‘frbsPMML’. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.
1564 Machine Learning & Statistical Learning GAMBoost Generalized linear and additive models by likelihood based boosting This package provides routines for fitting generalized linear and generalized additive models by likelihood-based boosting, using penalized B-splines.
1565 Machine Learning & Statistical Learning gamboostLSS Boosting Methods for ‘GAMLSS’ Boosting models for fitting generalized additive models for location, shape and scale (‘GAMLSS’) to potentially high dimensional data.
1566 Machine Learning & Statistical Learning gbm (core) Generalized Boosted Regression Models An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway.
1567 Machine Learning & Statistical Learning ggRandomForests Visually Exploring Random Forests Graphic elements for exploring Random Forests using the ‘randomForest’ or ‘randomForestSRC’ package for survival, regression and classification forests and ‘ggplot2’ package plotting.
1568 Machine Learning & Statistical Learning glmnet Lasso and Elastic-Net Regularized Generalized Linear Models Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model. Two recent additions are the multiple-response Gaussian, and the grouped multinomial regression. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the paper linked to via the URL below.
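A hedged sketch of fitting a cross-validated lasso path (package assumed installed; the simulated data are purely illustrative):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)

cvfit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1: lasso; alpha = 0: ridge
coef(cvfit, s = "lambda.min")        # coefficients at the CV-optimal penalty
```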
1569 Machine Learning & Statistical Learning glmpath L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model A path-following algorithm for L1 regularized generalized linear models and Cox proportional hazards model.
1570 Machine Learning & Statistical Learning GMMBoost Likelihood-based Boosting for Generalized Mixed Models Likelihood-based boosting for generalized mixed models.
1571 Machine Learning & Statistical Learning gradDescent Gradient Descent for Regression Tasks An implementation of various learning algorithms based on gradient descent for dealing with regression tasks. The variants of the gradient descent algorithm are: Mini-Batch Gradient Descent (MBGD), which uses only part of the training data in each step to reduce the computation load. Stochastic Gradient Descent (SGD), which uses a single random data point per step to reduce the computation load drastically. Stochastic Average Gradient (SAG), an SGD-based algorithm that averages the stochastic steps. Momentum Gradient Descent (MGD), an optimization to speed up gradient descent learning. Accelerated Gradient Descent (AGD), an optimization to accelerate gradient descent learning. Adagrad, a gradient-descent-based algorithm that accumulates previous costs for adaptive learning. Adadelta, a gradient-descent-based algorithm that uses a Hessian approximation for adaptive learning. RMSprop, a gradient-descent-based algorithm that combines the adaptive-learning abilities of Adagrad and Adadelta. Adam, a gradient-descent-based algorithm that uses mean and variance moments for adaptive learning. Stochastic Variance Reduced Gradient (SVRG), an SGD-based optimization that accelerates convergence by reducing the variance of the gradient. Semi-Stochastic Gradient Descent (SSGD), an SGD-based algorithm that combines GD and SGD to accelerate convergence by choosing one of the gradients at a time. Stochastic Recursive Gradient Algorithm (SARAH), an optimization algorithm that, similarly to SVRG, accelerates convergence using accumulated stochastic information. Stochastic Recursive Gradient Algorithm+ (SARAHPlus), a practical variant of SARAH that accelerates convergence and provides the possibility of earlier termination.
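The core idea shared by these variants can be sketched in a few lines of base R (this hand-rolled stochastic gradient descent for least squares is an illustration only, not the package's own API):

```r
set.seed(1)
n <- 200
x <- cbind(1, rnorm(n))            # design matrix: intercept + one predictor
y <- x %*% c(2, -3) + rnorm(n, sd = 0.1)

theta <- c(0, 0)                   # start from the zero vector
lr <- 0.05                         # learning rate
for (epoch in 1:20) {
  for (i in sample(n)) {           # SGD: one random observation per update
    grad  <- 2 * x[i, ] * as.numeric(x[i, ] %*% theta - y[i])
    theta <- theta - lr * grad
  }
}
theta                              # approaches the true coefficients c(2, -3)
```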
1572 Machine Learning & Statistical Learning grf Generalized Random Forests A pluggable package for forest-based statistical estimation and inference. GRF currently provides methods for non-parametric least-squares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables).
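An illustrative call for treatment effect estimation (package assumed installed; the data simulation is a hypothetical setup):

```r
library(grf)

set.seed(1)
n <- 500; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)                          # binary treatment
Y <- X[, 1] + W * pmax(X[, 2], 0) + rnorm(n)    # heterogeneous effect

cf <- causal_forest(X, Y, W)
average_treatment_effect(cf)   # ATE estimate with standard error
```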
1573 Machine Learning & Statistical Learning grplasso Fitting User-Specified Models with Group Lasso Penalty Fits user-specified (GLM-) models with group lasso penalty.
1574 Machine Learning & Statistical Learning grpreg Regularization Paths for Regression Models with Grouped Covariates Efficient algorithms for fitting the regularization path of linear regression, GLM, and Cox regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bi-level selection methods such as the group exponential lasso, the composite MCP, and the group bridge.
1575 Machine Learning & Statistical Learning h2o R Interface for ‘H2O’ R interface for ‘H2O’, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).
1576 Machine Learning & Statistical Learning hda Heteroscedastic Discriminant Analysis Functions to perform dimensionality reduction for classification if the covariance matrices of the classes are unequal.
1577 Machine Learning & Statistical Learning hdi High-Dimensional Inference Implementation of multiple approaches to perform inference in high-dimensional models.
1578 Machine Learning & Statistical Learning hdm High-Dimensional Metrics Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/ structural parameters are provided which appear in high-dimensional approximately sparse models. Including functions for fitting heteroscedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. Chernozhukov, Hansen, Spindler (2016) <arXiv:1603.01700>.
1579 Machine Learning & Statistical Learning ICEbox Individual Conditional Expectation Plot Toolbox Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman’s partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent such heterogeneity may exist.
1580 Machine Learning & Statistical Learning ipred Improved Predictors Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
1581 Machine Learning & Statistical Learning kernlab (core) Kernel-Based Machine Learning Lab Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
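A minimal sketch of the kernel SVM interface (package assumed installed; the kernel choice and C are illustrative):

```r
library(kernlab)

m <- ksvm(Species ~ ., data = iris, kernel = "rbfdot", C = 1)
m                       # prints training error and number of support vectors
table(predict(m, iris), iris$Species)
```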
1582 Machine Learning & Statistical Learning klaR Classification and Visualization Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to ‘svmlight’ and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.
1583 Machine Learning & Statistical Learning lars Least Angle Regression, Lasso and Forward Stagewise Efficient procedures for fitting an entire lasso sequence with the cost of a single least squares fit. Least angle regression and infinitesimal forward stagewise regression are related to the lasso, as described in the paper below.
1584 Machine Learning & Statistical Learning lasso2 L1 Constrained Estimation aka ‘lasso’ Routines and documentation for solving regression problems while imposing an L1 constraint on the estimates, based on the algorithm of Osborne et al. (1998).
1585 Machine Learning & Statistical Learning LiblineaR Linear Predictive Models Based on the ‘LIBLINEAR’ C/C++ Library A wrapper around the ‘LIBLINEAR’ C/C++ library for machine learning (available at <http://www.csie.ntu.edu.tw/~cjlin/liblinear>). ‘LIBLINEAR’ is a simple library for solving large-scale regularized linear classification and regression. It currently supports L2-regularized classification (such as logistic regression, L2-loss linear SVM and L1-loss linear SVM) as well as L1-regularized classification (such as L2-loss linear SVM and logistic regression) and L2-regularized support vector regression (with L1- or L2-loss). The main features of LiblineaR include multi-class classification (one-vs-the rest, and Crammer & Singer method), cross validation for model selection, probability estimates (logistic regression only) or weights for unbalanced data. The estimation of the models is particularly fast as compared to other libraries.
1586 Machine Learning & Statistical Learning LogicReg Logic Regression Routines for fitting Logic Regression models. Logic Regression is described in Ruczinski, Kooperberg, and LeBlanc (2003) <doi:10.1198/1061860032238>. Monte Carlo Logic Regression is described in Kooperberg and Ruczinski (2005) <doi:10.1002/gepi.20042>.
1587 Machine Learning & Statistical Learning LTRCtrees Survival Trees to Fit Left-Truncated and Right-Censored and Interval-Censored Survival Data Recursive partitioning algorithms designed for fitting survival trees with left-truncated and right-censored (LTRC) data, as well as interval-censored data. The LTRC trees can also be used to fit survival trees with time-varying covariates.
1588 Machine Learning & Statistical Learning maptree Mapping, pruning, and graphing tree models Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.
1589 Machine Learning & Statistical Learning mboost (core) Model-Based Boosting Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.
1590 Machine Learning & Statistical Learning mlr Machine Learning in R Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering and general, example-specific cost-sensitive learning. Generic resampling, including cross-validation, bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling. Most operations can be parallelized.
1591 Machine Learning & Statistical Learning model4you Stratified and Personalised Models Based on Model-Based Trees and Forests Model-based trees for subgroup analyses in clinical trials and model-based forests for the estimation and prediction of personalised treatment effects (personalised models). Currently partitioning of linear models, lm(), generalised linear models, glm(), and Weibull models, survreg(), is supported. Advanced plotting functionality is supported for the trees and a test for parameter heterogeneity is provided for the personalised models. For details on model-based trees for subgroup analyses see Seibold, Zeileis and Hothorn (2016) <doi:10.1515/ijb-2015-0032>; for details on model-based forests for estimation of individual treatment effects see Seibold, Zeileis and Hothorn (2017) <doi:10.1177/0962280217693034>.
1592 Machine Learning & Statistical Learning MXM Feature Selection (Including Multiple Solutions) and Bayesian Networks Many feature selection methods for a wide range of response variables, including minimal, statistically-equivalent and equally-predictive feature subsets. Bayesian network algorithms and related functions are also included. The package name ‘MXM’ stands for “Mens eX Machina”, meaning “Mind from the Machine” in Latin. References: a) Lagani, V. and Athineou, G. and Farcomeni, A. and Tsagris, M. and Tsamardinos, I. (2017). Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets. Journal of Statistical Software, 80(7). <doi:10.18637/jss.v080.i07>. b) Tsagris, M., Lagani, V. and Tsamardinos, I. (2018). Feature selection for high-dimensional temporal data. BMC Bioinformatics, 19:17. <doi:10.1186/s12859-018-2023-7>. c) Tsagris, M., Borboudakis, G., Lagani, V. and Tsamardinos, I. (2018). Constraint-based causal discovery with mixed data. International Journal of Data Science and Analytics, 6(1): 19-30. <doi:10.1007/s41060-018-0097-y>. d) Tsagris, M., Papadovasilakis, Z., Lakiotaki, K. and Tsamardinos, I. (2018). Efficient feature selection on gene expression data: Which algorithm to use? BioRxiv. <doi:10.1101/431734>. e) Tsagris, M. (2019). Bayesian Network Learning with the PC Algorithm: An Improved and Correct Variation. Applied Artificial Intelligence, 33(2):101-123. <doi:10.1080/08839514.2018.1526760>. f) Borboudakis, G. and Tsamardinos, I. (2019). Forward-Backward Selection with Early Dropping. Journal of Machine Learning Research 20: 1-39.
1593 Machine Learning & Statistical Learning naivebayes High Performance Implementation of the Naive Bayes Algorithm In this implementation of the Naive Bayes classifier, the following class-conditional distributions are available: Bernoulli, Categorical, Gaussian, Poisson, and a non-parametric representation of the class-conditional density estimated via Kernel Density Estimation.
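A short illustrative fit (package assumed installed; with numeric predictors the class-conditional distributions default to Gaussian):

```r
library(naivebayes)

nb <- naive_bayes(Species ~ ., data = iris)
head(predict(nb, iris[-5], type = "prob"))  # posterior class probabilities
```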
1594 Machine Learning & Statistical Learning ncvreg Regularization Paths for SCAD and MCP Penalized Regression Models Fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the “elastic net” idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided.
1595 Machine Learning & Statistical Learning nnet (core) Feed-Forward Neural Networks and Multinomial Log-Linear Models Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
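For example, a single-hidden-layer network can be fit as follows (package assumed installed; the size and decay values are illustrative settings):

```r
library(nnet)

set.seed(1)
net <- nnet(Species ~ ., data = iris, size = 3, decay = 1e-3, trace = FALSE)
table(predict(net, iris, type = "class"), iris$Species)
```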
1596 Machine Learning & Statistical Learning oem Orthogonalizing EM: Penalized Regression for Big Tall Data Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem() and the functions cv.oem() and xval.oem() are for cross validation, the latter being an accelerated cross validation function for linear models. The big.oem() function allows for out of memory fitting.
1597 Machine Learning & Statistical Learning OneR One Rule Machine Learning Classification Algorithm with Enhancements Implements the One Rule (OneR) Machine Learning classification algorithm (Holte, R.C. (1993) <doi:10.1023/A:1022631118932>) with enhancements for sophisticated handling of numeric data and missing values together with extensive diagnostic functions. It is useful as a baseline for machine learning models and the rules are often helpful heuristics.
1598 Machine Learning & Statistical Learning opusminer OPUS Miner Algorithm for Filtered Top-k Association Discovery Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See <http://i.giwebb.com/index.php/research/association-discovery/> for more information in relation to the OPUS Miner algorithm.
1599 Machine Learning & Statistical Learning pamr Pam: Prediction Analysis for Microarrays Some functions for sample classification in microarrays.
1600 Machine Learning & Statistical Learning party A Laboratory for Recursive Partytioning A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.
1601 Machine Learning & Statistical Learning partykit A Toolkit for Recursive Partytioning A toolkit with infrastructure for representing, summarizing, and visualizing tree-structured regression and classification models. This unified infrastructure can be used for reading/coercing tree models from different sources (‘rpart’, ‘RWeka’, ‘PMML’) yielding objects that share functionality for print()/plot()/predict() methods. Furthermore, new and improved reimplementations of conditional inference trees (ctree()) and model-based recursive partitioning (mob()) from the ‘party’ package are provided based on the new infrastructure. A description of this package was published by Hothorn and Zeileis (2015) <http://jmlr.org/papers/v16/hothorn15a.html>.
1602	Machine Learning & Statistical Learning	pdp	Partial Dependence Plots	A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.
1603 Machine Learning & Statistical Learning penalized L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model Fitting possibly high dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.
1604 Machine Learning & Statistical Learning penalizedLDA Penalized Classification using Fisher’s Linear Discriminant Implements the penalized LDA proposal of “Witten and Tibshirani (2011), Penalized classification using Fisher’s linear discriminant, to appear in Journal of the Royal Statistical Society, Series B”.
1605	Machine Learning & Statistical Learning	picasso	Pathwise Calibrated Sparse Shooting Algorithm	Computationally efficient tools for fitting generalized linear models with convex or non-convex penalties. Users can enjoy the superior statistical properties of non-convex penalties such as SCAD and MCP, which have significantly less estimation error and overfitting compared to convex penalties such as lasso and ridge. Computation is handled by multi-stage convex relaxation and the PathwIse CAlibrated Sparse Shooting algOrithm (PICASSO), which exploits warm start initialization, active set updating, and a strong rule for coordinate preselection to boost computation, and attains linear convergence to a unique sparse local optimum with optimal statistical properties. The computation is memory-optimized using the sparse matrix output.
1606 Machine Learning & Statistical Learning plotmo Plot a Model’s Residuals, Response, and Partial Dependence Plots Plot model surfaces for a wide variety of models using partial dependence plots and other techniques. Also plot model residuals and other information on the model.
1607 Machine Learning & Statistical Learning quantregForest Quantile Regression Forests Quantile Regression Forests is a tree-based ensemble method for estimation of conditional quantiles. It is particularly well suited for high-dimensional data. Predictor variables of mixed classes can be handled. The package is dependent on the package ‘randomForest’, written by Andy Liaw.
1608 Machine Learning & Statistical Learning randomForest (core) Breiman and Cutler’s Random Forests for Classification and Regression Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <doi:10.1023/A:1010933404324>.
1609 Machine Learning & Statistical Learning randomForestSRC Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) Fast OpenMP parallel computing of Breiman’s random forests for survival, competing risks, regression and classification based on Ishwaran and Kogalur’s popular random survival forests (RSF) package. Handles missing data and now includes multivariate, unsupervised forests, quantile regression and solutions for class imbalanced data. New fast interface using subsampling and confidence regions for variable importance.
1610 Machine Learning & Statistical Learning ranger A Fast Implementation of Random Forests A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class ‘gwaa.data’ (R package ‘GenABEL’) and ‘dgCMatrix’ (R package ‘Matrix’) can be directly analyzed.
1611	Machine Learning & Statistical Learning	rattle	Graphical User Interface for Data Science in R	The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim to provide a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (predictive modelling markup language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the log tab. This can be saved as a standalone R script file, both as an aid for the user to learn R and to copy-and-paste directly into R itself.
1612 Machine Learning & Statistical Learning Rborist Extensible, Parallelizable Implementation of the Random Forest Algorithm Scalable implementation of classification and regression forests, as described by Breiman (2001), <doi:10.1023/A:1010933404324>.
1613 Machine Learning & Statistical Learning RcppDL Deep Learning Methods via Rcpp This package is based on the C++ code from Yusuke Sugomori, which implements basic machine learning methods with many layers (deep learning), including dA (Denoising Autoencoder), SdA (Stacked Denoising Autoencoder), RBM (Restricted Boltzmann machine) and DBN (Deep Belief Nets).
1614 Machine Learning & Statistical Learning rdetools Relevant Dimension Estimation (RDE) in Feature Spaces The package provides functions for estimating the relevant dimension of a data set in feature spaces, applications to model selection, graphical illustrations and prediction.
1615 Machine Learning & Statistical Learning REEMtree Regression Trees with Random Effects for Longitudinal (Panel) Data This package estimates regression trees with random effects as a way to use data mining techniques to describe longitudinal or panel data.
1616	Machine Learning & Statistical Learning	relaxo	Relaxed Lasso	Relaxed Lasso is a generalisation of the Lasso shrinkage technique for linear regression. Both variable selection and parameter estimation are achieved by regular Lasso, yet the two steps do not necessarily use the same penalty parameter. The results include all standard Lasso solutions but often allow for sparser models while having similar or even slightly better predictive performance if many predictor variables are present. The package depends on the LARS package.
1617 Machine Learning & Statistical Learning rgenoud R Version of GENetic Optimization Using Derivatives A genetic algorithm plus derivative optimizer.
1618	Machine Learning & Statistical Learning	RGF	Regularized Greedy Forest	A wrapper of the ‘Regularized Greedy Forest’ <https://github.com/RGF-team/rgf/tree/master/python-package> ‘python’ package, which also includes a multi-core implementation (FastRGF) <https://github.com/RGF-team/rgf/tree/master/FastRGF>.
1619 Machine Learning & Statistical Learning RLT Reinforcement Learning Trees Random forest with a variety of additional features for regression, classification and survival analysis. The features include: parallel computing with OpenMP, embedded model for selecting the splitting variable (based on Zhu, Zeng & Kosorok, 2015), subject weight, variable weight, tracking subjects used in each tree, etc.
1620 Machine Learning & Statistical Learning Rmalschains Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R An implementation of an algorithm family for continuous optimization called memetic algorithms with local search chains (MA-LS-Chains). Memetic algorithms are hybridizations of genetic algorithms with local search methods. They are especially suited for continuous optimization.
1621 Machine Learning & Statistical Learning rminer Data Mining Classification and Regression Methods Facilitates the use of data mining algorithms in classification and regression (including time series forecasting) tasks by presenting a short and coherent set of functions. Versions: 1.4.2 new NMAE metric, “xgboost” and “cv.glmnet” models (16 classification and 18 regression models); 1.4.1 new tutorial and more robust version; 1.4 - new classification and regression models/algorithms, with a total of 14 classification and 15 regression methods, including: Decision Trees, Neural Networks, Support Vector Machines, Random Forests, Bagging and Boosting; 1.3 and 1.3.1 - new classification and regression metrics (improved mmetric function); 1.2 - new input importance methods (improved Importance function); 1.0 - first version.
1622 Machine Learning & Statistical Learning ROCR Visualizing the Performance of Scoring Classifiers ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
1623	Machine Learning & Statistical Learning	RoughSets	Data Analysis Using Rough Set and Fuzzy Rough Set Theories	Implementations of algorithms for data analysis based on the rough set theory (RST) and the fuzzy rough set theory (FRST). We not only provide implementations for the basic concepts of RST and FRST but also popular algorithms that derive from those theories. The methods included in the package can be divided into several categories based on their functionality: discretization, feature selection, instance selection, rule induction and classification based on nearest neighbors. RST was introduced by Zdzisław Pawlak in 1982 as a sophisticated mathematical tool to model and process imprecise or incomplete information. By using the indiscernibility relation for objects/instances, RST does not require additional parameters to analyze the data. FRST is an extension of RST. The FRST combines concepts of vagueness and indiscernibility that are expressed with fuzzy sets (as proposed by Zadeh, in 1965) and RST.
1624 Machine Learning & Statistical Learning rpart (core) Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
1625 Machine Learning & Statistical Learning RPMM Recursively Partitioned Mixture Model Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.
1626 Machine Learning & Statistical Learning RSNNS Neural Networks using the Stuttgart Neural Network Simulator (SNNS) The Stuttgart Neural Network Simulator (SNNS) is a library containing many standard implementations of neural networks. This package wraps the SNNS functionality to make it available from within R. Using the ‘RSNNS’ low-level interface, all of the algorithmic functionality and flexibility of SNNS can be accessed. Furthermore, the package contains a convenient high-level interface, so that the most common neural network topologies and learning algorithms integrate seamlessly into R.
1627 Machine Learning & Statistical Learning RWeka R/Weka Interface An R interface to Weka (Version 3.9.3). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Package ‘RWeka’ contains the interface code, the Weka jar is in a separate package ‘RWekajars’. For more information on Weka see <http://www.cs.waikato.ac.nz/ml/weka/>.
1628 Machine Learning & Statistical Learning RXshrink Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression Identify and display TRACEs for a specified shrinkage path and determine the extent of shrinkage most likely, under normal distribution theory, to produce an optimal reduction in MSE Risk in estimates of regression (beta) coefficients. Alternative estimates are also provided when ill-conditioned (nearly multicollinear) models yield OLS estimates with “wrong” numerical signs.
1629 Machine Learning & Statistical Learning sda Shrinkage Discriminant Analysis and CAT Score Variable Selection Provides an efficient framework for high-dimensional linear and diagonal discriminant analysis with variable selection. The classifier is trained using James-Stein-type shrinkage estimators and predictor variables are ranked using correlation-adjusted t-scores (CAT scores). Variable selection error is controlled using false non-discovery rates or higher criticism.
1630 Machine Learning & Statistical Learning SIS Sure Independence Screening Variable selection techniques are essential tools for model selection and estimation in high-dimensional statistical models. Through this publicly available package, we provide a unified environment to carry out variable selection using iterative sure independence screening (SIS) and all of its variants in generalized linear models and the Cox proportional hazards model.
1631 Machine Learning & Statistical Learning ssgraph Bayesian Graphical Estimation using Spike-and-Slab Priors Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data.
1632	Machine Learning & Statistical Learning	stabs	Stability Selection with Error Control	Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.
1633 Machine Learning & Statistical Learning SuperLearner Super Learner Prediction Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.
1634	Machine Learning & Statistical Learning	svmpath	The SVM Path Algorithm	Computes the entire regularization path for the two-class SVM classifier with essentially the same cost as a single SVM fit.
1635 Machine Learning & Statistical Learning tensorflow R Interface to ‘TensorFlow’ Interface to ‘TensorFlow’ <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
1636 Machine Learning & Statistical Learning tgp Bayesian Treed Gaussian Process Models Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.
1637 Machine Learning & Statistical Learning tree Classification and Regression Trees Classification and regression trees.
1638 Machine Learning & Statistical Learning trtf Transformation Trees and Forests Recursive partytioning of transformation models with corresponding random forest for conditional transformation models as described in ‘Transformation Forests’ (Hothorn and Zeileis, 2017, <arXiv:1701.02110>) and ‘Top-Down Transformation Choice’ (Hothorn, 2018, <doi:10.1177/1471082X17748081>).
1639 Machine Learning & Statistical Learning varSelRF V