Content

  • R packages are organized according to the categories of the CRAN Task Views.
  • Source: CRAN Task Views https://cran.r-project.org/web/views/ , https://cran.ism.ac.jp/web/views/
  • “Category”, “Package”, “Title” and “Description” are quoted and copied from the above sources.
  • When searching with multiple keywords, separate them with half-width (single-byte) spaces.
  • Regular expressions can be used in searches.
No. Category Package Title Description
1 Bayesian Inference abc Tools for Approximate Bayesian Computation (ABC) Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates, and to calculate the misclassification probabilities of different models.
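A minimal rejection-ABC sketch with ‘abc’ (the data below are simulated purely for illustration, and the tolerance and sample sizes are arbitrary):

    library(abc)
    set.seed(1)
    theta   <- runif(10000, 0, 10)                                   # draws from a uniform prior
    sumstat <- sapply(theta, function(t) mean(rnorm(20, mean = t)))  # simulated summary statistics
    target  <- mean(rnorm(20, mean = 4))                             # "observed" summary statistic
    # Keep the 1% of simulations whose summaries are closest to the target:
    post <- abc(target = target, param = theta, sumstat = sumstat,
                tol = 0.01, method = "rejection")
    summary(post)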
2 Bayesian Inference abn Modelling Multivariate Data with Additive Bayesian Networks Bayesian network analysis is a form of probabilistic graphical modelling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. An additive Bayesian network model consists of a DAG in which each node comprises a generalized linear model (GLM). Additive Bayesian network models are equivalent to Bayesian multivariate regression using graphical modelling; they generalise the usual multivariable regression (GLM) to multiple dependent variables. ‘abn’ provides routines to help determine optimal Bayesian network models for a given data set, where these models are used to identify statistical dependencies in messy, complex data. The additive formulation of these models is equivalent to multivariate generalised linear modelling (including mixed models with iid random effects). The usual term for this model selection process is structure discovery. The core functionality is concerned with model selection - determining the most robust empirical model of data from interdependent variables. Laplace approximations are used to estimate goodness-of-fit metrics and model parameters, and wrappers are also included to the INLA package, which can be obtained from <http://www.r-inla.org>. A comprehensive set of documented case studies, numerical accuracy/quality assurance exercises, and additional documentation are available from the ‘abn’ website.
3 Bayesian Inference AdMit Adaptive Mixture of Student-t Distributions Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.
4 Bayesian Inference arm (core) Data Analysis Using Regression and Multilevel/Hierarchical Models Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.
5 Bayesian Inference AtelieR A GTK GUI for teaching basic concepts in statistical inference, and doing elementary Bayesian tests A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (the Central Limit Theorem and the Normal Distribution, Distribution of a sample mean, Distribution of a sample variance, Probability calculator for common distributions), and II. Elementary Bayesian Statistics (Bayesian inference on proportions, contingency tables, means and variances, with informative and noninformative priors).
6 Bayesian Inference BaBooN Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data Included are two variants of Bayesian Bootstrap Predictive Mean Matching to multiply impute missing data. The first variant is a variable-by-variable imputation combining sequential regression and Predictive Mean Matching (PMM) that has been extended for unordered categorical data. The Bayesian Bootstrap allows for generating approximately proper multiple imputations. The second variant is also based on PMM, but the focus is on imputing several variables at the same time. The suggestion is to use this variant, if the missing-data pattern resembles a data fusion situation, or any other missing-by-design pattern, where several variables have identical missing-data patterns. Both variants can be run as ‘single imputation’ versions, in case the analysis objective is of a purely descriptive nature.
7 Bayesian Inference BACCO (core) Bayesian Analysis of Computer Code Output (BACCO) The BACCO bundle of packages is replaced by the BACCO package, which provides a vignette that illustrates the constituent packages (emulator, approximator, calibrator) in use.
8 Bayesian Inference BaM Functions and Datasets for Books by Jeff Gill Functions and datasets for Jeff Gill: “Bayesian Methods: A Social and Behavioral Sciences Approach”. First, Second, and Third Edition. Published by Chapman and Hall/CRC (2002, 2007, 2014).
9 Bayesian Inference bamlss Bayesian Additive Models for Location Scale and Shape (and Beyond) Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc. and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2017) <doi:10.1080/10618600.2017.1407325>.
10 Bayesian Inference BART Bayesian Additive Regression Trees Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.
11 Bayesian Inference BAS Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy Priors or the mixture of g-priors from Liang et al (2008) <doi:10.1198/016214507000001337> for linear models or mixtures of g-priors in GLMs of Li and Clyde (2018) <arXiv:1503.06913>. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using Sampling w/out Replacement or an efficient MCMC algorithm samples models using the BAS tree structure as an efficient hash table. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used. The user may force variables to always be included. Details behind the sampling algorithm are provided in Clyde, Ghosh and Littman (2010) <doi:10.1198/jcgs.2010.09049>. This material is based upon work supported by the National Science Foundation under Grant DMS-1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
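A hedged sketch of the sampling interface described above (mtcars is an arbitrary demo dataset; the Zellner-Siow and uniform model priors match options named in the description):

    library(BAS)
    fit <- bas.lm(mpg ~ ., data = mtcars,
                  prior = "ZS-null",         # Zellner-Siow prior on coefficients
                  modelprior = uniform())    # uniform prior over all models
    summary(fit)   # posterior inclusion probabilities and top models
    image(fit)     # visualize the sampled model space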
12 Bayesian Inference BayesDA Functions and Datasets for the book “Bayesian Data Analysis” Functions for Bayesian Data Analysis, with datasets from the book “Bayesian Data Analysis (second edition)” by Gelman, Carlin, Stern and Rubin. Not all datasets are included yet; the collection will hopefully be completed soon.
13 Bayesian Inference BayesFactor Computation of Bayes Factors for Common Designs A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.
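For instance, a one-sample Bayes factor and a regression Bayes factor might look like this (the simulated sample and mtcars are illustrative only):

    library(BayesFactor)
    set.seed(1)
    x <- rnorm(30, mean = 0.4)
    ttestBF(x = x, mu = 0)                       # one-sample design
    regressionBF(mpg ~ wt + hp, data = mtcars)   # all submodels of a linear regression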
14 Bayesian Inference bayesGARCH Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations Provides the bayesGARCH() function which performs the Bayesian estimation of the GARCH(1,1) model with Student’s t innovations as described in Ardia (2008) <doi:10.1007/978-3-540-78657-3>.
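A short sketch of the bayesGARCH() call named above (to the best of my recollection the package ships a demo DEM/GBP return series ‘dem2gbp’; the chain lengths here are far too short for real use):

    library(bayesGARCH)
    data(dem2gbp)                # demo log-returns, assumed to ship with the package
    y <- dem2gbp[1:750]
    chains <- bayesGARCH(y, control = list(n.chain = 2, l.chain = 200))
    str(chains)                  # a list of coda chains, one per n.chain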
15 Bayesian Inference bayesImageS Bayesian Methods for Image Segmentation using a Potts Model Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior of Moores et al. (2015) <doi:10.1016/j.csda.2014.12.001>. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, approximate Bayesian computation (ABC-MCMC and ABC-SMC), and the parametric functional approximate Bayesian (PFAB) algorithm. Refer to <doi:10.1007/s11222-014-9525-6> and <doi:10.1214/18-BA1130> for further details.
16 Bayesian Inference bayesm (core) Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
17 Bayesian Inference bayesmeta Bayesian Random-Effects Meta-Analysis A collection of functions allowing to derive the posterior distribution of the two parameters in a random-effects meta-analysis, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc.
18 Bayesian Inference bayesmix Bayesian Mixture Models with JAGS The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.
19 Bayesian Inference bayesQR Bayesian Quantile Regression Bayesian quantile regression using the asymmetric Laplace distribution, both continuous as well as binary dependent variables are supported. The package consists of implementations of the methods of Yu & Moyeed (2001) <doi:10.1016/S0167-7152(01)00124-9>, Benoit & Van den Poel (2012) <doi:10.1002/jae.1216> and Al-Hamzawi, Yu & Benoit (2012) <doi:10.1177/1471082X1101200304>. To speed up the calculations, the Markov Chain Monte Carlo core of all algorithms is programmed in Fortran and called from R.
20 Bayesian Inference BayesSummaryStatLM MCMC Sampling of Bayesian Linear Models via Summary Statistics Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function read.regress.data.ff utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.
21 Bayesian Inference bayesSurv (core) Bayesian Survival Regression with Flexible Error and Random Effects Distributions Contains Bayesian implementations of Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. Those can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored.
22 Bayesian Inference BayesTree Bayesian Additive Regression Trees This is an implementation of BART: Bayesian Additive Regression Trees, by Chipman, George and McCulloch (2010).
23 Bayesian Inference BayesValidate BayesValidate Package BayesValidate implements the software validation method described in the paper “Validation of Software for Bayesian Models using Posterior Quantiles” (Cook, Gelman, and Rubin, 2005). It inputs a function to perform Bayesian inference as well as functions to generate data from the Bayesian model being fit, and repeatedly generates and analyzes data to check that the Bayesian inference program works properly.
24 Bayesian Inference BayesVarSel Bayes Factors, Model Choice and Variable Selection in Linear Models Conceived to calculate Bayes factors in linear models and then to provide a formal Bayesian answer to testing and variable selection problems. From a theoretical side, the emphasis in this package is placed on the prior distributions and it allows a wide range of them: Jeffreys (1961); Zellner and Siow(1980)<doi:10.1007/bf02888369>; Zellner and Siow(1984); Zellner (1986)<doi:10.2307/2233941>; Fernandez et al. (2001)<doi:10.1016/s0304-4076(00)00076-2>; Liang et al. (2008)<doi:10.1198/016214507000001337> and Bayarri et al. (2012)<doi:10.1214/12-aos1013>. The interaction with the package is through a friendly interface that syntactically mimics the well-known lm() command of R. The resulting objects can be easily explored providing the user very valuable information (like marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; the median probability model, MPM) about the structure of the true -data generating- model. Additionally, this package incorporates abilities to handle problems with a large number of potential explanatory variables through parallel and heuristic versions of the main commands, Garcia-Donato and Martinez-Beneito (2013)<doi:10.1080/01621459.2012.742443>.
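As a sketch of the lm()-like interface mentioned above (mtcars is an arbitrary example dataset):

    library(BayesVarSel)
    fit <- Bvs(formula = mpg ~ wt + hp + qsec + am, data = mtcars)
    fit   # posterior probabilities of the competing models, HPM, and inclusion probabilities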
25 Bayesian Inference BayesX R Utilities Accompanying the Software Package BayesX Functions for exploring and visualising estimation results obtained with BayesX, a free software package for estimating structured additive regression models (<http://www.BayesX.org>). In addition, functions to read, write and manipulate map objects that are required in spatial analyses performed with BayesX.
26 Bayesian Inference BayHaz R Functions for Bayesian Hazard Rate Estimation A suite of R functions for Bayesian estimation of smooth hazard rates via Compound Poisson Process (CPP) and Bayesian Penalized Spline (BPS) priors.
27 Bayesian Inference BAYSTAR On Bayesian analysis of Threshold autoregressive model (BAYSTAR) Provides functionality for Bayesian estimation of threshold autoregressive models.
28 Bayesian Inference bbemkr Bayesian bandwidth estimation for multivariate kernel regression with Gaussian error Bayesian bandwidth estimation for Nadaraya-Watson type multivariate kernel regression with a Gaussian error density.
29 Bayesian Inference BCBCSF Bias-Corrected Bayesian Classification with Selected Features Fully Bayesian classification with a subset of high-dimensional features, such as expression levels of genes. The data are modeled with hierarchical Bayesian models using heavy-tailed t distributions as priors. When a large number of features are available, one may like to select only a subset of features to use, typically those features strongly correlated with the response in training cases. Such a feature selection procedure is, however, invalid, since the relationship between the response and the features has been exaggerated by feature selection. This package provides a way to avoid this bias and yield better-calibrated predictions for future cases when one uses an F-statistic to select features.
30 Bayesian Inference BCE Bayesian composition estimator: estimating sample (taxonomic) composition from biomarker data Function to estimate taxonomic compositions from biomarker data, using a Bayesian approach.
31 Bayesian Inference bclust Bayesian Hierarchical Clustering Using Spike and Slab Models Builds a dendrogram using the log posterior as a natural distance defined by the model, and meanwhile weights the clustering variables. It is also capable of computing equivalent Bayesian discrimination probabilities. The adopted method suits the small-sample, large-dimension setting. The model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.
32 Bayesian Inference bcp Bayesian Analysis of Change Point Problems Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.
33 Bayesian Inference BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements from the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Letac et al. (2018) <arXiv:1706.04416>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2017) <doi:10.1111/rssc.12171>. To speed up the computations, the BDMCMC sampling algorithms are implemented in parallel using OpenMP in C++.
34 Bayesian Inference BLR Bayesian Linear Regression Bayesian Linear Regression.
35 Bayesian Inference BMA Bayesian Model Averaging Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).
36 Bayesian Inference Bmix Bayesian Sampling for Stick-Breaking Mixtures This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.
37 Bayesian Inference bmixture Bayesian Estimation for Finite Mixture of Distributions Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements from the Bayesian literature on finite mixtures of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.
38 Bayesian Inference BMS Bayesian Model Averaging Library Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors), 5 kinds of model priors, moreover model sampling by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, coefficient and predictive densities. Plotting functions available for posterior model size, MCMC convergence, predictive and coefficient densities, best models representation, BMA comparison.
39 Bayesian Inference bnlearn Bayesian Network Structure Learning, Parameter Learning and Inference Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC and RSMAX2) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries and cross-validation. Development snapshots with the latest bugfixes are available from <http://www.bnlearn.com>.
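A compact structure-then-parameters sketch (learning.test is a discrete demo dataset that, as far as I recall, ships with bnlearn):

    library(bnlearn)
    data(learning.test)
    dag <- hc(learning.test)              # score-based structure learning (hill-climbing)
    fit <- bn.fit(dag, learning.test)     # maximum-likelihood parameter estimates
    # Approximate conditional probability query on the fitted network:
    cpquery(fit, event = (A == "a"), evidence = (B == "b"))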
40 Bayesian Inference boa (core) Bayesian Output Analysis Program (BOA) for MCMC A menu-driven program and library of functions for carrying out convergence diagnostics and statistical and graphical analysis of Markov chain Monte Carlo sampling output.
41 Bayesian Inference Bolstad Functions for Elementary Bayesian Inference A set of R functions and data sets for the book Introduction to Bayesian Statistics, Bolstad, W.M. (2017), John Wiley & Sons ISBN 978-1-118-09156-2.
42 Bayesian Inference Boom Bayesian Object Oriented Modeling A C++ library for Bayesian modeling, with an emphasis on Markov chain Monte Carlo. Although boom contains a few R utilities (mainly plotting functions), its primary purpose is to install the BOOM C++ library on your system so that other packages can link against it.
43 Bayesian Inference BoomSpikeSlab MCMC for Spike and Slab Regression Spike and slab regression a la McCulloch and George (1997).
44 Bayesian Inference bqtl Bayesian QTL Mapping Toolkit QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.
45 Bayesian Inference bridgesampling Bridge Sampling for Marginal Likelihoods and Bayes Factors Provides functions for estimating marginal likelihoods, Bayes factors, posterior model probabilities, and normalizing constants in general, via different versions of bridge sampling (Meng & Wong, 1996, <http://www3.stat.sinica.edu.tw/statistica/j6n4/j6n43/j6n43.htm>).
46 Bayesian Inference brms Bayesian Regression Models using ‘Stan’ Fit Bayesian generalized (non-)linear multivariate multilevel models using ‘Stan’ for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit among others linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation. References: Burkner (2017) <doi:10.18637/jss.v080.i01>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.
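A hedged example of the brm() formula interface (requires a working Stan toolchain; ‘epilepsy’ is a dataset bundled with brms, and the short chains here are for illustration only):

    library(brms)
    fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
               data = epilepsy, family = poisson(),
               chains = 2, iter = 1000)
    summary(fit)
    pp_check(fit)   # posterior predictive check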
47 Bayesian Inference bsamGP Bayesian Spectral Analysis Models using Gaussian Process Priors Contains functions to perform Bayesian inference using a spectral analysis of Gaussian process priors. Gaussian processes are represented with a Fourier series based on cosine basis functions. Currently the package includes parametric linear models, partial linear additive models with/without shape restrictions, generalized linear additive models with/without shape restrictions, and a density estimation model. To maximize computational efficiency, the actual Markov chain Monte Carlo sampling for each model is done using code written in FORTRAN 90. This software has been developed using funding supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B03932178 and no. NRF-2017R1D1A3B03035235).
48 Bayesian Inference bspec Bayesian Spectral Inference Bayesian inference on the (discrete) power spectrum of time series.
49 Bayesian Inference bspmma Bayesian Semiparametric Models for Meta-Analysis The main functions carry out Gibbs sampler routines for nonparametric and semiparametric Bayesian models for random-effects meta-analysis.
50 Bayesian Inference BSquare Bayesian Simultaneous Quantile Regression This package models the quantile process as a function of predictors.
51 Bayesian Inference bsts Bayesian Structural Time Series Time series regression using dynamic linear models fit using MCMC. See Scott and Varian (2014) <doi:10.1504/IJMMNO.2014.059942>, among many other sources.
52 Bayesian Inference BVS Bayesian Variant Selection: Bayesian Model Uncertainty Techniques for Genetic Association Studies The functions in this package focus on analyzing case-control association studies involving a group of genetic variants. In particular, we are interested in modeling the outcome variable as a function of a multivariate genetic profile using Bayesian model uncertainty and variable selection techniques. The package incorporates functions to analyze data sets involving common variants as well as extensions to model rare variants via the Bayesian Risk Index (BRI) as well as haplotypes. Finally, the package also allows the incorporation of external biological information to inform the marginal inclusion probabilities via the iBMU.
53 Bayesian Inference catnet Categorical Bayesian Network Inference Structure learning and parameter estimation of discrete Bayesian networks using likelihood-based criteria. Exhaustive search for fixed node orders and stochastic search of optimal orders via simulated annealing algorithm are implemented.
54 Bayesian Inference coalescentMCMC MCMC Algorithms for the Coalescent Flexible framework for coalescent analyses in R. It includes a main function running the MCMC algorithm, auxiliary functions for tree rearrangement, and some functions to compute population genetic parameters.
55 Bayesian Inference coda (core) Output Analysis and Diagnostics for MCMC Provides functions for summarizing and plotting the output from Markov Chain Monte Carlo (MCMC) simulations, as well as diagnostic tests of convergence to the equilibrium distribution of the Markov chain.
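A minimal sketch of wrapping raw posterior draws as an ‘mcmc’ object so coda's diagnostics apply (the matrix of draws below is simulated for illustration):

    library(coda)
    draws <- mcmc(matrix(rnorm(2000), ncol = 2,
                         dimnames = list(NULL, c("alpha", "beta"))))
    summary(draws)
    effectiveSize(draws)   # effective sample sizes
    geweke.diag(draws)     # convergence diagnostic
    traceplot(draws)       # trace plots per parameter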
56 Bayesian Inference dclone Data Cloning and MCMC Tools for Maximum Likelihood Methods Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.
57 Bayesian Inference deBInfer Bayesian Inference for Differential Equations A Bayesian framework for parameter inference in differential equations. This approach offers a rigorous methodology for parameter inference as well as modeling the link between unobservable model states and parameters, and observable quantities. Provides templates for the DE model, the observation model and data likelihood, and the model parameters and their prior distributions. A Markov chain Monte Carlo (MCMC) procedure processes these inputs to estimate the posterior distributions of the parameters and any derived quantities, including the model trajectories. Further functionality is provided to facilitate MCMC diagnostics and the visualisation of the posterior distributions of model parameters and trajectories.
58 Bayesian Inference dlm Bayesian and Likelihood Analysis of Dynamic Linear Models Provides routines for Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
59 Bayesian Inference DPpackage (core) Bayesian Nonparametric Modeling in R Functions to perform inference via simulation from the posterior distributions for Bayesian nonparametric and semiparametric models. Although the name of the package was motivated by the Dirichlet Process prior, the package considers and will consider other priors on functional spaces. So far, DPpackage includes models considering Dirichlet Processes, Dependent Dirichlet Processes, Dependent Poisson-Dirichlet Processes, Hierarchical Dirichlet Processes, Polya Trees, Linear Dependent Tailfree Processes, Mixtures of Triangular distributions, Random Bernstein polynomial priors and Dependent Bernstein Polynomials. The package also includes models considering Penalized B-Splines. Includes semiparametric models for marginal and conditional density estimation, ROC curve analysis, interval censored data, binary regression models, generalized linear mixed models, IRT type models, and generalized additive models. Also contains functions to compute Pseudo-Bayes factors for model comparison, and to elicit the precision parameter of the Dirichlet Process. To maximize computational efficiency, the actual sampling for each model is done in compiled FORTRAN. The functions return objects which can be subsequently analyzed with functions provided in the ‘coda’ package.
60 Bayesian Inference EbayesThresh Empirical Bayes Thresholding and Related Methods Empirical Bayes thresholding using the methods developed by I. M. Johnstone and B. W. Silverman. The basic problem is to estimate a mean vector given a vector of observations of the mean vector plus white noise, taking advantage of possible sparsity in the mean vector. Within a Bayesian formulation, the elements of the mean vector are modelled as having, independently, a distribution that is a mixture of an atom of probability at zero and a suitable heavy-tailed distribution. The mixing parameter can be estimated by a marginal maximum likelihood approach. This leads to an adaptive thresholding approach on the original data. Extensions of the basic method, in particular to wavelet thresholding, are also implemented within the package.
61 Bayesian Inference ebdbNet Empirical Bayes Estimation of Dynamic Bayesian Networks Infer the adjacency matrix of a network from time course data using an empirical Bayes estimation procedure based on Dynamic Bayesian Networks.
62 Bayesian Inference eco Ecological Inference in 2x2 Tables Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.
63 Bayesian Inference eigenmodel Semiparametric Factor and Regression Models for Symmetric Relational Data Estimation of the parameters in a model for symmetric relational data (e.g., the above-diagonal part of a square matrix), using a model-based eigenvalue decomposition and regression. Missing data is accommodated, and a posterior mean for missing data is calculated under the assumption that the data are missing at random. The marginal distribution of the relational data can be arbitrary, and is fit with an ordered probit specification. See Hoff (2007) <arXiv:0711.1146> for details on the model.
64 Bayesian Inference ensembleBMA Probabilistic Forecasting using Ensembles and Bayesian Model Averaging Bayesian Model Averaging to create probabilistic forecasts from ensemble forecasts and weather observations.
65 Bayesian Inference evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.
66 Bayesian Inference exactLoglinTest Monte Carlo Exact Tests for Log-linear Models Monte Carlo and MCMC goodness-of-fit tests for log-linear models.
67 Bayesian Inference factorQR Bayesian quantile regression factor models Package to fit Bayesian quantile regression models that assume a factor structure for at least part of the design matrix.
68 Bayesian Inference FME A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis Provides functions to help in fitting models to data, and to perform Monte Carlo, sensitivity and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’, or a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.
69 Bayesian Inference geoR Analysis of Geostatistical Data Geostatistical analysis including traditional, likelihood-based and Bayesian methods.
70 Bayesian Inference geoRglm A Package for Generalised Linear Spatial Models Functions for inference in generalised linear spatial models. The posterior and predictive inference is based on Markov chain Monte Carlo methods. Package geoRglm is an extension to the package geoR, which must be installed first.
71 Bayesian Inference ggmcmc Tools for Analyzing MCMC Simulations from Bayesian Inference Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying the results of a full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables.
72 Bayesian Inference gRain Graphical Independence Networks Probability propagation in graphical independence networks, also known as Bayesian networks or probabilistic expert systems.
73 Bayesian Inference growcurves Bayesian Semi and Nonparametric Growth Curve Models that Additionally Include Multiple Membership Random Effects Employs a non-parametric formulation for by-subject random effect parameters to borrow strength over a constrained number of repeated measurement waves in a fashion that permits multiple effects per subject. One class of models employs a Dirichlet process (DP) prior for the subject random effects and includes an additional set of random effects that utilize a different grouping factor and are mapped back to clients through a multiple membership weight matrix; e.g. treatment(s) exposure or dosage. A second class of models employs a dependent DP (DDP) prior for the subject random effects that directly incorporates the multiple membership pattern.
74 Bayesian Inference hbsae Hierarchical Bayesian Small Area Estimation Functions to compute small area estimates based on a basic area or unit-level model. The model is fit using restricted maximum likelihood, or in a hierarchical Bayesian way. In the latter case numerical integration is used to average over the posterior density for the between-area variance. The output includes the model fit, small area estimates and corresponding MSEs, as well as some model selection measures. Additional functions provide means to compute aggregate estimates and MSEs, to minimally adjust the small area estimates to benchmarks at a higher aggregation level, and to graphically compare different sets of small area estimates.
75 Bayesian Inference HI Simulation from distributions supported by nested hyperplanes Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to transdimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v.31, n.4, (2003). Also random direction multivariate Adaptive Rejection Metropolis Sampling.
76 Bayesian Inference Hmisc Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
77 Bayesian Inference iterLap Approximate Probability Densities by Iterated Laplace Approximations The iterLap (iterated Laplace approximation) algorithm approximates a general (possibly non-normalized) probability density on R^p, by repeated Laplace approximations to the difference between current approximation and true density (on log scale). The final approximation is a mixture of multivariate normal distributions and might be used for example as a proposal distribution for importance sampling (eg in Bayesian applications). The algorithm can be seen as a computational generalization of the Laplace approximation suitable for skew or multimodal densities.
78 Bayesian Inference LaplacesDemon Complete Environment for Bayesian Inference Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.
79 Bayesian Inference LearnBayes Functions for Learning Bayesian Inference A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
80 Bayesian Inference lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
81 Bayesian Inference lmm Linear Mixed Models Implements Expectation/Conditional Maximization Either (ECME) and rapidly converging algorithms, as well as Bayesian inference, for linear mixed models, as described in Schafer, J.L. (1998) “Some improved procedures for linear mixed models”. Dept. of Statistics, The Pennsylvania State University.
82 Bayesian Inference MasterBayes ML and MCMC Methods for Pedigree Reconstruction and Analysis The primary aim of MasterBayes is to use MCMC techniques to integrate over uncertainty in pedigree configurations estimated from molecular markers and phenotypic data. Emphasis is put on the marginal distribution of parameters that relate the phenotypic data to the pedigree. All simulation is done in compiled C++ for efficiency.
83 Bayesian Inference matchingMarkets Analysis of Stable Matchings Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
84 Bayesian Inference mcmc (core) Markov Chain Monte Carlo Simulates continuous distributions of random vectors using Markov chain Monte Carlo (MCMC). Users specify the distribution by an R function that evaluates the log unnormalized density. Algorithms are random walk Metropolis algorithm (function metrop), simulated tempering (function temper), and morphometric random walk Metropolis (Johnson and Geyer, 2012, <doi:10.1214/12-AOS1048>, function morph.metrop), which achieves geometric ergodicity by change of variable.
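For example, metrop() needs only a log unnormalized density (a standard bivariate normal below; in practice the proposal scale would be tuned toward a sensible acceptance rate):

    library(mcmc)
    lud <- function(x) -0.5 * sum(x^2)            # log unnormalized density
    out <- metrop(lud, initial = c(0, 0), nbatch = 1000)
    out$accept                                    # acceptance rate
    out <- metrop(out, scale = 2)                 # continue the run with a larger proposal scale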
85 Bayesian Inference MCMCglmm MCMC Generalised Linear Mixed Models MCMC Generalised Linear Mixed Models.
86 Bayesian Inference MCMCpack (core) Markov Chain Monte Carlo (MCMC) Package Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
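A short sketch showing that the model functions return coda objects directly (mtcars is an arbitrary demo dataset):

    library(MCMCpack)
    post <- MCMCregress(mpg ~ wt + hp, data = mtcars,
                        burnin = 1000, mcmc = 10000)
    summary(post)   # 'post' is a coda mcmc object
    plot(post)      # trace and density plots via coda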
87 Bayesian Inference MCMCvis Tools to Visualize, Manipulate, and Summarize MCMC Output Performs key functions for MCMC analysis using minimal code - visualizes, manipulates, and summarizes MCMC output. Functions support simple and straightforward subsetting of model parameters within the calls, and produce presentable and ‘publication-ready’ output. MCMC output may be derived from Bayesian model output fit with JAGS, Stan, or other MCMC samplers.
88 Bayesian Inference mgcv Mixed GAM Computation Vehicle with Automatic Smoothness Estimation Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
89 Bayesian Inference mlogitBMA Bayesian Model Averaging for Multinomial Logit Models Provides a modified function bic.glm of the BMA package that can be applied to multinomial logit (MNL) data. The data is converted to binary logit using the Begg & Gray approximation. The package also contains functions for maximum likelihood estimation of MNL.
90 Bayesian Inference MNP R Package for Fitting the Multinomial Probit Model Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.
91 Bayesian Inference mombf Bayesian Model Selection and Averaging for Non-Local and Local Priors Bayesian model selection and averaging for regression and mixtures for non-local and selected local priors.
92 Bayesian Inference monomvn Estimation for Multivariate Normal and Student-t Data with Monotone Missingness Estimation of multivariate normal and student-t data of arbitrary dimension where the pattern of missing data is monotone. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and student-t errors (from Geweke) is also provided.
93 Bayesian Inference NetworkChange Bayesian Package for Network Changepoint Analysis Network changepoint analysis for undirected network data. The package implements a hidden Markov multilinear tensor regression model (Park and Sohn, 2017, <http://jhp.snu.ac.kr/NetworkChange.pdf>). Functions for break number detection using the approximate marginal likelihood and WAIC are also provided.
94 Bayesian Inference nimble (core) MCMC, Particle Filtering, and Programmable Hierarchical Modeling A system for writing hierarchical statistical models largely compatible with ‘BUGS’ and ‘JAGS’, writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. ‘NIMBLE’ includes default methods for MCMC, particle filtering, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers ‘NIMBLE’ provides. ‘NIMBLE’ extends the ‘BUGS’/‘JAGS’ language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the ‘BUGS’/‘JAGS’ language for writing models, one can use ‘NIMBLE’ for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.
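A minimal BUGS-style sketch run through NIMBLE's one-call interface (the model, data and initial values are all illustrative):

    library(nimble)
    code <- nimbleCode({
      for (i in 1:N) { y[i] ~ dnorm(mu, sd = sigma) }
      mu ~ dnorm(0, sd = 100)
      sigma ~ dunif(0, 50)
    })
    samples <- nimbleMCMC(code,
                          constants = list(N = 50),
                          data = list(y = rnorm(50, 3, 1)),
                          inits = list(mu = 0, sigma = 1),
                          niter = 5000, nburnin = 1000)
    summary(samples)   # matrix of posterior draws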
95 Bayesian Inference openEBGM EBGM Disproportionality Scores for Adverse Event Data Mining An implementation of DuMouchel’s (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and quantile scores from the posterior distribution using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the ‘PhViD’ package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Now includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the ‘mederrRank’ package.
96 Bayesian Inference pacbpred PAC-Bayesian Estimation and Prediction in Sparse Additive Models This package is intended to perform estimation and prediction in high-dimensional additive models, using a sparse PAC-Bayesian point of view and a MCMC algorithm. The method is fully described in Guedj and Alquier (2013), ‘PAC-Bayesian Estimation and Prediction in Sparse Additive Models’, Electronic Journal of Statistics, 7, 264-291.
97 Bayesian Inference PAWL Implementation of the PAWL algorithm Implementation of the Parallel Adaptive Wang-Landau algorithm. Also implemented for comparison: parallel adaptive Metropolis-Hastings and an SMC sampler.
98 Bayesian Inference predmixcor Classification rule based on Bayesian mixture models with feature selection bias corrected The function “train_predict_mix” predicts a binary response from binary features.
99 Bayesian Inference PReMiuM Dirichlet Process Bayesian Clustering, Profile Regression Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can be also included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.
100 Bayesian Inference prevalence Tools for Prevalence Assessment Studies The prevalence package provides Frequentist and Bayesian methods for prevalence assessment studies. IMPORTANT: the truePrev functions in the prevalence package call on JAGS (Just Another Gibbs Sampler), which therefore has to be available on the user’s system. JAGS can be downloaded from http://mcmc-jags.sourceforge.net/.
101 Bayesian Inference profdpm Profile Dirichlet Process Mixtures This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate for the data partition in a PPM.
102 Bayesian Inference pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
103 Bayesian Inference R2BayesX Estimate Structured Additive Regression Models with ‘BayesX’ An R interface to estimate structured additive regression (STAR) models with ‘BayesX’.
104 Bayesian Inference R2jags Using R to Run ‘JAGS’ Provides wrapper functions to implement Bayesian analysis in JAGS. Major features include monitoring the convergence of an MCMC model using the Gelman and Rubin Rhat statistic, automatically running an MCMC model until it converges, and implementing parallel processing of an MCMC model for multiple chains.
105 Bayesian Inference R2WinBUGS Running ‘WinBUGS’ and ‘OpenBUGS’ from ‘R’ / ‘S-PLUS’ Invoke a ‘BUGS’ model in ‘OpenBUGS’ or ‘WinBUGS’, a class “bugs” for ‘BUGS’ results and functions to work with that class. Function write.model() allows a ‘BUGS’ model file to be written. The class and auxiliary functions could be used with other MCMC programs, including ‘JAGS’.
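A sketch of the write.model() idiom named above (the file name and the normal-mean model are illustrative; actually running it also requires a WinBUGS/OpenBUGS installation):

    library(R2WinBUGS)
    model <- function() {
      for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
      mu ~ dnorm(0, 1.0E-4)
      tau ~ dgamma(0.01, 0.01)
    }
    write.model(model, "model.bug")   # deparses the R function into a BUGS model file
    # bugs(data, inits, parameters.to.save = c("mu", "tau"),
    #      model.file = "model.bug", ...) would then invoke WinBUGS/OpenBUGS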
106 Bayesian Inference ramps Bayesian Geostatistical Modeling with RAMPS Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm designed to lower autocorrelation in MCMC samples. Package performance is tuned for large spatial datasets.
107 Bayesian Inference revdbayes Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. Also provided are d,p,q,r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.
108 Bayesian Inference RJaCGH Reversible Jump MCMC for the Analysis of CGH Arrays Bayesian analysis of CGH microarrays fitting Hidden Markov Chain models. The selection of the number of states is made via their posterior probability computed by Reversible Jump Markov Chain Monte Carlo Methods. Also returns probabilistic common regions for gains/losses.
109 Bayesian Inference rjags Bayesian Graphical Models using MCMC Interface to the JAGS MCMC library.
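A minimal rjags sketch (requires the system JAGS library; the normal-mean model and simulated data are illustrative):

    library(rjags)
    m <- jags.model(textConnection("
      model {
        for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
        mu ~ dnorm(0, 1.0E-4)
        tau ~ dgamma(0.01, 0.01)
      }"), data = list(y = rnorm(50, 3, 1), N = 50), n.chains = 2)
    update(m, 1000)                                      # burn-in
    post <- coda.samples(m, c("mu", "tau"), n.iter = 5000)
    summary(post)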
110 Bayesian Inference RSGHB Functions for Hierarchical Bayesian Estimation: A Flexible Approach Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train’s Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train’s original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html> See Train’s chapter on HB in Discrete Choice with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>; and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.
112 Bayesian Inference rstan R Interface to Stan User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the ‘StanHeaders’ package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via ‘variational’ approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.
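For comparison with the JAGS workflow above, a minimal 'rstan' sketch; the Stan program and simulated data are illustrative:

    library(rstan)
    # Toy Stan program (illustrative): estimate a normal mean and scale
    code <- "
    data { int<lower=1> N; vector[N] y; }
    parameters { real mu; real<lower=0> sigma; }
    model { y ~ normal(mu, sigma); }"
    fit <- stan(model_code = code,
                data = list(N = 50, y = rnorm(50, 2, 1)),
                chains = 2, iter = 1000)
    print(fit)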
113 Bayesian Inference rstiefel Random Orthonormal Matrix Generation and Optimization on the Stiefel Manifold Simulation of random orthonormal matrices from linear and quadratic exponential family distributions on the Stiefel manifold. The most general type of distribution covered is the matrix-variate Bingham-von Mises-Fisher distribution. Most of the simulation methods are presented in Hoff (2009) “Simulation of the Matrix Bingham-von Mises-Fisher Distribution, With Applications to Multivariate and Relational Data” <doi:10.1198/jcgs.2009.07177>. The package also includes functions for optimization on the Stiefel manifold based on algorithms described in Wen and Yin (2013) “A feasible method for optimization with orthogonality constraints” <doi:10.1007/s10107-012-0584-1>.
114 Bayesian Inference runjags Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS User-friendly interface utilities for MCMC models via Just Another Gibbs Sampler (JAGS), facilitating the use of parallel (or distributed) processors for multiple chains, automated control of convergence and sample length diagnostics, and evaluation of the performance of a model using drop-k validation or against simulated data. Template model specifications can be generated using a standard lme4-style formula interface to assist users less familiar with the BUGS syntax. A JAGS extension module provides additional distributions including the Pareto family of distributions, the DuMouchel prior and the half-Cauchy prior.
115 Bayesian Inference Runuran R Interface to the ‘UNU.RAN’ Random Variate Generators Interface to the ‘UNU.RAN’ library for Universal Non-Uniform RANdom variate generators. It can thus be used to build non-uniform random number generators for quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles for a number of distributions.
116 Bayesian Inference RxCEcolInf ‘R x C Ecological Inference With Optional Incorporation of Survey Information’ Fits the R x C inference model described in Greiner and Quinn (2009). Allows incorporation of survey results.
117 Bayesian Inference SamplerCompare A Framework for Comparing the Performance of MCMC Samplers A framework for running sets of MCMC samplers on sets of distributions with a variety of tuning parameters, along with plotting functions to visualize the results of those simulations.
118 Bayesian Inference SampleSizeMeans Sample size calculations for normal means A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate a normal mean or the difference between two normal means. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of normal means are provided. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.
119 Bayesian Inference SampleSizeProportions Calculating sample size requirements when estimating the difference between two binomial proportions A set of R functions for calculating sample size requirements using three different Bayesian criteria in the context of designing an experiment to estimate the difference between two binomial proportions. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of binomial observations are provided. In all cases, estimation of the difference between two binomial proportions is considered. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided.
120 Bayesian Inference sbgcop Semiparametric Bayesian Gaussian Copula Estimation and Imputation Estimation and inference for parameters in a Gaussian copula model, treating the univariate marginal distributions as nuisance parameters as described in Hoff (2007) <doi:10.1214/07-AOAS107>. This package also provides a semiparametric imputation procedure for missing multivariate data.
121 Bayesian Inference SimpleTable Bayesian Inference and Sensitivity Analysis for Causal Effects from 2 x 2 and 2 x 2 x K Tables in the Presence of Unmeasured Confounding SimpleTable provides a series of methods to conduct Bayesian inference and sensitivity analysis for causal effects from 2 x 2 and 2 x 2 x K tables when unmeasured confounding is present or suspected.
122 Bayesian Inference sna Tools for Social Network Analysis A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.
123 Bayesian Inference spBayes Univariate and Multivariate Spatial-Temporal Modeling Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) <doi:10.18637/jss.v063.i13> and Finley, Banerjee, and Cook (2014) <doi:10.1111/2041-210X.12189>.
124 Bayesian Inference spikeslab Prediction and variable selection using spike and slab regression Spike and slab for prediction and variable selection in linear regression models. Uses a generalized elastic net for variable selection.
125 Bayesian Inference spikeSlabGAM Bayesian Variable Selection and Model Choice for Generalized Additive Mixed Models Bayesian variable selection, model choice, and regularized estimation for (spatial) generalized additive mixed regression models via stochastic search variable selection with spike-and-slab priors.
126 Bayesian Inference spTimer Spatio-Temporal Bayesian Modelling Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) <doi:10.18637/jss.v063.i15>.
127 Bayesian Inference ssgraph Bayesian Graphical Estimation using Spike-and-Slab Priors Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data. To speed up the computations, the computationally intensive tasks of the package are implemented in C++ in parallel using OpenMP.
128 Bayesian Inference stochvol Efficient Bayesian Inference for Stochastic Volatility (SV) Models Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods. Methodological details are given in Kastner and Fruhwirth-Schnatter (2014) <doi:10.1016/j.csda.2013.01.002>; the most common use cases are described in Kastner (2016) <doi:10.18637/jss.v069.i05>. Also incorporates SV with leverage.
129 Bayesian Inference tgp Bayesian Treed Gaussian Process Models Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.
130 Bayesian Inference zic Bayesian Inference for Zero-Inflated Count Models Provides MCMC algorithms for the analysis of zero-inflated count models. The case of stochastic search variable selection (SVS) is also considered. All MCMC samplers are coded in C++ for improved efficiency. A data set considering the demand for health care is provided.
131 Chemometrics and Computational Physics ALS (core) Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) Alternating least squares is often used to resolve components contributing to data with a bilinear structure; the basic technique may be extended to alternating constrained least squares. Commonly applied constraints include unimodality, non-negativity, and normalization of components. Several data matrices may be decomposed simultaneously by assuming that one of the two matrices in the bilinear decomposition is shared between datasets.
132 Chemometrics and Computational Physics AnalyzeFMRI Functions for Analysis of fMRI Datasets Stored in the ANALYZE or NIFTI Format Functions for I/O, visualisation and analysis of functional Magnetic Resonance Imaging (fMRI) datasets stored in the ANALYZE or NIFTI format. Note that the latest version of XQuartz seems to be necessary under MacOS.
133 Chemometrics and Computational Physics AquaEnv Integrated Development Toolbox for Aquatic Chemical Model Generation Toolbox for the experimental aquatic chemist, focused on acidification and CO2 air-water exchange. It contains all elements to model the pH, the related CO2 air-water exchange, and aquatic acid-base chemistry for an arbitrary marine, estuarine or freshwater system. It contains a suite of tools for sensitivity analysis, visualisation, modelling of chemical batches, and can be used to build dynamic models of aquatic systems. As of version 1.0-4, it also contains functions to calculate the buffer factors.
134 Chemometrics and Computational Physics astro Astronomy Functions, Tools and Routines The astro package provides a series of functions, tools and routines in everyday use within astronomy. Broadly speaking, one may group these functions into 7 main areas, namely: cosmology, FITS file manipulation, the Sersic function, plotting, data manipulation, statistics and general convenience functions and scripting tools.
135 Chemometrics and Computational Physics astrochron A Computational Tool for Astrochronology Routines for astrochronologic testing, astronomical time scale construction, and time series analysis. Also included are a range of statistical analysis and modeling routines that are relevant to time scale development and paleoclimate analysis.
136 Chemometrics and Computational Physics astrodatR Astronomical Data A collection of 19 datasets from contemporary astronomical research. They are described in the textbook ‘Modern Statistical Methods for Astronomy with R Applications’ by Eric D. Feigelson and G. Jogesh Babu (Cambridge University Press, 2012, Appendix C) or on the website of Penn State’s Center for Astrostatistics (http://astrostatistics.psu.edu/datasets). These datasets can be used to exercise methodology involving: density estimation; heteroscedastic measurement errors; contingency tables; two-sample hypothesis tests; spatial point processes; nonlinear regression; mixture models; censoring and truncation; multivariate analysis; classification and clustering; inhomogeneous Poisson processes; periodic and stochastic time series analysis.
137 Chemometrics and Computational Physics astroFns Astronomy: time and position functions, misc. utilities Miscellaneous astronomy functions, utilities, and data.
138 Chemometrics and Computational Physics astrolibR Astronomy Users Library Several dozen low-level utilities and codes from the Interactive Data Language (IDL) Astronomy Users Library (http://idlastro.gsfc.nasa.gov) are implemented in R. They treat: time, coordinate and proper motion transformations; terrestrial precession and nutation, atmospheric refraction and aberration, barycentric corrections, and related effects; utilities for astrometry, photometry, and spectroscopy; and utilities for planetary, stellar, Galactic, and extragalactic science.
139 Chemometrics and Computational Physics ATmet Advanced Tools for Metrology This package provides functions for smart sampling and sensitivity analysis for metrology applications, including computationally expensive problems.
140 Chemometrics and Computational Physics Bchron Radiocarbon Dating, Age-Depth Modelling, Relative Sea Level Rate Estimation, and Non-Parametric Phase Modelling Enables quick calibration of radiocarbon dates under various calibration curves (including user generated ones); age-depth modelling as per the algorithm of Haslett and Parnell (2008) <doi:10.1111/j.1467-9876.2008.00623.x>; Relative sea level rate estimation incorporating time uncertainty in polynomial regression models (Parnell and Gehrels 2015) <doi:10.1002/9781118452547.ch32>; non-parametric phase modelling via Gaussian mixtures as a means to determine the activity of a site (and as an alternative to the Oxcal function SUM; currently unpublished), and reverse calibration of dates from calibrated into un-calibrated years (also unpublished).
141 Chemometrics and Computational Physics BioMark Find Biomarkers in Two-Class Discrimination Problems Variable selection methods are provided for several classification methods: the lasso/elastic net, PCLDA, PLSDA, and several t-tests. Two approaches for selecting cutoffs can be used, one based on the stability of model coefficients under perturbation, and the other on higher criticism.
142 Chemometrics and Computational Physics bvls The Stark-Parker algorithm for bounded-variable least squares An R interface to the Stark-Parker implementation of an algorithm for bounded-variable least squares.
143 Chemometrics and Computational Physics celestial Collection of Common Astronomical Conversion Routines and Functions Contains a number of common astronomy conversion routines, particularly the HMS and degrees schemes, which can be fiddly to convert between en masse due to the textual nature of the former. It allows users to coordinate-match datasets quickly. It also contains functions for various cosmological calculations.
144 Chemometrics and Computational Physics chemCal (core) Calibration Functions for Analytical Chemistry Simple functions for plotting linear calibration functions and estimating standard errors for measurements according to the Handbook of Chemometrics and Qualimetrics: Part A by Massart et al. There are also functions estimating the limit of detection (LOD) and limit of quantification (LOQ). The functions work on model objects from - optionally weighted - linear regression (lm) or robust linear regression (‘rlm’ from the ‘MASS’ package).
145 Chemometrics and Computational Physics chemometrics Multivariate Statistical Analysis in Chemometrics R companion to the book “Introduction to Multivariate Statistical Analysis in Chemometrics” written by K. Varmuza and P. Filzmoser (2009).
146 Chemometrics and Computational Physics ChemometricsWithR Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences Functions and scripts used in the book “Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences” by Ron Wehrens, Springer (2011). Data used in the package are available from github.
147 Chemometrics and Computational Physics ChemoSpec Exploratory Chemometrics for Spectroscopy A collection of functions for top-down exploratory data analysis of spectral data including nuclear magnetic resonance (NMR), infrared (IR), Raman, X-ray fluorescence (XRF) and other similar types of spectroscopy. Includes functions for plotting and inspecting spectra, peak alignment, hierarchical cluster analysis (HCA), principal components analysis (PCA) and model-based clustering. Robust methods appropriate for this type of high-dimensional data are available. ChemoSpec is designed for structured experiments, such as metabolomics investigations, where the samples fall into treatment and control groups. Graphical output is formatted consistently for publication quality plots. ChemoSpec is intended to be very user friendly and to help you get usable results quickly. A vignette covering typical operations is available.
148 Chemometrics and Computational Physics ChemoSpec2D Exploratory Chemometrics for 2D Spectroscopy A collection of functions for exploratory chemometrics of 2D spectroscopic data sets such as COSY (correlated spectroscopy) and HSQC (heteronuclear single quantum coherence) 2D NMR (nuclear magnetic resonance) spectra. ‘ChemoSpec2D’ deploys methods aimed primarily at classification of samples and the identification of spectral features which are important in distinguishing samples from each other. Each 2D spectrum (a matrix) is treated as the unit of observation, and thus the physical sample in the spectrometer corresponds to the sample from a statistical perspective. In addition to chemometric tools, a few tools are provided for plotting 2D spectra, but these are not intended to replace the functionality typically available on the spectrometer. ‘ChemoSpec2D’ takes many of its cues from ‘ChemoSpec’ and tries to create consistent graphical output and to be very user friendly.
149 Chemometrics and Computational Physics CHNOSZ Thermodynamic Calculations and Diagrams for Geochemistry An integrated set of tools for thermodynamic calculations in aqueous geochemistry and geobiochemistry. Functions are provided for writing balanced reactions to form species from user-selected basis species and for calculating the standard molal properties of species and reactions, including the standard Gibbs energy and equilibrium constant. Calculations of the non-equilibrium chemical affinity and equilibrium chemical activity of species can be portrayed on diagrams as a function of temperature, pressure, or activity of basis species; in two dimensions, this gives a maximum affinity or predominance diagram. The diagrams have formatted chemical formulas and axis labels, and water stability limits can be added to Eh-pH, oxygen fugacity- temperature, and other diagrams with a redox variable. The package has been developed to handle common calculations in aqueous geochemistry, such as solubility due to complexation of metal ions, mineral buffers of redox or pH, and changing the basis species across a diagram (“mosaic diagrams”). CHNOSZ also has unique capabilities for comparing the compositional and thermodynamic properties of different proteins.
150 Chemometrics and Computational Physics clustvarsel Variable Selection for Gaussian Model-Based Clustering Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology allows one to find the (locally) optimal subset of variables in a data set that have group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without sub-sampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.
151 Chemometrics and Computational Physics compositions Compositional Data Analysis Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.
152 Chemometrics and Computational Physics constants Reference on Constants, Units and Uncertainty CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with errors and/or the values with units are also provided if the ‘errors’ and/or the ‘units’ packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This package contains the “2014 CODATA” version, published on 25 June 2015: Mohr, P. J., Newell, D. B. and Taylor, B. N. (2016) <doi:10.1103/RevModPhys.88.035009>, <doi:10.1063/1.4954402>.
153 Chemometrics and Computational Physics cosmoFns Functions for cosmological distances, times, luminosities, etc The package encapsulates standard expressions for distances, times, luminosities, and other quantities useful in observational cosmology, including molecular line observations. Currently coded for a flat universe only.
154 Chemometrics and Computational Physics CRAC Cosmology R Analysis Code R functions for cosmological research. The main functions are similar to those in the Python library ‘cosmolopy’.
155 Chemometrics and Computational Physics dielectric Defines some physical constants and dielectric functions commonly used in optics and plasmonics Physical constants. Gold, silver and glass permittivities, together with spline interpolation functions.
156 Chemometrics and Computational Physics drc Analysis of Dose-Response Curves Analysis of dose-response data is made available through a suite of flexible and versatile model fitting and after-fitting functions.
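A minimal sketch of a dose-response fit with 'drc', using the ryegrass data shipped with the package:

    library(drc)
    # Four-parameter log-logistic model: root length vs. concentration
    m <- drm(rootl ~ conc, data = ryegrass, fct = LL.4())
    summary(m)
    ED(m, c(10, 50))   # effective doses ED10 and ED50
    plot(m)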
157 Chemometrics and Computational Physics eChem Simulations for Electrochemistry Experiments Simulates cyclic voltammetry, linear-sweep voltammetry (both with and without stirring of the solution), and single-pulse and double-pulse chronoamperometry and chronocoulometry experiments using the implicit finite difference method outlined in Gosser (1993, ISBN: 9781560810261) and in Brown (2015) <doi:10.1021/acs.jchemed.5b00225>. Additional functions provide ways to display and to examine the results of these simulations. The primary purpose of this package is to provide tools for use in courses in analytical chemistry.
158 Chemometrics and Computational Physics EEM Read and Preprocess Fluorescence Excitation-Emission Matrix (EEM) Data Reads raw EEM data and prepares them for further analysis.
159 Chemometrics and Computational Physics elasticnet Elastic-Net for Sparse Estimation and Sparse PCA Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.
160 Chemometrics and Computational Physics enpls Ensemble Partial Least Squares Regression An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.
161 Chemometrics and Computational Physics errors Uncertainty Propagation for R Vectors Support for measurement errors in R vectors, matrices and arrays: automatic uncertainty propagation and reporting.
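A minimal sketch of automatic uncertainty propagation with 'errors'; the values and uncertainties are made up for illustration:

    library(errors)
    x <- set_errors(c(1.02, 2.13, 3.08), 0.05)  # measurements +/- absolute uncertainty
    y <- set_errors(2.0, 0.1)
    x * y        # uncertainties propagate through arithmetic
    errors(x)    # extract the uncertainty component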
162 Chemometrics and Computational Physics fastICA FastICA Algorithms to Perform ICA and Projection Pursuit Implementation of FastICA algorithm to perform Independent Component Analysis (ICA) and Projection Pursuit.
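A minimal sketch of unmixing two artificial sources with 'fastICA'; the signals and mixing matrix are illustrative, adapted from the kind of toy example used in the package documentation:

    library(fastICA)
    S <- cbind(sin((1:1000) / 20),                 # smooth sinusoid
               rep(((1:200) - 100) / 100, 5))      # sawtooth
    A <- matrix(c(0.29, 0.66, -0.54, 0.56), 2, 2)  # mixing matrix
    X <- S %*% A                                   # observed mixtures
    ica <- fastICA(X, n.comp = 2)
    str(ica$S)                                     # estimated independent components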
163 Chemometrics and Computational Physics fingerprint Functions to Operate on Binary Fingerprint Data Functions to manipulate binary fingerprints of arbitrary length. A fingerprint is represented by an object of S4 class ‘fingerprint’ which is internally represented as a vector of integers, such that each element represents the position in the fingerprint that is set to 1. The bitwise logical functions in R are overridden so that they can be used directly with ‘fingerprint’ objects. A number of distance metrics are also available (many contributed by Michael Fadock). Fingerprints can be converted to Euclidean vectors (i.e., points on the unit hypersphere) and can also be folded using OR. Arbitrary fingerprint formats can be handled via line handlers. Currently handlers are provided for CDK, MOE and BCI fingerprint data.
164 Chemometrics and Computational Physics FITSio FITS (Flexible Image Transport System) Utilities Utilities to read and write files in the FITS (Flexible Image Transport System) format, a standard format in astronomy (see e.g. <https://en.wikipedia.org/wiki/FITS> for more information). Present low-level routines allow: reading, parsing, and modifying FITS headers; reading FITS images (multi-dimensional arrays); reading FITS binary and ASCII tables; and writing FITS images (multi-dimensional arrays). Higher-level functions allow: reading files composed of one or more headers and a single (perhaps multidimensional) image or single table; reading tables into data frames; generating vectors for image array axes; scaling and writing images as 16-bit integers. Known incompletenesses are reading random group extensions, as well as bit, complex, and array descriptor data types in binary tables.
165 Chemometrics and Computational Physics fmri Analysis of fMRI Experiments Contains R-functions to perform an fMRI analysis as described in Tabelow et al. (2006) <doi:10.1016/j.neuroimage.2006.06.029>, Polzehl et al. (2010) <doi:10.1016/j.neuroimage.2010.04.241>, Tabelow and Polzehl (2011) <doi:10.18637/jss.v044.i11>.
166 Chemometrics and Computational Physics fpca Restricted MLE for Functional Principal Components Analysis A geometric approach to MLE for functional principal components.
167 Chemometrics and Computational Physics FTICRMS Programs for Analyzing Fourier Transform-Ion Cyclotron Resonance Mass Spectrometry Data This package was developed partially with funding from the NIH Training Program in Biomolecular Technology (2-T32-GM08799).
168 Chemometrics and Computational Physics homals Gifi Methods for Optimal Scaling Performs a homogeneity analysis (multiple correspondence analysis) and various extensions. Rank restrictions on the category quantifications can be imposed (nonlinear PCA). The categories are transformed by means of optimal scaling with options for nominal, ordinal, and numerical scale levels (for rank-1 restrictions). Variables can be grouped into sets, in order to emulate regression analysis and canonical correlation analysis.
169 Chemometrics and Computational Physics hyperSpec Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, …) Comfortable ways to work with hyperspectral data sets, i.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each of the spectra. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data that is recorded over a discretized variable, e.g. absorbance = f(wavelength), stored as a vector of absorbance values for discrete wavelengths, is suitable.
170 Chemometrics and Computational Physics investr Inverse Estimation/Calibration Functions Functions to facilitate inverse estimation (e.g., calibration) in linear, generalized linear, nonlinear, and (linear) mixed-effects models. A generic function is also provided for plotting fitted regression models with or without confidence/prediction bands that may be of use to the general user.
171 Chemometrics and Computational Physics Iso (core) Functions to Perform Isotonic Regression Linear order and unimodal order (univariate) isotonic regression; bivariate isotonic regression with linear order on both variables.
172 Chemometrics and Computational Physics kohonen (core) Supervised and Unsupervised Self-Organising Maps Functions to train self-organising maps (SOMs). Also interrogation of the maps and prediction using trained maps are supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.
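A minimal sketch of training a self-organising map with 'kohonen', using the wines data shipped with the package:

    library(kohonen)
    data(wines)
    sm <- som(scale(wines), grid = somgrid(5, 4, "hexagonal"))
    plot(sm, type = "codes")   # codebook vectors for each map unit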
173 Chemometrics and Computational Physics leaps Regression Subset Selection Regression subset selection, including exhaustive search.
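A minimal sketch of best-subset selection with 'leaps'; mtcars is used purely as a convenient example dataset:

    library(leaps)
    fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5)  # exhaustive search
    summary(fit)$which   # variables selected at each subset size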
174 Chemometrics and Computational Physics lira LInear Regression in Astronomy Performs Bayesian linear regression and forecasting in astronomy. The method accounts for heteroscedastic errors in both the independent and the dependent variables, intrinsic scatters (in both variables) and scatter correlation, time evolution of slopes, normalization, scatters, Malmquist and Eddington bias, upper limits and break of linearity. The posterior distribution of the regression parameters is sampled with a Gibbs method exploiting the JAGS library.
175 Chemometrics and Computational Physics lspls LS-PLS Models Implements the LS-PLS (least squares - partial least squares) method described in, for instance, Jorgensen, K., Segtnan, V. H., Thyholt, K., Nas, T. (2004) “A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables”, Journal of Chemometrics, 18(10), 451-464, <doi:10.1002/cem.890>.
176 Chemometrics and Computational Physics MALDIquant Quantitative Analysis of Mass Spectrometry Data A complete analysis pipeline for matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) and other two-dimensional mass spectrometry data. In addition to commonly used plotting and processing methods it includes distinctive features, namely baseline subtraction methods such as morphological filters (TopHat) or the statistics-sensitive non-linear iterative peak-clipping algorithm (SNIP), peak alignment using warping functions, handling of replicated measurements as well as allowing spectra with different resolutions.
177 Chemometrics and Computational Physics MALDIrppa MALDI Mass Spectrometry Data Robust Pre-Processing and Analysis Provides methods for quality control and robust pre-processing and analysis of MALDI mass spectrometry data.
178 Chemometrics and Computational Physics measurements Tools for Units of Measurement Collection of tools to make working with physical measurements easier. Convert between metric and imperial units, or calculate a dimension’s unknown value from other dimensions’ measurements.
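A minimal sketch of unit conversion with 'measurements'; the quantities are arbitrary:

    library(measurements)
    conv_unit(100, "m", "ft")     # length: metres to feet
    conv_unit(2.5, "acre", "m2")  # area: acres to square metres
    conv_unit(37, "C", "F")       # temperature: Celsius to Fahrenheit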
179 Chemometrics and Computational Physics metRology Support for Metrological Applications Provides classes and calculation and plotting functions for metrology applications, including measurement uncertainty estimation and inter-laboratory metrology comparison studies.
180 Chemometrics and Computational Physics minpack.lm R Interface to the Levenberg-Marquardt Nonlinear Least-Squares Algorithm Found in MINPACK, Plus Support for Bounds The nls.lm function provides an R interface to lmder and lmdif from the MINPACK library, for solving nonlinear least-squares problems by a modification of the Levenberg-Marquardt algorithm, with support for lower and upper parameter bounds. The implementation can be used via nls-like calls using the nlsLM function.
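A minimal sketch of a Levenberg-Marquardt fit with 'minpack.lm'; the decay data are simulated for illustration:

    library(minpack.lm)
    x <- seq(0, 5, length.out = 50)
    y <- 2.5 * exp(-1.3 * x) + rnorm(50, sd = 0.05)   # noisy exponential decay
    # nlsLM() is called like nls() but uses Levenberg-Marquardt steps
    fit <- nlsLM(y ~ a * exp(-b * x), start = list(a = 1, b = 1))
    coef(fit)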
181 Chemometrics and Computational Physics NISTunits Fundamental Physical Constants and Unit Conversions from NIST Fundamental physical constants (Quantity, Value, Uncertainty, Unit) for SI (International System of Units) and non-SI units, plus unit conversions. Based on the data from NIST (National Institute of Standards and Technology, USA).
182 Chemometrics and Computational Physics nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
183 Chemometrics and Computational Physics nlreg Higher Order Inference for Nonlinear Heteroscedastic Models Likelihood inference based on higher order approximations for nonlinear models with possibly non-constant variance.
184 Chemometrics and Computational Physics nnls (core) The Lawson-Hanson algorithm for non-negative least squares (NNLS) An R interface to the Lawson-Hanson implementation of an algorithm for non-negative least squares (NNLS). Also allows the combination of non-negative and non-positive constraints.
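A minimal sketch of a non-negative least-squares fit with 'nnls'; the design matrix and response are simulated:

    library(nnls)
    A <- matrix(runif(300), nrow = 100, ncol = 3)
    b <- as.vector(A %*% c(2, 0, 1)) + rnorm(100, sd = 0.1)
    fit <- nnls(A, b)
    fit$x   # coefficients constrained to be non-negative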
185 Chemometrics and Computational Physics OrgMassSpecR Organic Mass Spectrometry Organic/biological mass spectrometry data analysis.
186 Chemometrics and Computational Physics pcaPP Robust PCA by Projection Pursuit Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) <doi:10.2139/ssrn.968376>, Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>.
187 Chemometrics and Computational Physics PET Simulation and Reconstruction of PET Images Implementation of different analytic/direct and iterative reconstruction methods of Radon-transformed data such as PET data. It also offers the possibility to simulate PET data.
188 Chemometrics and Computational Physics planar Multilayer Optics Solves the electromagnetic problem of reflection and transmission at a planar multilayer interface. Also computed are the decay rates and emission profile for a dipolar emitter.
189 Chemometrics and Computational Physics pls (core) Partial Least Squares and Principal Component Regression Multivariate regression methods: Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
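A minimal sketch of a cross-validated PLSR fit with 'pls', using the yarn NIR data shipped with the package:

    library(pls)
    data(yarn)
    fit <- plsr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")
    summary(fit)   # cross-validated RMSEP by number of components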
190 Chemometrics and Computational Physics plspm Tools for Partial Least Squares Path Modeling (PLS-PM) Partial Least Squares Path Modeling (PLS-PM) analysis for both metric and non-metric data, as well as REBUS analysis.
191 Chemometrics and Computational Physics ppls Penalized Partial Least Squares Contains linear and nonlinear regression methods based on Partial Least Squares and penalization techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be obtained via jackknifing.
192 Chemometrics and Computational Physics prospectr Miscellaneous functions for processing and sample selection of vis-NIR diffuse reflectance data Provides functions for pretreatment and sample selection of visible and near-infrared diffuse reflectance spectra.
193 Chemometrics and Computational Physics psy Various procedures used in psychometry Kappa, ICC, Cronbach alpha, screeplot, mtmm
194 Chemometrics and Computational Physics PTAk (core) Principal Tensor Analysis on k Modes A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package also includes some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.
195 Chemometrics and Computational Physics rcdk Interface to the ‘CDK’ Libraries Allows the user to access functionality in the ‘CDK’, a Java framework for chemoinformatics. This allows the user to load molecules, evaluate fingerprints, calculate molecular descriptors and so on. In addition, the ‘CDK’ API allows the user to view structures in 2D.
196 Chemometrics and Computational Physics rcdklibs The CDK Libraries Packaged for R An R interface to the Chemistry Development Kit, a Java library for chemoinformatics. Given the size of the library itself, this package is not expected to change very frequently. To make use of the CDK within R, it is suggested that you use the ‘rcdk’ package. Note that it is possible to directly interact with the CDK using ‘rJava’. However ‘rcdk’ exposes functionality in a more idiomatic way. The CDK library itself is released as LGPL and the sources can be obtained from <https://github.com/cdk/cdk>.
197 Chemometrics and Computational Physics represent Determine the representativity of two multidimensional data sets Contains workhorse function jrparams(), as well as two helper functions Mboxtest() and JRsMahaldist(), and four example data sets.
198 Chemometrics and Computational Physics resemble Regression and Similarity Evaluation for Memory-Based Learning in Spectral Chemometrics Implementation of functions for spectral similarity/dissimilarity analysis and memory-based learning (MBL) for non-linear modeling in complex spectral datasets. In chemometrics MBL is also known as local modeling.
199 Chemometrics and Computational Physics RobPer Robust Periodogram and Periodicity Detection Methods Calculates periodograms based on (robustly) fitting periodic functions to light curves (irregularly observed time series, possibly with measurement accuracies, occurring in astroparticle physics). Three main functions are included: RobPer() calculates the periodogram. Outlying periodogram bars (indicating a period) can be detected with betaCvMfit(). Artificial light curves can be generated using the function tsgen(). For more details see the corresponding article: Thieler, Fried and Rathjens (2016), Journal of Statistical Software 69(9), 1-36, <doi:10.18637/jss.v069.i09>.
200 Chemometrics and Computational Physics rpubchem An Interface to the PubChem Collection Access PubChem data (compounds, substances, assays) using R. Structural information is provided in the form of SMILES strings. It currently only provides access to a subset of the precalculated data stored by PubChem. Bio-assay data can be accessed to obtain descriptions as well as the actual data. It is also possible to search for assay IDs by keyword.
201 Chemometrics and Computational Physics sapa Spectral Analysis for Physical Applications Software for the book Spectral Analysis for Physical Applications, Donald B. Percival and Andrew T. Walden, Cambridge University Press, 1993.
202 Chemometrics and Computational Physics SCEPtER Stellar CharactEristics Pisa Estimation gRid SCEPtER pipeline for estimating the stellar age, mass, and radius given observational effective temperature, [Fe/H], and asteroseismic parameters. The results are obtained adopting a maximum likelihood technique over a grid of pre-computed stellar models.
203 Chemometrics and Computational Physics SCEPtERbinary Stellar CharactEristics Pisa Estimation gRid for Binary Systems SCEPtER pipeline for estimating the stellar age for double-lined detached binary systems. The observational constraints adopted in the recovery are the effective temperature, the metallicity [Fe/H], the mass, and the radius of the two stars. The results are obtained adopting a maximum likelihood technique over a grid of pre-computed stellar models.
204 Chemometrics and Computational Physics simecol Simulation of Ecological (and Other) Dynamic Systems An object oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and re-usability of code.
205 Chemometrics and Computational Physics snapshot Gadget N-body cosmological simulation code snapshot I/O utilities Functions for reading and writing Gadget N-body snapshots. The Gadget code is popular in astronomy for running N-body / hydrodynamical cosmological and merger simulations. To find out more about Gadget see the main distribution page at www.mpa-garching.mpg.de/gadget/
206 Chemometrics and Computational Physics solaR Radiation and Photovoltaic Systems Calculation methods of solar radiation and performance of photovoltaic systems from daily and intradaily irradiation data sources.
207 Chemometrics and Computational Physics som Self-Organizing Map Self-Organizing Map (with application in gene clustering).
208 Chemometrics and Computational Physics SPADAR Spherical Projections of Astronomical Data Provides easy to use functions to create all-sky grid plots of widely used astronomical coordinate systems (equatorial, ecliptic, galactic) and scatter plots of data on any of these systems including on-the-fly system conversion. It supports any type of spherical projection to the plane defined by the ‘mapproj’ package.
209 Chemometrics and Computational Physics speaq Tools for Nuclear Magnetic Resonance (NMR) Spectra Alignment, Peak Based Processing, Quantitative Analysis and Visualizations Makes Nuclear Magnetic Resonance spectroscopy (NMR spectroscopy) data analysis as easy as possible by only requiring a small set of functions to perform an entire analysis. ‘speaq’ offers the possibility of raw spectra alignment and quantitation but also an analysis based on features whereby the spectra are converted to peaks which are then grouped and turned into features. These features can be processed with any number of statistical tools either included in ‘speaq’ or available elsewhere on CRAN. More details can be found in Vu et al. (2011) <doi:10.1186/1471-2105-12-405> and Beirnaert et al. (2018) <doi:10.1371/journal.pcbi.1006018>.
210 Chemometrics and Computational Physics spectralAnalysis Pre-Process, Visualize and Analyse Process Analytical Data, by Spectral Data Measurements Made During a Chemical Process Infrared, near-infrared and Raman spectroscopic data measured during chemical reactions, provide structural fingerprints by which molecules can be identified and quantified. The application of these spectroscopic techniques as inline process analytical tools (PAT) provides the (pharma-)chemical industry with novel tools that allow monitoring of chemical processes, resulting in better process understanding through insight into reaction rates, mechanisms, stability, etc. Data can be read into R via the generic spc-format, which is generally supported by spectrometer vendor software. Versatile pre-processing functions are available to perform baseline correction by linking to the ‘baseline’ package; noise reduction via the ‘signal’ package; as well as time alignment, normalization, differentiation, integration and interpolation. Implementation based on the S4 object system allows storing a pre-processing pipeline as part of a spectral data object, and easily transferring it to other datasets. Interactive plotting tools are provided based on the ‘plotly’ package. Non-negative matrix factorization (NMF) has been implemented to perform multivariate analyses on individual spectral datasets or on multiple datasets at once. NMF provides a parts-based representation of the spectral data in terms of spectral signatures of the chemical compounds and their relative proportions. The functionality to read in spc-files was adapted from the ‘hyperSpec’ package.
211 Chemometrics and Computational Physics spls Sparse Partial Least Squares (SPLS) Regression and Classification Provides functions for fitting a sparse partial least squares (SPLS) regression and classification (Chun and Keles (2010) <doi:10.1111/j.1467-9868.2009.00723.x>).
212 Chemometrics and Computational Physics stellaR stellar evolution tracks and isochrones A package to manage and display stellar tracks and isochrones from the Pisa low-mass database. Includes tools for isochrone construction and track interpolation.
213 Chemometrics and Computational Physics stepPlr L2 Penalized Logistic Regression with Stepwise Variable Selection L2 penalized logistic regression for both continuous and discrete predictors, with forward stagewise/forward stepwise variable selection procedure.
214 Chemometrics and Computational Physics subselect Selecting Variable Subsets A collection of functions which (i) assess the quality of variable subsets as surrogates for a full data set, in either an exploratory data analysis or in the context of a multivariate linear model, and (ii) search for subsets which are optimal under various criteria.
215 Chemometrics and Computational Physics TIMP Fitting Separable Nonlinear Models in Spectroscopy and Microscopy A problem-solving environment (PSE) for fitting separable nonlinear models to measurements arising in physics and chemistry experiments; has been extensively applied to time-resolved spectroscopy and FLIM-FRET data.
216 Chemometrics and Computational Physics titan Titration analysis for mass spectrometry data GUI to analyze mass spectrometric data on the relative abundance of two substances from a titration series.
217 Chemometrics and Computational Physics titrationCurves Acid/Base, Complexation, Redox, and Precipitation Titration Curves A collection of functions to plot acid/base titration curves (pH vs. volume of titrant), complexation titration curves (pMetal vs. volume of EDTA), redox titration curves (potential vs. volume of titrant), and precipitation titration curves (either pAnalyte or pTitrant vs. volume of titrant). Options include the titration of mixtures, the ability to overlay two or more titration curves, and the ability to show equivalence points.
218 Chemometrics and Computational Physics units Measurement Units for R Vectors Support for measurement units in R vectors, matrices and arrays: automatic propagation, conversion, derivation and simplification of units; raising errors in case of unit incompatibility. Compatible with the POSIXct, Date and difftime classes. Uses the UNIDATA udunits library and unit database for unit compatibility checking and conversion.
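A minimal sketch of unit propagation and conversion with 'units'; the quantities are arbitrary:

    library(units)
    speed <- set_units(c(10, 20), m/s)
    time  <- set_units(5, s)
    speed * time              # units are derived automatically (metres)
    set_units(speed, km/h)    # conversion; incompatible units raise an error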
219 Chemometrics and Computational Physics UPMASK Unsupervised Photometric Membership Assignment in Stellar Clusters An implementation of the UPMASK method for performing membership assignment in stellar clusters in R. It is prepared to use photometry and spatial positions, but it can take into account other types of data. The method is able to take into account arbitrary error models, and it is unsupervised, data-driven, physical-model-free and relies on as few assumptions as possible. The approach followed for membership assessment is based on an iterative process, dimensionality reduction, a clustering algorithm and a kernel density estimation.
220 Chemometrics and Computational Physics varSelRF Variable Selection using Random Forests Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly-correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).
221 Chemometrics and Computational Physics webchem Chemical Information from the Web Chemical information from around the web. This package interacts with a suite of web APIs for chemical information.
222 Chemometrics and Computational Physics WilcoxCV Wilcoxon-based variable selection in cross-validation This package provides functions to perform fast variable selection based on the Wilcoxon rank sum test in the cross-validation or Monte-Carlo cross-validation settings, for use in microarray-based binary classification.
223 Clinical Trial Design, Monitoring, and Analysis adaptTest (core) Adaptive two-stage tests The functions in this package implement adaptive two-stage tests. Currently, four tests are included: Bauer and Koehne (1994), Lehmacher and Wassmer (1999), Vandemeulebroecke (2006), and the horizontal conditional error function. User-defined tests can also be implemented. Reference: Vandemeulebroecke, An investigation of two-stage tests, Statistica Sinica 2006.
224 Clinical Trial Design, Monitoring, and Analysis AGSDest Estimation in Adaptive Group Sequential Trials Calculation of repeated confidence intervals as well as confidence intervals based on the stage-wise ordering in group sequential designs and adaptive group sequential designs. For adaptive group sequential designs the confidence intervals are based on the conditional rejection probability principle. Currently the procedures do not support the use of futility boundaries or more than one adaptive interim analysis.
225 Clinical Trial Design, Monitoring, and Analysis asd (core) Simulations for Adaptive Seamless Designs Runs simulations for adaptive seamless designs, with and without early outcomes, for treatment selection and subpopulation-type designs.
226 Clinical Trial Design, Monitoring, and Analysis asypow Calculate Power Utilizing Asymptotic Likelihood Ratio Methods A set of routines written in the S language that calculate power and related quantities utilizing asymptotic likelihood ratio methods.
227 Clinical Trial Design, Monitoring, and Analysis bcrm (core) Bayesian Continual Reassessment Method for Phase I Dose-Escalation Trials Implements a wide variety of one and two-parameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics.
228 Clinical Trial Design, Monitoring, and Analysis binomSamSize Confidence Intervals and Sample Size Determination for a Binomial Proportion under Simple Random Sampling and Pooled Sampling A suite of functions to compute confidence intervals and necessary sample sizes for the parameter p of the Bernoulli B(p) distribution under simple random sampling or under pooled sampling. Such computations are e.g. of interest when investigating the incidence or prevalence in populations. The package contains functions to compute coverage probabilities and coverage coefficients of the provided confidence intervals procedures. Sample size calculations are based on expected length.
229 Clinical Trial Design, Monitoring, and Analysis blockrand (core) Randomization for block random clinical trials Creates randomizations for block random clinical trials. Can also produce a PDF file of randomization cards.
230 Clinical Trial Design, Monitoring, and Analysis clinfun (core) Clinical Trial Design and Data Analysis Functions Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.
231 Clinical Trial Design, Monitoring, and Analysis clinsig Clinical Significance Functions Functions for calculating clinical significance.
232 Clinical Trial Design, Monitoring, and Analysis clusterPower Power Calculations for Cluster-Randomized and Cluster-Randomized Crossover Trials Calculate power for cluster randomized trials (CRTs) that compare two means, two proportions, or two counts using closed-form solutions. In addition, calculate power for cluster randomized crossover trials using Monte Carlo methods. For more information, see Reich et al. (2012) <doi:10.1371/journal.pone.0035564>.
233 Clinical Trial Design, Monitoring, and Analysis coin Conditional Inference Procedures in a Permutation Test Framework Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems.
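A minimal sketch of an exact two-sample test in the 'coin' framework; the data are simulated for illustration:

    library(coin)
    dat <- data.frame(y = c(rnorm(20), rnorm(20, mean = 1)),
                      g = factor(rep(c("A", "B"), each = 20)))
    wilcox_test(y ~ g, data = dat, distribution = "exact")  # exact Wilcoxon rank-sum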
234 Clinical Trial Design, Monitoring, and Analysis conf.design Construction of factorial designs This small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.
235 Clinical Trial Design, Monitoring, and Analysis CRM Continual Reassessment Method (CRM) for Phase I Clinical Trials Functions for phase I clinical trials using the continual reassessment method.
236 Clinical Trial Design, Monitoring, and Analysis CRTSize (core) Sample Size Estimation Functions for Cluster Randomized Trials Sample size estimation in cluster (group) randomized trials. Contains traditional power-based methods, empirical smoothing (Rotondi and Donner, 2009), and updated meta-analysis techniques (Rotondi and Donner, 2012).
237 Clinical Trial Design, Monitoring, and Analysis dfcrm (core) Dose-Finding by the Continual Reassessment Method Provides functions to run the CRM and TITE-CRM in phase I trials and calibration tools for trial planning purposes.
238 Clinical Trial Design, Monitoring, and Analysis dfped Extrapolation and Bridging of Adult Information in Early Phase Dose-Finding Paediatrics Studies A unified method for designing and analysing dose-finding trials in paediatrics, while bridging information from adults, is proposed in the ‘dfped’ package. The dose range can be calculated under three extrapolation methods: linear, allometry and maturation adjustment, using pharmacokinetic (PK) data. To do this, it is assumed that target exposures are the same in both populations. The working model and prior distribution parameters of the dose-toxicity and dose-efficacy relationships can be obtained using early phase adult toxicity and efficacy data at several dose levels through the ‘dfped’ package. Priors are incorporated into the dose-finding process through Bayesian model selection or adaptive priors, to facilitate adjusting the amount of prior information to differences between adults and children. This calibrates the model to adjust for misspecification if the adult and paediatric data are very different. Users can supply their own Bayesian model written in Stan code through the ‘dfped’ package. A template of this model is proposed in the examples of the corresponding R functions in the package. Finally, the package includes a simulation function for a single trial or for multiple trials. These methods are proposed by Petit et al. (2016) <doi:10.1177/0962280216671348>.
239 Clinical Trial Design, Monitoring, and Analysis dfpk Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials Statistical methods involving PK measures are provided for the dose-allocation process during Phase I clinical trials. These methods, proposed by Ursino et al. (2017) <doi:10.1002/bimj.201600084>, incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models and hierarchical models. This package provides functions to generate data from several scenarios and functions to run simulations whose objective is to determine the maximum tolerated dose (MTD).
240 Clinical Trial Design, Monitoring, and Analysis DoseFinding Planning and Analyzing Dose Finding Experiments The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting non-linear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs and an implementation of the MCPMod methodology.
241 Clinical Trial Design, Monitoring, and Analysis epibasix Elementary Epidemiological Functions for Epidemiology and Biostatistics Contains elementary tools for analysis of common epidemiological problems, ranging from sample size estimation, through 2x2 contingency table analysis and basic measures of agreement (kappa, sensitivity/specificity). Appropriate print and summary statements are also written to facilitate interpretation wherever possible. Source code is commented throughout to facilitate modification. The target audience includes advanced undergraduate and graduate students in epidemiology or biostatistics courses, and clinical researchers.
242 Clinical Trial Design, Monitoring, and Analysis ewoc Escalation with Overdose Control An implementation of a variety of escalation with overdose control designs introduced by Babb, Rogatko and Zacks (1998) <doi:10.1002/(SICI)1097-0258(19980530)17:10%3C1103::AID-SIM793%3E3.0.CO;2-9>. It calculates the next dose as a clinical trial proceeds as well as performs simulations to obtain operating characteristics.
243 Clinical Trial Design, Monitoring, and Analysis experiment (core) R Package for Designing and Analyzing Randomized Experiments Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.
244 Clinical Trial Design, Monitoring, and Analysis FrF2 Fractional Factorial Designs with 2-Level Factors Regular and non-regular Fractional Factorial 2-level designs can be created. Furthermore, analysis tools for Fractional Factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
245 Clinical Trial Design, Monitoring, and Analysis GroupSeq (core) A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach by Lan-DeMets with various alpha spending functions being available to choose among.
246 Clinical Trial Design, Monitoring, and Analysis gsbDesign Group Sequential Bayes Design Group sequential operating characteristics for clinical, Bayesian two-arm trials with known sigma and normal endpoints.
247 Clinical Trial Design, Monitoring, and Analysis gsDesign (core) Group Sequential Design Derives group sequential designs and describes their properties.
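For illustration, a three-look design with O'Brien-Fleming-type spending can be derived as below; the arguments are a sketch of the documented interface, not a recommended design:

    library(gsDesign)
    d <- gsDesign(k = 3,           # three analyses: two interim, one final
                  test.type = 2,   # two-sided symmetric boundaries
                  alpha = 0.025, beta = 0.1,
                  sfu = "OF")      # O'Brien-Fleming-type spending function
    d                              # prints boundaries and design properties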
248 Clinical Trial Design, Monitoring, and Analysis HH Statistical Analysis and Data Display: Heiberger and Holland Support software for Statistical Analysis and Data Display (Second Edition, Springer, ISBN 978-1-4939-2121-8, 2015) and (First Edition, Springer, ISBN 0-387-40270-5, 2004) by Richard M. Heiberger and Burt Holland. This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The second edition includes redesigned graphics and additional chapters. The authors emphasize how to construct and interpret graphs, discuss principles of graphical design, and show how accompanying traditional tabular results are used to confirm the visual impressions derived directly from the graphs. Many of the graphical formats are novel and appear here for the first time in print. All chapters have exercises. All functions introduced in the book are in the package. R code for all examples, both graphs and tables, in the book is included in the scripts directory of the package.
249 Clinical Trial Design, Monitoring, and Analysis Hmisc (core) Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
250 Clinical Trial Design, Monitoring, and Analysis InformativeCensoring Multiple Imputation for Informative Censoring Multiple imputation for informative censoring. The package implements two methods: Gamma Imputation from Jackson et al. (2014) <doi:10.1002/sim.6274> and Risk Score Imputation from Hsu et al. (2009) <doi:10.1002/sim.3480>.
251 Clinical Trial Design, Monitoring, and Analysis ldbounds (core) Lan-DeMets Method for Group Sequential Boundaries Computations related to group sequential boundaries. Includes calculation of bounds using the Lan-DeMets alpha spending function approach.
252 Clinical Trial Design, Monitoring, and Analysis MCPMod Design and Analysis of Dose-Finding Studies Implements a methodology for the design and analysis of dose-response studies that combines aspects of multiple comparison procedures and modeling approaches (Bretz, Pinheiro and Branson, 2005, Biometrics 61, 738-748, <doi:10.1111/j.1541-0420.2005.00344.x>). The package provides tools for the analysis of dose finding trials as well as a variety of tools necessary to plan a trial to be conducted with the MCP-Mod methodology. Please note: The ‘MCPMod’ package will not be further developed, all future development of the MCP-Mod methodology will be done in the ‘DoseFinding’ R-package.
253 Clinical Trial Design, Monitoring, and Analysis Mediana Clinical Trial Simulations Provides a general framework for clinical trial simulations based on the Clinical Scenario Evaluation (CSE) approach. The package supports a broad class of data models (including clinical trials with continuous, binary, survival-type and count-type endpoints as well as multivariate outcomes that are based on combinations of different endpoints), analysis strategies and commonly used evaluation criteria.
254 Clinical Trial Design, Monitoring, and Analysis meta General Package for Meta-Analysis User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rucker <doi:10.1007/978-3-319-21416-0>, “Meta-Analysis with R” (2015): - fixed effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L’Abbe, Baujat, bubble); - statistical tests and trim-and-fill method to evaluate bias in meta-analysis; - import data from ‘RevMan 5’; - prediction interval, Hartung-Knapp and Paule-Mandel method for random effects model; - cumulative meta-analysis and leave-one-out meta-analysis; - meta-regression; - generalised linear mixed models; - produce forest plot summarising several (subgroup) meta-analyses.
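A minimal generic inverse-variance meta-analysis with metagen(); the effect estimates and standard errors below are made up purely for illustration:

    library(meta)
    m <- metagen(TE = c(0.10, 0.35, 0.22),     # hypothetical treatment effects
                 seTE = c(0.08, 0.12, 0.10),   # their standard errors
                 studlab = paste("Study", 1:3),
                 sm = "MD")                    # summary measure: mean difference
    forest(m)                                  # forest plot of the pooled results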
255 Clinical Trial Design, Monitoring, and Analysis metafor Meta-Analysis Package for R A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L’Abbe, Baujat, GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto’s method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted.
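By comparison, the standard metafor workflow on its bundled BCG vaccine data: escalc() computes log relative risks and rma() fits a random-effects model:

    library(metafor)
    dat <- escalc(measure = "RR", ai = tpos, bi = tneg,
                  ci = cpos, di = cneg, data = dat.bcg)   # effect sizes + sampling variances
    res <- rma(yi, vi, data = dat)                        # random-effects model (REML)
    forest(res)                                           # forest plot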
256 Clinical Trial Design, Monitoring, and Analysis metaLik Likelihood Inference in Meta-Analysis and Meta-Regression Models First- and higher-order likelihood inference in meta-analysis and meta-regression models.
257 Clinical Trial Design, Monitoring, and Analysis metasens Advanced Statistical Methods to Model and Adjust for Bias in Meta-Analysis The following methods are implemented to evaluate how sensitive the results of a meta-analysis are to potential bias in meta-analysis and to support Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 5 “Small-Study Effects in Meta-Analysis”: - Copas selection model described in Copas & Shi (2001) <doi:10.1177/096228020101000402>; - limit meta-analysis by Rucker et al. (2011) <doi:10.1093/biostatistics/kxq046>; - upper bound for outcome reporting bias by Copas & Jackson (2004) <doi:10.1111/j.0006-341X.2004.00161.x>.
258 Clinical Trial Design, Monitoring, and Analysis multcomp Simultaneous Inference in General Parametric Models Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models. The package includes demos reproducing analyses presented in the book “Multiple Comparisons Using R” (Bretz, Hothorn, Westfall, 2010, CRC Press).
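A short sketch of simultaneous inference with glht(), following the package's usual Tukey all-pairs example on the built-in warpbreaks data:

    library(multcomp)
    amod <- aov(breaks ~ tension, data = warpbreaks)     # one-way layout
    cht <- glht(amod, linfct = mcp(tension = "Tukey"))   # all pairwise comparisons
    summary(cht)                                         # adjusted p-values
    confint(cht)                                         # simultaneous confidence intervals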
259 Clinical Trial Design, Monitoring, and Analysis nppbib Nonparametric Partially-Balanced Incomplete Block Design Analysis Implements a nonparametric statistical test for rank or score data from partially-balanced incomplete block-design experiments.
260 Clinical Trial Design, Monitoring, and Analysis PIPS (core) Predicted Interval Plots Generate Predicted Interval Plots. Simulate and plot confidence intervals of an effect estimate given observed data and a hypothesis about the distribution of future data.
261 Clinical Trial Design, Monitoring, and Analysis PowerTOST (core) Power and Sample Size Based on Two One-Sided t-Tests (TOST) for (Bio)Equivalence Studies Contains functions to calculate power and sample size for various study designs used for bioequivalence studies. See function known.designs() for the study designs covered. Moreover, the package contains functions for power and sample size based on ‘expected’ power in case of uncertain (estimated) variability and/or uncertain theta0. ― Added are functions for the power and sample size for the ratio of two means with normally distributed data on the original scale (based on Fieller’s confidence (‘fiducial’) interval). ― Contains further functions for power and sample size calculations based on the non-inferiority t-test. This is not a TOST procedure but may be useful if the question of ‘non-superiority’ must be evaluated. The power and sample size calculations based on the non-inferiority test may also be performed via ‘expected’ power in case of uncertain (estimated) variability and/or uncertain theta0. ― Contains functions power.scABEL() and sampleN.scABEL() to calculate power and sample size for the BE decision via scaled (widened) BE acceptance limits (EMA recommended) based on simulations. Contains also functions scABEL.ad() and sampleN.scABEL.ad() to iteratively adjust alpha in order to maintain the overall consumer risk in ABEL studies and adapt the sample size for the loss in power. Contains further functions power.RSABE() and sampleN.RSABE() to calculate power and sample size for the BE decision via the reference-scaled ABE criterion according to the FDA procedure based on simulations. Contains further functions power.NTIDFDA() and sampleN.NTIDFDA() to calculate power and sample size for the BE decision via the FDA procedure for NTIDs based on simulations. Contains further functions power.HVNTID() and sampleN.HVNTID() to calculate power and sample size for the BE decision via the FDA procedure for highly variable NTIDs (see the FDA dabigatran / rivaroxaban guidances). ― Contains functions for power analysis of a sample size plan for ABE (pa.ABE()), scaled ABE (pa.scABE()) and scaled ABE for NTIDs (pa.NTIDFDA()), analysing power when deviating from the assumptions of the plan. ― Contains further functions for power calculations / sample size estimation for dose proportionality studies using the power model.
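A minimal sketch: list the supported designs, then estimate the sample size for a conventional 2x2 crossover; the CV and theta0 values are illustrative assumptions, not guidance:

    library(PowerTOST)
    known.designs()                   # study designs covered by the package
    sampleN.TOST(CV = 0.30,           # assumed intra-subject coefficient of variation
                 theta0 = 0.95,       # assumed true ratio of means
                 targetpower = 0.8,
                 design = "2x2")      # two-treatment, two-period crossover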
262 Clinical Trial Design, Monitoring, and Analysis pwr (core) Basic Functions for Power Analysis Power analysis functions along the lines of Cohen (1988).
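For example, the per-group sample size for detecting a medium effect (Cohen's d = 0.5) with 80% power in a two-sided two-sample t-test:

    library(pwr)
    pwr.t.test(d = 0.5,               # effect size (Cohen's d)
               power = 0.80,
               sig.level = 0.05,
               type = "two.sample")   # solves for n: about 64 per group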
263 Clinical Trial Design, Monitoring, and Analysis PwrGSD (core) Power in a Group Sequential Design Tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries, for deriving power of a sequential design at a specified alternative, template for evaluating the performance of candidate plans at a set of time varying alternatives. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.
264 Clinical Trial Design, Monitoring, and Analysis qtlDesign (core) Design of QTL experiments Tools for the design of QTL experiments
265 Clinical Trial Design, Monitoring, and Analysis rmeta Meta-Analysis Functions for simple fixed and random effects meta-analysis for two-sample comparisons and cumulative meta-analyses. Draws standard summary plots, funnel plots, and computes summaries and tests for association and heterogeneity.
266 Clinical Trial Design, Monitoring, and Analysis samplesize Sample Size Calculation for Various t-Tests and Wilcoxon-Test Computes sample size for Student’s t-test and for the Wilcoxon-Mann-Whitney test for categorical data. The t-test function allows paired and unpaired (balanced / unbalanced) designs as well as homogeneous and heterogeneous variances. The Wilcoxon function allows for ties.
267 Clinical Trial Design, Monitoring, and Analysis speff2trial (core) Semiparametric efficient estimation for a two-sample treatment effect The package performs estimation and testing of the treatment effect in a 2-group randomized clinical trial with a quantitative, dichotomous, or right-censored time-to-event endpoint. The method improves efficiency by leveraging baseline predictors of the endpoint. The inverse probability weighting technique of Robins, Rotnitzky, and Zhao (JASA, 1994) is used to provide unbiased estimation when the endpoint is missing at random.
268 Clinical Trial Design, Monitoring, and Analysis ssanv Sample Size Adjusted for Nonadherence or Variability of Input Parameters A set of functions to calculate sample size for two-sample difference in means tests. Does adjustments for either nonadherence or variability that comes from using data to estimate parameters.
269 Clinical Trial Design, Monitoring, and Analysis survival (core) Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
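A minimal sketch using the package's bundled lung data: a Kaplan-Meier curve by sex and a Cox proportional hazards model:

    library(survival)
    km <- survfit(Surv(time, status) ~ sex, data = lung)   # Kaplan-Meier estimates
    plot(km)                                               # survival curves
    cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
    summary(cox)                                           # hazard ratios, tests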
270 Clinical Trial Design, Monitoring, and Analysis TEQR (core) Target Equivalence Range Design The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of the ACT design is that dose is escalated based on lack of activity rather than on lack of toxicity and is de-escalated only if an unacceptable level of toxicity is experienced.
271 Clinical Trial Design, Monitoring, and Analysis ThreeArmedTrials Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active control, and a placebo control. Methods for the following distributions are implemented: Poisson (Mielke and Munk (2009) <arXiv:0912.4169>), negative binomial (Muetze et al. (2016) <doi:10.1002/sim.6738>), normal (Pigeot et al. (2003) <doi:10.1002/sim.1450>; Hasler et al. (2009) <doi:10.1002/sim.3052>), binary (Friede and Kieser (2007) <doi:10.1002/sim.2543>), nonparametric (Muetze et al. (2017) <doi:10.1002/sim.7176>), exponential (Mielke and Munk (2009) <arXiv:0912.4169>).
272 Clinical Trial Design, Monitoring, and Analysis ThreeGroups ML Estimator for Baseline-Placebo-Treatment (Three-Group) Experiments Implements the Maximum Likelihood estimator for baseline, placebo, and treatment groups (three-group) experiments with non-compliance proposed by Gerber, Green, Kaplan, and Kern (2010).
273 Clinical Trial Design, Monitoring, and Analysis TrialSize (core) R Functions for Chapters 3, 4, 6, 7, 9, 10, 11, 12, 14, and 15 of Sample Size Calculation in Clinical Research Functions and examples from the book Sample Size Calculation in Clinical Research.
274 Cluster Analysis & Finite Mixture Models AdMit Adaptive Mixture of Student-t Distributions Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.
275 Cluster Analysis & Finite Mixture Models ADPclust Fast Clustering Using Adaptive Density Peak Detection An implementation of ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work builds and improves upon the idea of Rodriguez and Laio (2014) <doi:10.1126/science.1242072>. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid-selection and parameter-optimization algorithm, which finds the number of clusters and the cluster centroids by comparing average silhouettes on a grid of testing clustering results; it also includes a user-interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional “density-distance plot”. The research article associated with this package is: Wang, Xiao-Feng, and Yifan Xu (2015) <doi:10.1177/0962280215609948>, “Fast clustering using adaptive density peak detection”, Statistical Methods in Medical Research. url: http://smm.sagepub.com/content/early/2015/10/15/0962280215609948.abstract.
276 Cluster Analysis & Finite Mixture Models amap Another Multidimensional Analysis Package Tools for Clustering and Principal Component Analysis (With robust methods, and parallelized functions).
277 Cluster Analysis & Finite Mixture Models apcluster Affinity Propagation Clustering Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) <doi:10.1126/science.1136800>. The algorithms are largely analogous to the ‘Matlab’ code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results.
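A small sketch of affinity propagation on simulated two-dimensional data, using the package's negDistMat() similarity (negative squared Euclidean distances, as in Frey and Dueck):

    library(apcluster)
    set.seed(1)
    x <- rbind(matrix(rnorm(60, 0, 0.3), ncol = 2),   # two simulated groups
               matrix(rnorm(60, 2, 0.3), ncol = 2))
    ap <- apcluster(negDistMat(r = 2), x)             # affinity propagation clustering
    plot(ap, x)                                       # clusters with exemplars marked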
278 Cluster Analysis & Finite Mixture Models BayesLCA Bayesian Latent Class Analysis Bayesian Latent Class Analysis using several different methods.
279 Cluster Analysis & Finite Mixture Models bayesm Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009) For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
280 Cluster Analysis & Finite Mixture Models bayesmix Bayesian Mixture Models with JAGS The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.
281 Cluster Analysis & Finite Mixture Models bclust Bayesian Hierarchical Clustering Using Spike and Slab Models Builds a dendrogram using the log posterior as a natural distance defined by the model, while also weighting the clustering variables. It is also capable of computing equivalent Bayesian discrimination probabilities. The adopted method suits the small-sample, large-dimension setting. The model parameter estimation may be difficult, depending on the data structure and the chosen distribution family.
282 Cluster Analysis & Finite Mixture Models bgmm Gaussian Mixture Modeling Algorithms and the Belief-Based Mixture Modeling Two partially supervised mixture modeling methods, soft-label and belief-based modeling, are implemented. For completeness, the package is also equipped with functionality for unsupervised, semi-supervised and fully supervised mixture modeling. The package can also be applied to selecting the best-fitting model from a set of models with different component numbers or constraints on their structures. For a detailed introduction see: Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software <doi:10.18637/jss.v047.i03>.
283 Cluster Analysis & Finite Mixture Models biclust BiCluster Algorithms The main function biclust() provides several algorithms to find biclusters in two-dimensional data: Cheng and Church (2000, ISBN:1-57735-115-0), spectral (2003) <doi:10.1101/gr.648603>, plaid model (2005) <doi:10.1016/j.csda.2004.02.003>, xmotifs (2003) <doi:10.1142/9789812776303_0008> and bimax (2006) <doi:10.1093/bioinformatics/btl060>. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.
284 Cluster Analysis & Finite Mixture Models Bmix Bayesian Sampling for Stick-Breaking Mixtures This is a bare-bones implementation of sampling algorithms for a variety of Bayesian stick-breaking (marginally DP) mixture models, including particle learning and Gibbs sampling for static DP mixtures, particle learning for dynamic BAR stick-breaking, and DP mixture regression. The software is designed to be easy to customize to suit different situations and for experimentation with stick-breaking models. Since particles are repeatedly copied, it is not an especially efficient implementation.
285 Cluster Analysis & Finite Mixture Models bmixture Bayesian Estimation for Finite Mixture of Distributions Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements from the Bayesian literature on finite mixtures of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.
286 Cluster Analysis & Finite Mixture Models cba Clustering for Business Analytics Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.
287 Cluster Analysis & Finite Mixture Models cclust Convex Clustering Methods and Clustering Indexes Convex Clustering methods, including K-means algorithm, On-line Update algorithm (Hard Competitive Learning) and Neural Gas algorithm (Soft Competitive Learning), and calculation of several indexes for finding the number of clusters in a data set.
288 Cluster Analysis & Finite Mixture Models CEC Cross-Entropy Clustering CEC divides data into Gaussian-type clusters. The implementation allows the simultaneous use of various types of Gaussian mixture models, performs the reduction of unnecessary clusters, and is able to discover new groups. Based on Spurek, P. and Tabor, J. (2014) <doi:10.1016/j.patcog.2014.03.006>.
289 Cluster Analysis & Finite Mixture Models CHsharp Choi and Hall Style Data Sharpening Functions for use in perturbing data prior to use of nonparametric smoothers and clustering.
290 Cluster Analysis & Finite Mixture Models clue Cluster Ensembles CLUster Ensembles.
291 Cluster Analysis & Finite Mixture Models cluster (core) “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al. Methods for cluster analysis. Much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
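A minimal k-medoids sketch with pam() and a silhouette check, on the numeric columns of iris:

    library(cluster)
    p <- pam(iris[, 1:4], k = 3)   # partitioning around medoids, 3 clusters
    plot(silhouette(p))            # silhouette widths per cluster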
292 Cluster Analysis & Finite Mixture Models clusterCrit Clustering Indices Compute clustering validation indices.
293 Cluster Analysis & Finite Mixture Models clusterfly Explore clustering interactively using R and GGobi Visualise clustering algorithms with GGobi. Contains both general code for visualising clustering results and specific visualisations for model-based, hierarchical and SOM clustering.
294 Cluster Analysis & Finite Mixture Models clusterGeneration Random Cluster Generation (with Specified Degree of Separation) We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population version) for pairs of clusters or cluster distributions, and 1-D and 2-D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, number of noisy variables.
295 Cluster Analysis & Finite Mixture Models ClusterR Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of ‘RcppArmadillo’ to speed up the computationally intensive parts of the functions. For more information, see (i) “Clustering in an Object-Oriented Environment” by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) “Web-scale k-means clustering” by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) “Armadillo: a template-based C++ library for linear algebra” by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) “Clustering by Passing Messages Between Data Points” by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.
296 Cluster Analysis & Finite Mixture Models clusterRepro Reproducibility of Gene Expression Clusters Provides a function for validating microarray clusters via reproducibility, based on the paper referenced in the package documentation.
297 Cluster Analysis & Finite Mixture Models clusterSim Searching for Optimal Clustering Procedure for a Data Set Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007%2FBF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).
298 Cluster Analysis & Finite Mixture Models clustMixType k-Prototypes Clustering for Mixed Variable-Type Data Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z. Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304, <doi:10.1023/A:1009769707641>.
299 Cluster Analysis & Finite Mixture Models clustvarsel Variable Selection for Gaussian Model-Based Clustering Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology finds the (locally) optimal subset of variables in a data set that have group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without sub-sampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.
300 Cluster Analysis & Finite Mixture Models clv Cluster Validation Techniques The package contains most of the popular internal and external cluster validation methods, ready to use with most of the outputs produced by functions from the “cluster” package. It also contains functions and usage examples for a cluster stability approach that can be applied to algorithms implemented in the “cluster” package as well as to user-defined clustering algorithms.
301 Cluster Analysis & Finite Mixture Models clValid Validation of Clustering Results Statistical and biological validation of clustering results.
302 Cluster Analysis & Finite Mixture Models CoClust Copula Based Cluster Analysis A copula based clustering algorithm that finds clusters according to the complex multivariate dependence structure of the data generating process. The updated version of the algorithm is described in Di Lascio, F.M.L. and Giannerini, S. (2016). “Clustering dependent observations with copula functions”. Statistical Papers, p.1-17. <doi:10.1007/s00362-016-0822-3>.
303 Cluster Analysis & Finite Mixture Models compHclust Complementary Hierarchical Clustering Performs the complementary hierarchical clustering procedure and returns X’ (the expected residual matrix) and a vector of the relative gene importances.
304 Cluster Analysis & Finite Mixture Models dbscan Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS (ordering points to identify the clustering structure) clustering algorithms, HDBSCAN (hierarchical DBSCAN), and the LOF (local outlier factor) algorithm. The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.
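A short sketch: inspect a k-nearest-neighbor distance plot to choose eps, then run DBSCAN; the eps value below is a rough guess for iris, not a recommendation:

    library(dbscan)
    x <- as.matrix(iris[, 1:4])
    kNNdistplot(x, k = 5)                     # look for the 'knee' to pick eps
    res <- dbscan(x, eps = 0.5, minPts = 5)   # cluster 0 in the result = noise points
    res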
305 Cluster Analysis & Finite Mixture Models dendextend Extending ‘dendrogram’ Functionality in R Offers a set of functions for extending ‘dendrogram’ objects in R, letting you visualize and compare trees of ‘hierarchical clusterings’. You can (1) Adjust a tree’s graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different ‘dendrograms’ to one another.
306 Cluster Analysis & Finite Mixture Models depmix Dependent Mixture Models Fits (multigroup) mixtures of latent or hidden Markov models on mixed categorical and continuous (time series) data. The Rdonlp2 package can optionally be used for optimization of the log-likelihood and is available from R-Forge.
307 Cluster Analysis & Finite Mixture Models depmixS4 Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4 Fits latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models.
308 Cluster Analysis & Finite Mixture Models dpmixsim Dirichlet Process Mixture Model Simulation for Clustering and Image Segmentation The ‘dpmixsim’ package implements a Dirichlet Process Mixture (DPM) model for clustering and image segmentation. The DPM model is a Bayesian nonparametric methodology that relies on MCMC simulations for exploring mixture models with an unknown number of components. The code implements conjugate models with normal structure (conjugate normal-normal DP mixture model). The package’s applications are oriented towards the classification of magnetic resonance images according to tissue type or region of interest.
309 Cluster Analysis & Finite Mixture Models dynamicTreeCut Methods for Detection of Clusters in Hierarchical Clustering Dendrograms Contains methods for detection of clusters in hierarchical clustering dendrograms.
310 Cluster Analysis & Finite Mixture Models e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
311 Cluster Analysis & Finite Mixture Models edci Edge Detection and Clustering in Images Detection of edge points in images based on the difference of two asymmetric M-kernel estimators. Linear and circular regression clustering based on redescending M-estimators. Detection of linear edges in images.
312 Cluster Analysis & Finite Mixture Models EMCluster EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution EM algorithms and several efficient initialization methods for model-based clustering of finite mixture Gaussian distributions with unstructured dispersion, in both unsupervised and semi-supervised learning.
313 Cluster Analysis & Finite Mixture Models evclust Evidential Clustering Various clustering algorithms that produce a credal partition, i.e., a set of Dempster-Shafer mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership uncertainty of the objects. The algorithms are: Evidential c-Means (ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus.
314 Cluster Analysis & Finite Mixture Models FactoClass Combination of Factorial Methods and Cluster Analysis Some functions of ‘ade4’ and ‘stats’ are combined in order to obtain a partition of the rows of a data table, with columns representing variables of scales: quantitative, qualitative or frequency. First, a principal axes method is performed and then a combination of Ward agglomerative hierarchical classification and K-means is performed, using some of the first coordinates obtained from the principal axes method. See, for example: Lebart, L. and Piron, M. and Morineau, A. (2006). Statistique Exploratoire Multidimensionnelle, Dunod, Paris. To permit different weights for the elements to be clustered, the function ‘kmeansW’, programmed in C++, is included. It is a modification of ‘kmeans’. Some graphical functions include the option ‘gg=FALSE’; when ‘gg=TRUE’, they use the ‘ggplot2’ and ‘ggrepel’ packages to avoid the superposition of labels.
315 Cluster Analysis & Finite Mixture Models fastcluster Fast Hierarchical Clustering Routines for R and ‘Python’ This is a two-in-one package which provides interfaces to both R and ‘Python’. It implements fast hierarchical, agglomerative clustering routines. Part of the functionality is designed as drop-in replacement for existing routines: linkage() in the ‘SciPy’ package ‘scipy.cluster.hierarchy’, hclust() in R’s ‘stats’ package, and the ‘flashClust’ package. It provides the same functionality with the benefit of a much faster implementation. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide. For information on how to install the ‘Python’ files, see the file INSTALL in the source distribution. Based on the present package, Christoph Dalitz also wrote a pure ‘C++’ interface to ‘fastcluster’: <http://informatik.hsnr.de/~dalitz/data/hclust>.
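Because fastcluster is a drop-in replacement, the usual stats idiom applies unchanged once the package is attached:

    library(fastcluster)           # masks stats::hclust with a faster implementation
    hc <- hclust(dist(USArrests), method = "ward.D2")
    plot(hc)                       # standard dendrogram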
316 Cluster Analysis & Finite Mixture Models fclust Fuzzy Clustering Algorithms for fuzzy clustering, cluster validity indices and plots for cluster validity and visualizing fuzzy clustering results.
317 Cluster Analysis & Finite Mixture Models flashClust Implementation of optimal hierarchical clustering Fast implementation of hierarchical clustering
318 Cluster Analysis & Finite Mixture Models flexclust (core) Flexible Cluster Algorithms The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, …), and bootstrap methods for the analysis of cluster stability.
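A minimal kcca() sketch; the kmeans family is used here, and other families (e.g., kmedians) can be swapped in via kccaFamily():

    library(flexclust)
    set.seed(1)
    cl <- kcca(iris[, 1:4], k = 3,
               family = kccaFamily("kmeans"))   # k-centroids with means/Euclidean distance
    barchart(cl)                                # barchart of the centroids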
319 Cluster Analysis & Finite Mixture Models flexCWM Flexible Cluster-Weighted Modeling Allows for maximum likelihood fitting of cluster-weighted models, a class of mixtures of regression models with random covariates.
320 Cluster Analysis & Finite Mixture Models flexmix (core) Flexible Mixture Modeling A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
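A small sketch of a two-component mixture of linear regressions on the package's simulated NPreg data, following the flexmix vignette:

    library(flexmix)
    data("NPreg")                                        # simulated example data
    m <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2)   # 2-component mixture of regressions
    summary(m)                                           # component sizes, log-likelihood
    parameters(m)                                        # coefficients per component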
321 Cluster Analysis & Finite Mixture Models fpc Flexible Procedures for Clustering Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther’s prediction strength, Fang and Wang’s bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
322 Cluster Analysis & Finite Mixture Models FunCluster Functional Profiling of Microarray Expression Data FunCluster performs a functional analysis of microarray expression data based on Gene Ontology & KEGG functional annotations. From expression data and functional annotations FunCluster builds classes of putatively co-regulated biological processes through a specially designed clustering procedure.
323 Cluster Analysis & Finite Mixture Models funFEM Clustering in the Discriminative Functional Subspace The funFEM algorithm (Bouveyron et al., 2014) clusters functional data by modeling the curves within a common and discriminative functional subspace.
324 Cluster Analysis & Finite Mixture Models funHDDC Univariate and Multivariate Model-Based Clustering in Group-Specific Functional Subspaces The funHDDC algorithm clusters functional univariate (Bouveyron and Jacques, 2011, <doi:10.1007/s11634-011-0095-6>) or multivariate data (Schmutz et al., 2018) by modeling each group within a specific functional subspace.
325 Cluster Analysis & Finite Mixture Models gamlss.mx Fitting Mixture Distributions with GAMLSS The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.
326 Cluster Analysis & Finite Mixture Models genie A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed, see (Gagolewski et al. 2016a <doi:10.1016/j.ins.2016.05.003>, 2016b <doi:10.1007/978-3-319-45656-0_16>) for more details.
327 Cluster Analysis & Finite Mixture Models GLDEX Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method, and L-moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and by comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.
328 Cluster Analysis & Finite Mixture Models GMCM Fast Estimation of Gaussian Mixture Copula Models Unsupervised Clustering and Meta-analysis using Gaussian Mixture Copula Models.
329 Cluster Analysis & Finite Mixture Models GSM Gamma Shape Mixture Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient and only requires specifying a prior distribution for a single parameter.
330 Cluster Analysis & Finite Mixture Models HDclassif High Dimensional Supervised Classification and Clustering Discriminant analysis and data clustering methods for high-dimensional data, based on the assumption that high-dimensional data live in different subspaces with low dimensionality; proposes a new parametrization of the Gaussian mixture model which combines the ideas of dimension reduction and constraints on the model.
331 Cluster Analysis & Finite Mixture Models hybridHclust Hybrid Hierarchical Clustering Hybrid hierarchical clustering via mutual clusters. A mutual cluster is a set of points closer to each other than to all other points. Mutual clusters are used to enrich top-down hierarchical clustering.
332 Cluster Analysis & Finite Mixture Models idendr0 Interactive Dendrograms Interactive dendrogram that enables the user to select and color clusters, to zoom and pan the dendrogram, and to visualize the clustered data not only in a built-in heat map, but also in ‘GGobi’ interactive plots and user-supplied plots. This is a backport of Qt-based ‘idendro’ (<https://github.com/tsieger/idendro>) to base R graphics and Tcl/Tk GUI.
333 Cluster Analysis & Finite Mixture Models IMIFA Infinite Mixtures of Infinite Factor Analysers and Related Models Provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametrically clustering high-dimensional data, introduced by Murphy et al. (2018) <arXiv:1701.07010v4>. The IMIFA model conducts Bayesian nonparametric model-based clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors, mostly via efficient Gibbs updates. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.
334 Cluster Analysis & Finite Mixture Models isopam Isopam (Clustering) Isopam clustering algorithm and utilities. Isopam optimizes clusters and optionally cluster numbers in a brute force style and aims at an optimum separation by all or some descriptors (typically species).
335 Cluster Analysis & Finite Mixture Models kernlab Kernel-Based Machine Learning Lab Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
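For the clustering side of kernlab, a spectral clustering sketch on the package's spirals example data:

    library(kernlab)
    data(spirals)                       # two intertwined spirals
    sc <- specc(spirals, centers = 2)   # spectral clustering, RBF kernel by default
    plot(spirals, col = sc)             # points colored by cluster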
336 Cluster Analysis & Finite Mixture Models kml K-Means for Longitudinal Data An implementation of k-means specifically designed to cluster longitudinal data. It provides facilities to deal with missing values, computes several quality criteria (Calinski and Harabasz, Ray and Turi, Davies and Bouldin, BIC, …) and proposes a graphical interface for choosing the ‘best’ number of clusters.
337 Cluster Analysis & Finite Mixture Models latentnet Latent Position and Cluster Models for Statistical Networks Fit and simulate latent position and cluster models for statistical networks.
338 Cluster Analysis & Finite Mixture Models LCAvarsel Variable Selection for Latent Class Analysis Variable selection for latent class analysis for model-based clustering of multivariate categorical data. The package implements a general framework for selecting the subset of variables with relevant clustering information and discarding those that are redundant and/or not informative. The variable selection method is based on the approach of Fop et al. (2017) <doi:10.1214/17-AOAS1061> and Dean and Raftery (2010) <doi:10.1007/s10463-009-0258-9>. Different algorithms are available to perform the selection: stepwise, swap-stepwise and evolutionary stochastic search. Concomitant covariates used to predict the class membership probabilities can also be included in the latent class analysis model. The selection procedure can be run in parallel on multi-core machines.
339 Cluster Analysis & Finite Mixture Models lcmm Extended Mixed Models Using Latent Classes and Latent Processes Estimation of various extensions of mixed models, including latent class mixed models, joint latent class mixed models, and mixed models for curvilinear univariate or multivariate longitudinal outcomes, using a maximum likelihood estimation method.
340 Cluster Analysis & Finite Mixture Models longclust Model-Based Clustering and Classification for Longitudinal Data Clustering or classification of longitudinal data based on a mixture of multivariate t or Gaussian distributions with a Cholesky-decomposed covariance structure.
341 Cluster Analysis & Finite Mixture Models mcclust Process an MCMC Sample of Clusterings Implements methods for processing a sample of (hard) clusterings, e.g. the MCMC output of a Bayesian clustering model. Among them are methods that find a single best clustering to represent the sample, which are based on the posterior similarity matrix or a relabelling algorithm.
342 Cluster Analysis & Finite Mixture Models mclust (core) Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
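A minimal model-based clustering sketch with Mclust(), which selects the number of components and the covariance model by BIC:

    library(mclust)
    fit <- Mclust(iris[, 1:4])    # model and number of components chosen by BIC
    summary(fit)                  # selected model, mixing proportions
    plot(fit, what = "BIC")       # BIC traces across candidate models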
343 Cluster Analysis & Finite Mixture Models MetabolAnalyze Probabilistic latent variable models for metabolomic data Fits probabilistic principal components analysis, probabilistic principal components and covariates analysis and mixtures of probabilistic principal components models to metabolomic spectral data.
344 Cluster Analysis & Finite Mixture Models mixAK Multivariate Normal Mixture Models and Mixtures of Generalized Linear Mixed Models Including Model Based Clustering Contains a mixture of statistical methods including the MCMC methods to analyze normal mixtures. Additionally, model based clustering methods are implemented to perform classification based on (multivariate) longitudinal (or otherwise correlated) data. The basis for such clustering is a mixture of multivariate generalized linear mixed models.
345 Cluster Analysis & Finite Mixture Models MixAll Clustering and Classification using Model-Based Mixture Models Algorithms and methods for model-based clustering and classification. It supports various types of data (continuous, categorical and count data) and can handle mixed data of these types. It can fit Gaussian (with diagonal covariance structure), gamma, categorical and Poisson models. The algorithms also support missing values. This package can be used as an independent alternative to the (not free) ‘mixtcomp’ software available at <https://massiccc.lille.inria.fr/>.
346 Cluster Analysis & Finite Mixture Models mixdist Finite Mixture Distribution Models Fit finite mixture distribution models to grouped data and conditional data by maximum likelihood using a combination of a Newton-type algorithm and the EM algorithm.
347 Cluster Analysis & Finite Mixture Models mixPHM Mixtures of Proportional Hazard Models Fits multiple variable mixtures of various parametric proportional hazard models using the EM-Algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing values and censored values are allowed. Independence is assumed over the single variables.
348 Cluster Analysis & Finite Mixture Models mixRasch Mixture Rasch Models with JMLE Estimates Rasch models and mixture Rasch models, including the dichotomous Rasch model, the rating scale model, and the partial credit model.
349 Cluster Analysis & Finite Mixture Models mixreg Functions to Fit Mixtures of Regressions Fits mixtures of (possibly multivariate) regressions (which has been described as doing ANCOVA when you don’t know the levels).
350 Cluster Analysis & Finite Mixture Models MixSim Simulating Data to Study Performance of Clustering Algorithms The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of ‘MixSim’, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.
351 Cluster Analysis & Finite Mixture Models mixsmsn Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions Functions to fit finite mixture of scale mixture of skew-normal (FM-SMSN) distributions.
352 Cluster Analysis & Finite Mixture Models mixtools Tools for Analyzing Finite Mixture Models Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
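A small univariate example: fitting a two-component normal mixture to the Old Faithful waiting times with normalmixEM():

    library(mixtools)
    fit <- normalmixEM(faithful$waiting, k = 2)   # EM fit of a 2-component normal mixture
    summary(fit)                                  # lambda, mu, sigma per component
    plot(fit, whichplots = 2)                     # histogram with fitted component densities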
353 Cluster Analysis & Finite Mixture Models mixture Finite Gaussian Mixture Models for Clustering and Classification An implementation of all 14 Gaussian parsimonious clustering models (GPCMs) for model-based clustering and model-based classification.
354 Cluster Analysis & Finite Mixture Models MOCCA Multi-Objective Optimization for Collecting Cluster Alternatives Provides methods to analyze cluster alternatives based on multi-objective optimization of cluster validation indices. For details see Kraus et al. (2011) <doi:10.1007/s00180-011-0244-6>.
355 Cluster Analysis & Finite Mixture Models MoEClust Gaussian Parsimonious Clustering Models with Covariates Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2017) <arXiv:1711.05632>. This package fits finite Gaussian mixture models with a formula interface for supplying gating and/or expert network covariates using a range of parsimonious covariance parameterisations via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots is also facilitated.
356 Cluster Analysis & Finite Mixture Models movMF Mixtures of von Mises-Fisher Distributions Fit and simulate mixtures of von Mises-Fisher distributions.
357 Cluster Analysis & Finite Mixture Models mritc MRI Tissue Classification Various methods for MRI tissue classification.
358 Cluster Analysis & Finite Mixture Models NbClust Determining the Best Number of Clusters in a Data Set It provides 30 indices for determining the optimal number of clusters in a data set and proposes the best clustering scheme to the user from the different results.
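A sketch of index-based selection of the number of clusters; running all 30 indices can be slow, so a single index is shown, and the argument values are illustrative:

    library(NbClust)
    nb <- NbClust(scale(USArrests), distance = "euclidean",
                  min.nc = 2, max.nc = 8,
                  method = "kmeans", index = "silhouette")
    nb$Best.nc                     # suggested number of clusters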
359 Cluster Analysis & Finite Mixture Models nor1mix Normal (1-d) Mixture Models (S3 Classes and Methods) One-dimensional normal mixture model classes, for, e.g., density estimation or research and teaching on clustering algorithms; provides the widely used Marron-Wand densities. Efficient random number generation and graphics; now fits to data by ML (Maximum Likelihood) or EM estimation.
360 Cluster Analysis & Finite Mixture Models optpart Optimal Partitioning of Similarity Relations Contains a set of algorithms for creating partitions and coverings of objects largely based on operations on (dis)similarity relations (or matrices). There are several iterative re-assignment algorithms optimizing different goodness-of-clustering criteria. In addition, there are covering algorithms ‘clique’ which derives maximal cliques, and ‘maxpact’ which creates a covering of maximally compact sets. Graphical analyses and conversion routines are also included.
361 Cluster Analysis & Finite Mixture Models ORIClust Order-restricted Information Criterion-based Clustering Algorithm ORIClust is a user-friendly R-based software package for gene clustering. Clusters are given by genes matched to prespecified profiles across various ordered treatment groups. It is particularly useful for analyzing data obtained from short time-course or dose-response microarray experiments.
362 Cluster Analysis & Finite Mixture Models pdfCluster Cluster Analysis via Nonparametric Density Estimation Cluster analysis via nonparametric density estimation is performed. Operationally, the kernel method is used throughout to estimate the density. Diagnostic methods for evaluating the quality of the clustering are available. The package also includes a routine to estimate the probability density function obtained by the kernel method, given a set of data with arbitrary dimensions.
363 Cluster Analysis & Finite Mixture Models pmclust Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model Aims to utilize model-based clustering (unsupervised) for high-dimensional and ultra-large data, especially in a distributed manner. The code employs ‘pbdMPI’ to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single-program/multiple-data programming model and can be executed through ‘pbdMPI’ and MPI implementations such as ‘OpenMPI’ and ‘MPICH’. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.
364 Cluster Analysis & Finite Mixture Models poLCA Polytomous variable Latent Class Analysis Latent class analysis and latent class regression models for polytomous outcome variables. Also known as latent structure analysis.
365 Cluster Analysis & Finite Mixture Models prabclus Functions for Clustering of Presence-Absence, Abundance and Multilocus Genetic Data Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures; clustering of presence-absence, abundance and multilocus genetic data for species delimitation; nearest-neighbor based noise detection. Try package?prabclus for an overview.
366 Cluster Analysis & Finite Mixture Models prcr Person-Centered Analysis Provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) <doi:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K>, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.
367 Cluster Analysis & Finite Mixture Models PReMiuM Dirichlet Process Bayesian Clustering, Profile Regression Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical responses, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may be of interest to determine which covariates actively drive the mixture components; this is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.
368 Cluster Analysis & Finite Mixture Models profdpm Profile Dirichlet Process Mixtures This package facilitates profile inference (inference at the posterior mode) for a class of product partition models (PPM). The Dirichlet process mixture is currently the only available member of this class. These methods search for the maximum a posteriori (MAP) estimate for the data partition in a PPM.
369 Cluster Analysis & Finite Mixture Models protoclust Hierarchical Clustering with Prototypes Performs minimax linkage hierarchical clustering. Every cluster has an associated prototype element that represents that cluster as described in Bien, J., and Tibshirani, R. (2011), “Hierarchical Clustering with Prototypes via Minimax Linkage,” The Journal of the American Statistical Association, 106(495), 1075-1084.
370 Cluster Analysis & Finite Mixture Models psychomix Psychometric Mixture Models Psychometric mixture models based on ‘flexmix’ infrastructure. At the moment Rasch mixture models with different parameterizations of the score distribution (saturated vs. mean/variance specification), Bradley-Terry mixture models, and MPT mixture models are implemented. These mixture models can be estimated with or without concomitant variables. See vignette(‘raschmix’, package = ‘psychomix’) for details on the Rasch mixture models.
371 Cluster Analysis & Finite Mixture Models pvclust Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides AU (approximately unbiased) p-value as well as BP (bootstrap probability) value for each cluster in a dendrogram.
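A minimal usage sketch of the pvclust interface described above, clustering the columns of the built-in USArrests data (the distance/linkage choices and small nboot are illustrative assumptions, not recommendations):
    library(pvclust)
    # multiscale bootstrap; nboot kept small only to make the example fast
    res <- pvclust(USArrests, method.dist = "euclidean",
                   method.hclust = "average", nboot = 100)
    plot(res)                  # dendrogram annotated with AU and BP values
    pvrect(res, alpha = 0.95)  # box clusters whose AU p-value is >= 0.95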
372 Cluster Analysis & Finite Mixture Models randomLCA Random Effects Latent Class Analysis Fits standard and random effects latent class models. The single level random effects model is described in Qu et al <doi:10.2307/2533043> and the two level random effects model in Beath and Heller <doi:10.1177/1471082X0800900302>. Examples are given for their use in diagnostic testing.
373 Cluster Analysis & Finite Mixture Models rebmix Finite Mixture Modeling, Clustering & Classification R functions for random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, binomial, Poisson, Dirac or circular von Mises parametric families.
374 Cluster Analysis & Finite Mixture Models rjags Bayesian Graphical Models using MCMC Interface to the JAGS MCMC library.
375 Cluster Analysis & Finite Mixture Models Rmixmod (core) Classification with Mixture Modelling Interface of ‘MIXMOD’ software for supervised, unsupervised and semi-supervised classification with mixture modelling.
376 Cluster Analysis & Finite Mixture Models RPMM Recursively Partitioned Mixture Model Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.
377 Cluster Analysis & Finite Mixture Models seriation Infrastructure for Ordering Objects Using Seriation Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).
378 Cluster Analysis & Finite Mixture Models sigclust Statistical Significance of Clustering SigClust is a statistical method for testing the significance of clustering results. SigClust can be applied to assess the statistical significance of splitting a data set into two clusters. For more than two clusters, SigClust can be used iteratively.
379 Cluster Analysis & Finite Mixture Models skmeans Spherical k-Means Clustering Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.
380 Cluster Analysis & Finite Mixture Models som Self-Organizing Map Self-Organizing Map (with application in gene clustering).
381 Cluster Analysis & Finite Mixture Models tclust Robust Trimmed Clustering Provides functions for robust trimmed clustering. The methods are described in Garcia-Escudero (2008) <doi:10.1214/07-AOS515>, Fritz et al. (2012) <doi:10.18637/jss.v047.i12> and others.
382 Cluster Analysis & Finite Mixture Models teigen Model-Based Clustering and Classification with the Multivariate t Distribution Fits mixtures of multivariate t-distributions (with eigen-decomposed covariance structure) via the expectation conditional-maximization algorithm under a clustering or classification paradigm.
383 Cluster Analysis & Finite Mixture Models treeClust Cluster Distances Through Trees Create a measure of inter-point dissimilarity useful for clustering mixed data, and, optionally, perform the clustering.
384 Cluster Analysis & Finite Mixture Models trimcluster Cluster Analysis with Trimming Trimmed k-means clustering.
385 Cluster Analysis & Finite Mixture Models VarSelLCM Variable Selection for Model-Based Clustering of Mixed-Type Data Set with Missing Values Full model selection (detection of the relevant features and estimation of the number of clusters) for model-based clustering (see reference here <doi:10.1007/s11222-016-9670-1>). Data to analyze can be continuous, categorical, integer or mixed. Moreover, missing values can occur and do not necessitate any pre-processing. A Shiny application permits easy interpretation of the results.
386 Databases with R bigrquery An Interface to Google’s ‘BigQuery’ ‘API’ Easily talk to Google’s ‘BigQuery’ database from R.
387 Databases with R dbfaker A Tool to Ensure the Validity of Database Writes A tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support.
388 Databases with R DBI (core) R Database Interface A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.
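Since DBI only defines the interface, an implementing back end must supply the driver. A minimal sketch, assuming the RSQLite back end purely for illustration:
    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), ":memory:")  # any DBI driver fits here
    dbWriteTable(con, "mtcars", mtcars)
    dbGetQuery(con, "SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars GROUP BY cyl")
    dbDisconnect(con)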
389 Databases with R DBItest Testing ‘DBI’ Back Ends A helper that tests ‘DBI’ back ends for conformity to the interface.
390 Databases with R dbplyr A ‘dplyr’ Back End for Databases A ‘dplyr’ back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a ‘DBI’ back end; more advanced features require ‘SQL’ translation to be provided by the package author.
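A minimal sketch of the remote-table idiom: dplyr verbs on a tbl() are translated to SQL lazily, and collect() pulls the result into R (RSQLite is assumed only for illustration):
    library(dplyr)   # dbplyr is used automatically for remote tables
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    DBI::dbWriteTable(con, "mtcars", mtcars)
    remote <- tbl(con, "mtcars") %>%
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg, na.rm = TRUE))
    show_query(remote)   # inspect the generated SQL
    collect(remote)      # execute and bring the result into memory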
391 Databases with R dplyr A Grammar of Data Manipulation A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
392 Databases with R elastic General Purpose Interface to ‘Elasticsearch’ Connect to ‘Elasticsearch’, a ‘NoSQL’ database built on the ‘Java’ Virtual Machine. Interacts with the ‘Elasticsearch’ ‘HTTP’ API (<https://www.elastic.co/products/elasticsearch>), including functions for setting connection details to ‘Elasticsearch’ instances, loading bulk data, and searching for documents with both ‘HTTP’ query variables and ‘JSON’ based body requests. In addition, ‘elastic’ provides functions for interacting with APIs for ‘indices’, documents, nodes, clusters, an interface to the cat API, and more.
393 Databases with R filehashSQLite Simple key-value database using SQLite Simple key-value database using SQLite as the backend.
394 Databases with R implyr R Interface for Apache Impala ‘SQL’ back-end to ‘dplyr’ for Apache Impala, the massively parallel processing query engine for Apache ‘Hadoop’. Impala enables low-latency ‘SQL’ queries on data stored in the ‘Hadoop’ Distributed File System ‘(HDFS)’, Apache ‘HBase’, Apache ‘Kudu’, Amazon Simple Storage Service ‘(S3)’, Microsoft Azure Data Lake Store ‘(ADLS)’, and Dell ‘EMC’ ‘Isilon’. See <https://impala.apache.org> for more information about Impala.
395 Databases with R influxdbr R Interface to InfluxDB An R interface to the InfluxDB time series database <https://www.influxdata.com>. This package allows you to fetch and write time series data from/to an InfluxDB server. Additionally, handy wrappers for the Influx Query Language (IQL) to manage and explore a remote database are provided.
396 Databases with R liteq Lightweight Portable Message Queue Using ‘SQLite’ Temporary and permanent message queues for R. Built on top of ‘SQLite’ databases. ‘SQLite’ provides locking, and makes it possible to detect crashed consumers. Crashed jobs can be automatically marked as “failed”, or put in the queue again, potentially a limited number of times.
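A minimal sketch of the queue workflow, assuming the documented ensure_queue()/publish()/try_consume()/ack() verbs; the queue file path is hypothetical:
    library(liteq)
    q <- ensure_queue("jobs", db = tempfile())  # queue backed by an SQLite file
    publish(q, title = "job-1", message = "payload")
    msg <- try_consume(q)        # NULL when the queue is empty
    if (!is.null(msg)) ack(msg)  # mark the message as successfully processed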
397 Databases with R mongolite Fast and Simple ‘MongoDB’ Client for R High-performance MongoDB client based on ‘mongo-c-driver’ and ‘jsonlite’. Includes support for aggregation, indexing, map-reduce, streaming, encryption, enterprise authentication, and GridFS. The online user manual provides an overview of the available methods in the package: <https://jeroen.github.io/mongolite/>.
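A minimal sketch, assuming a MongoDB server reachable at the default localhost URL; the collection and database names are hypothetical:
    library(mongolite)
    m <- mongo(collection = "flowers", db = "demo",
               url = "mongodb://localhost")
    m$insert(iris)                               # write a data frame as documents
    m$find('{"Species" : "setosa"}', limit = 5)  # JSON query, first 5 matches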
398 Databases with R odbc (core) Connect to ODBC Compatible Databases (using the DBI Interface) A DBI-compatible interface to ODBC databases.
399 Databases with R ora Convenient Tools for Working with Oracle Databases Easy-to-use functions to explore Oracle databases and import data into R. User interface for the ROracle package.
400 Databases with R pivot ‘SQL’ PIVOT and UNPIVOT Extends the ‘tidyverse’ packages ‘dbplyr’ and ‘tidyr’ functionality with pivot(), i.e. spread(), and unpivot(), i.e. gather(), for reshaping remote tables. Currently only ‘Microsoft SQL Server’ is supported.
401 Databases with R pointblank Validation of Local and Remote Data Tables Validate data in data frames, ‘tibble’ objects, in ‘CSV’ and ‘TSV’ files, and in database tables (‘PostgreSQL’ and ‘MySQL’). Validation pipelines can be made using easily-readable, consecutive validation steps and such pipelines allow for switching of the data table context. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions.
402 Databases with R pool Object Pooling Enables the creation of object pools, which make it less computationally expensive to fetch a new object. Currently the only supported pooled objects are ‘DBI’ connections.
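A minimal sketch of a pooled DBI connection (RSQLite and a temporary file are assumed for illustration); the standard DBI generics work on the pool object directly:
    library(pool)
    p <- dbPool(RSQLite::SQLite(), dbname = tempfile())
    DBI::dbWriteTable(p, "mtcars", mtcars)
    DBI::dbGetQuery(p, "SELECT COUNT(*) AS n FROM mtcars")
    poolClose(p)   # release all pooled connections when done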
403 Databases with R R4CouchDB An R Convenience Layer for CouchDB 2.0 Provides a collection of functions for basic database and document management operations such as add, get, list access or delete. Every cdbFunction() gets and returns a list() containing the connection setup. Such a list can be generated by cdbIni().
404 Databases with R RCassandra R/Cassandra interface This package provides a direct interface (without the use of Java) to the most basic functionality of Apache Cassandra, such as login, updates and queries.
405 Databases with R RcppRedis ‘Rcpp’ Bindings for ‘Redis’ using the ‘hiredis’ Library Connection to the ‘Redis’ key/value store using the C-language client library ‘hiredis’ (included as a fallback) with ‘MsgPack’ encoding provided via ‘RcppMsgPack’ headers.
406 Databases with R redux R Bindings to ‘hiredis’ A ‘hiredis’ wrapper that includes support for transactions, pipelining, blocking subscription, serialisation of all keys and values, and ‘Redis’ error handling with R errors. Includes an automatically generated ‘R6’ interface to the full ‘hiredis’ ‘API’. Generated functions are faithful to the ‘hiredis’ documentation while attempting to match R’s argument semantics. Serialisation must be explicitly done by the user, but both binary and text-mode serialisation are supported.
407 Databases with R RGreenplum Interface to ‘Greenplum’ Database Fully ‘DBI’-compliant interface to ‘Greenplum’ <https://greenplum.org/>, an open-source parallel database. This is an extension of the ‘RPostgres’ package <https://github.com/r-dbi/RPostgres>.
408 Databases with R RH2 DBI/RJDBC Interface to H2 Database DBI/RJDBC interface to h2 database. h2 version 1.3.175 is included.
409 Databases with R RJDBC Provides Access to Databases Through the JDBC Interface The RJDBC package is an implementation of R’s DBI interface using JDBC as a back-end. This allows R to connect to any DBMS that has a JDBC driver.
410 Databases with R RMariaDB Database Interface and ‘MariaDB’ Driver Implements a ‘DBI’-compliant interface to ‘MariaDB’ (<https://mariadb.org/>) and ‘MySQL’ (<https://www.mysql.com/>) databases.
411 Databases with R RMySQL Database Interface and ‘MySQL’ Driver for R Legacy ‘DBI’ interface to ‘MySQL’ / ‘MariaDB’ based on old code ported from S-PLUS. A modern ‘MySQL’ client based on ‘Rcpp’ is available from the ‘RMariaDB’ package.
412 Databases with R ROracle OCI Based Oracle Database Interface for R Oracle Database interface (DBI) driver for R. This is a DBI-compliant Oracle driver based on the OCI.
413 Databases with R rpostgis R Interface to a ‘PostGIS’ Database Provides an interface between R and ‘PostGIS’-enabled ‘PostgreSQL’ databases to transparently transfer spatial data. Both vector (points, lines, polygons) and raster data are supported in read and write modes. Also provides convenience functions to execute common procedures in ‘PostgreSQL/PostGIS’.
414 Databases with R RPostgres ‘Rcpp’ Interface to ‘PostgreSQL’ Fully ‘DBI’-compliant ‘Rcpp’-backed interface to ‘PostgreSQL’ <https://www.postgresql.org/>, an open-source relational database.
415 Databases with R RPostgreSQL R Interface to the ‘PostgreSQL’ Database System Database interface and ‘PostgreSQL’ driver for ‘R’. This package provides a Database Interface ‘DBI’ compliant driver for ‘R’ to access ‘PostgreSQL’ database systems. In order to build and install this package from source, ‘PostgreSQL’ itself must be present on your system to provide ‘PostgreSQL’ functionality via its libraries and header files. These files are provided as the ‘postgresql-devel’ package under some Linux distributions. On ‘macOS’ and ‘Microsoft Windows’ systems the attached ‘libpq’ library source will be used.
416 Databases with R RPresto DBI Connector to Presto Implements a ‘DBI’ compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>.
417 Databases with R RSQLite ‘SQLite’ Interface for R Embeds the ‘SQLite’ database engine in R and provides an interface compliant with the ‘DBI’ package. The source for the ‘SQLite’ engine is included.
418 Databases with R sqldf Manipulate R Data Frames Using SQL The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. ‘RSQLite’, ‘RH2’, ‘RMySQL’ and ‘RPostgreSQL’ backends are supported.
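A minimal sketch of the idiom described above: the data frame name is used directly as the table name in the SQL statement:
    library(sqldf)
    sqldf("SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars GROUP BY cyl")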
419 Databases with R tidyr Easily Tidy Data with ‘spread()’ and ‘gather()’ Functions An evolution of ‘reshape2’. It’s designed specifically for data tidying (not general reshaping or aggregating) and works well with ‘dplyr’ data pipelines.
420 Databases with R TScompare ‘TSdbi’ Database Comparison Utilities for comparing the equality of series on two databases. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.
421 Databases with R uptasticsearch Get Data Frame Representations of ‘Elasticsearch’ Results ‘Elasticsearch’ is an open-source, distributed, document-based datastore (<https://www.elastic.co/products/elasticsearch>). It provides an ‘HTTP’ ‘API’ for querying the database and extracting datasets, but that ‘API’ was not designed for common data science workflows like pulling large batches of records and normalizing those documents into a data frame that can be used as a training dataset for statistical models. ‘uptasticsearch’ provides an interface for ‘Elasticsearch’ that is explicitly designed to make these data science workflows easy and fun.
422 Differential Equations adaptivetau Tau-Leaping Stochastic Simulation Implements adaptive tau leaping to approximate the trajectory of a continuous-time stochastic process as described by Cao et al. (2007), The Journal of Chemical Physics <doi:10.1063/1.2745299> (a.k.a. the Gillespie stochastic simulation algorithm). This package is based upon work supported by NSF DBI-0906041 and NIH K99-GM104158 to Philip Johnson and NIH R01-AI049334 to Rustom Antia.
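A minimal sketch of the tau-leaping interface, assuming the documented ssa.adaptivetau(init.values, transitions, rateFunc, params, tf) calling convention; the birth-death model and rate constants are hypothetical:
    library(adaptivetau)
    transitions <- list(c(N = +1),   # birth increments the population
                        c(N = -1))   # death decrements it
    rates <- function(x, params, t)
      c(params$b * x["N"],           # birth rate
        params$d * x["N"])           # death rate
    out <- ssa.adaptivetau(init.values = c(N = 100), transitions, rates,
                           params = list(b = 0.11, d = 0.10), tf = 50)
    head(out)   # matrix of event times and population sizes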
423 Differential Equations bvpSolve (core) Solvers for Boundary Value Problems of Differential Equations Functions that solve boundary value problems (‘BVP’) of systems of ordinary differential equations (‘ODE’) and differential algebraic equations (‘DAE’). The functions provide an interface to the FORTRAN functions ‘twpbvpC’, ‘colnew/colsys’, and an R-implementation of the shooting method.
424 Differential Equations cOde Automated C Code Generation for ‘deSolve’, ‘bvpSolve’ Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. Also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities being the basis for sensitivity analysis.
425 Differential Equations CollocInfer Collocation Inference for Dynamic Systems These functions implement collocation-inference for continuous-time and discrete-time stochastic processes. They provide model-based smoothing, gradient-matching, generalized profiling and forward prediction error methods.
426 Differential Equations deSolve (core) Solvers for Initial Value Problems of Differential Equations (‘ODE’, ‘DAE’, ‘DDE’) Functions that solve initial value problems of a system of first-order ordinary differential equations (‘ODE’), of partial differential equations (‘PDE’), of differential algebraic equations (‘DAE’), and of delay differential equations. The functions provide an interface to the FORTRAN functions ‘lsoda’, ‘lsodar’, ‘lsode’, ‘lsodes’ of the ‘ODEPACK’ collection, to the FORTRAN functions ‘dvode’, ‘zvode’ and ‘daspk’ and a C-implementation of solvers of the ‘Runge-Kutta’ family with fixed or variable time steps. The package contains routines designed for solving ‘ODEs’ resulting from 1-D, 2-D and 3-D partial differential equations (‘PDE’) that have been converted to ‘ODEs’ by numerical differencing.
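A minimal sketch of the deSolve calling convention: the derivative function returns a list whose first element is the vector of derivatives (logistic growth is used as a hypothetical example):
    library(deSolve)
    logistic <- function(t, y, parms) {
      with(as.list(c(y, parms)), list(r * N * (1 - N / K)))
    }
    out <- ode(y = c(N = 0.1), times = seq(0, 50, by = 0.5),
               func = logistic, parms = c(r = 0.2, K = 10))
    head(out)   # columns: time and N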
427 Differential Equations deTestSet Test Set for Differential Equations Solvers and test set for stiff and non-stiff differential equations, and differential algebraic equations.
428 Differential Equations diffeqr Solving Differential Equations (ODEs, SDEs, DDEs, DAEs) An interface to ‘DifferentialEquations.jl’ <http://docs.juliadiffeq.org/latest/> from the R programming language. It has unique high performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE), delay differential equations (DDE), differential-algebraic equations (DAE), and more. Much of the functionality, including features like adaptive time stepping in SDEs, is unique and allows for multiple orders of magnitude speedup over more common methods. ‘diffeqr’ attaches an R interface onto the package, allowing seamless use of this tooling by R users.
429 Differential Equations dMod Dynamic Modeling and Parameter Estimation in ODE Models The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.
430 Differential Equations ecolMod “A practical guide to ecological modelling - using R as a simulation platform” Figures, data sets and examples from the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter MJ Herman (2009). Springer. All figures from chapter x can be generated by “demo(chapx)”, where x = 1 to 11. The R-scripts of the model examples discussed in the book are in subdirectory “examples”, ordered per chapter. Solutions to model projects are in the same subdirectories.
431 Differential Equations FME A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis Provides functions to help in fitting models to data, and to perform Monte Carlo, sensitivity and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package ‘deSolve’ or a steady-state solver from package ‘rootSolve’. However, the methods can also be used with other types of functions.
432 Differential Equations GillespieSSA Gillespie’s Stochastic Simulation Algorithm (SSA) GillespieSSA provides a simple to use, intuitive, and extensible interface to several stochastic simulation algorithms for generating simulated trajectories of finite population continuous-time models. Currently it implements Gillespie’s exact stochastic simulation algorithm (Direct method) and several approximate methods (Explicit tau-leap, Binomial tau-leap, and Optimized tau-leap). The package also contains a library of template models that can be run as demo models and can easily be customized and extended. Currently the following models are included: decaying-dimerization reaction set, linear chain system, logistic growth model, Lotka predator-prey model, Rosenzweig-MacArthur predator-prey model, Kermack-McKendrick SIR model, and a metapopulation SIRS model.
433 Differential Equations mkin Kinetic Evaluation of Chemical Degradation Data Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, model solution based on eigenvalues if possible or using numerical solvers and a choice of the optimisation methods made available by the ‘FME’ package. If a C compiler (on windows: ‘Rtools’) is installed, differential equation models are solved using compiled C functions. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.
434 Differential Equations nlmeODE Non-linear mixed-effects modelling in nlme using differential equations This package combines the odesolve and nlme packages for mixed-effects modelling using differential equations.
435 Differential Equations odeintr C++ ODE Solvers Compiled on-Demand Wraps the Boost odeint library for integration of differential equations.
436 Differential Equations PBSddesolve Solver for Delay Differential Equations Routines for solving systems of delay differential equations by interfacing numerical routines written by Simon N. Wood, with contributions by Benjamin J. Cairns. These numerical routines first appeared in Simon Wood’s ‘solv95’ program. This package includes a vignette and a complete user’s guide. ‘PBSddesolve’ originally appeared on CRAN under the name ‘ddesolve’. That version is no longer supported. The current name emphasizes a close association with other PBS packages, particularly ‘PBSmodelling’.
437 Differential Equations PBSmodelling GUI Tools Made Easy: Interact with Models and Explore Data Provides software to facilitate the design, testing, and operation of computer models. It focuses particularly on tools that make it easy to construct and edit a customized graphical user interface (‘GUI’). Although our simplified ‘GUI’ language depends heavily on the R interface to the ‘Tcl/Tk’ package, a user does not need to know ‘Tcl/Tk’. Examples illustrate models built with other R packages, including ‘PBSmapping’, ‘PBSddesolve’, and ‘BRugs’. A complete user’s guide ‘PBSmodelling-UG.pdf’ shows how to use this package effectively.
438 Differential Equations phaseR Phase Plane Analysis of One and Two Dimensional Autonomous ODE Systems Performs a qualitative analysis of one and two dimensional autonomous ODE systems, using phase plane methods. Programs are available to identify and classify equilibrium points, plot the direction field, and plot trajectories for multiple initial conditions. In the one dimensional case, a program is also available to plot the phase portrait, whilst in the two dimensional case programs are additionally available to plot nullclines and stable/unstable manifolds of saddle points. Many example systems are provided for the user.
439 Differential Equations pomp Statistical Inference for Partially Observed Markov Processes Tools for working with partially observed Markov process (POMP) models (also known as stochastic dynamical systems, hidden Markov models, and nonlinear, non-Gaussian, state-space models). The package provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a versatile platform for implementation of inference methods for general POMP models.
440 Differential Equations pracma Practical Numerical Math Functions Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses ‘MATLAB’ function names where appropriate to simplify porting.
441 Differential Equations primer Functions and data for A Primer of Ecology with R Functions are primarily for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data are for examples.
442 Differential Equations QPot Quasi-Potential Analysis for Stochastic Differential Equations Tools to 1) simulate and visualize stochastic differential equations and 2) determine stability of equilibria using the ordered-upwind method to compute the quasi-potential.
443 Differential Equations ReacTran Reactive Transport Modelling in 1d, 2d and 3d Routines for developing models that describe reaction and advective-diffusive transport in one, two or three dimensions. Includes transport routines in porous media, in estuaries, and in bodies with variable shape.
444 Differential Equations rODE Ordinary Differential Equation (ODE) Solvers Written in R Using S4 Classes Shows physics, math and engineering students how an ODE solver is made and how effective R classes can be for the construction of the equations that describe natural phenomena. Inspiration for this work comes from the book “Computer Simulations in Physics” by Harvey Gould, Jan Tobochnik, and Wolfgang Christian. Book link: <http://www.compadre.org/osp/items/detail.cfm?ID=7375>.
445 Differential Equations rodeo A Code Generator for ODE-Based Models Provides an R6 class and several utility methods to facilitate the implementation of models based on ordinary differential equations. The heart of the package is a code generator that creates compiled ‘Fortran’ (or ‘R’) code which can be passed to a numerical solver. There is direct support for solvers contained in packages ‘deSolve’ and ‘rootSolve’.
446 Differential Equations rootSolve (core) Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations Routines to find the root of nonlinear functions, and to perform steady-state and equilibrium analysis of ordinary differential equations (ODE). Includes routines that: (1) generate gradient and jacobian matrices (full and banded), (2) find roots of non-linear equations by the ‘Newton-Raphson’ method, (3) estimate steady-state conditions of a system of (differential) equations in full, banded or sparse form, using the ‘Newton-Raphson’ method, or by dynamically running, (4) solve the steady-state conditions for uni- and multicomponent 1-D, 2-D, and 3-D partial differential equations, that have been converted to ordinary differential equations by numerical differencing (using the method-of-lines approach). Includes Fortran code.
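A minimal sketch of multiroot(), applied to a small hypothetical nonlinear system (the unit circle intersected with the line y = x):
    library(rootSolve)
    f <- function(x) c(x[1]^2 + x[2]^2 - 1,  # on the unit circle
                       x[1] - x[2])          # and on the line y = x
    multiroot(f, start = c(1, 1))$root       # Newton-Raphson from (1, 1)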
447 Differential Equations scaRabee Optimization Toolkit for Pharmacokinetic-Pharmacodynamic Models scaRabee is a port of the Scarabee toolkit originally written as a Matlab-based application. It provides a framework for simulation and optimization of pharmacokinetic-pharmacodynamic models at the individual and population level. It is built on top of the neldermead package, which provides the direct search algorithm proposed by Nelder and Mead for model optimization.
448 Differential Equations sde (core) Simulation and Inference for Stochastic Differential Equations Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 978-0-387-75838-1, Springer, NY.
449 Differential Equations Sim.DiffProc Simulation of Diffusion Processes Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Ito and Stratonovich forms. Supports statistical analysis of SDEs via parallel Monte Carlo and moment-equation methods. These tools have enabled researchers in different domains to model practical problems, e.g., in financial and actuarial modeling, and to model and simulate the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
450 Differential Equations simecol Simulation of Ecological (and Other) Dynamic Systems An object oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and re-usability of code.
451 Probability Distributions actuar (core) Actuarial Functions and Heavy Tailed Distributions Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.
452 Probability Distributions AdMit Adaptive Mixture of Student-t Distributions Provides functions to perform the fitting of an adaptive mixture of Student-t distributions to a target density through its kernel function as described in Ardia et al. (2009) <doi:10.18637/jss.v029.i03>. The mixture approximation can then be used as the importance density in importance sampling or as the candidate density in the Metropolis-Hastings algorithm to obtain quantities of interest for the target density itself.
453 Probability Distributions agricolae Statistical Procedures for Agricultural Research The original idea was presented in the thesis “A statistical analysis tool for agricultural research” to obtain the degree of Master of Science, National Engineering University (UNI), Lima-Peru. Some experimental data for the examples come from CIP and other research. Agricolae offers extensive functionality on experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Square, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures, several non-parametric comparison tests, biodiversity indexes and consensus clustering.
454 Probability Distributions ald The Asymmetric Laplace Distribution Provides the density, distribution function, quantile function, random number generator, likelihood function, moments and maximum likelihood estimators for a given sample, all for the three-parameter Asymmetric Laplace Distribution defined in Koenker and Machado (1999). This is a special case of the skewed family of distributions available in Galarza et al. (2017) <doi:10.1002/sta4.140>, useful for quantile regression.
455 Probability Distributions AtelieR A GTK GUI for teaching basic concepts in statistical inference, and doing elementary bayesian tests A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central-Limit Theorem and the Normal Distribution, Distribution of a sample mean, Distribution of a sample variance, Probability calculator for common distributions), and II. Elementary Bayesian Statistics (bayesian inference on proportions, contingency tables, means and variances, with informative and noninformative priors).
456 Probability Distributions bayesm Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
457 Probability Distributions benchden 28 benchmark densities from Berlinet/Devroye (1994) Full implementation of the 28 distributions introduced as benchmarks for nonparametric density estimation by Berlinet and Devroye (1994). Includes densities, cdfs, quantile functions and generators for samples as well as additional information on features of the densities. Also contains the 4 histogram densities used in Rozenholc/Mildenberger/Gather (2010).
458 Probability Distributions BiasedUrn Biased Urn Model Distributions Statistical models of biased sampling in the form of univariate and multivariate noncentral hypergeometric distributions, including Wallenius’ noncentral hypergeometric distribution and Fisher’s noncentral hypergeometric distribution (also called extended hypergeometric distribution). See vignette(“UrnTheory”) for explanation of these distributions.
459 Probability Distributions bivariate Bivariate Probability Distributions Contains convenience functions for constructing and plotting bivariate probability distributions (probability mass functions, probability density functions and cumulative distribution functions). Currently, supports uniform (discrete and continuous), binomial, Poisson, normal and bimodal distributions, and kernel smoothing.
460 Probability Distributions Bivariate.Pareto Bivariate Pareto Models Perform competing risks analysis under bivariate Pareto models. See Shih et al. (2018) <doi:10.1080/03610926.2018.1425450> for details.
461 Probability Distributions BivarP Estimating the Parameters of Some Bivariate Distributions Parameter estimation of bivariate distribution functions modeled as an Archimedean copula function. The input data may contain right-censored values. The marginal distributions used are two-parameter. Methods for density, distribution, survival, and random sample generation.
462 Probability Distributions bivgeom Roy’s Bivariate Geometric Distribution Implements Roy’s bivariate geometric model (Roy (1993) <doi:10.1006/jmva.1993.1065>): joint probability mass function, distribution function, survival function, random generation, parameter estimation, and more.
463 Probability Distributions bmixture Bayesian Estimation for Finite Mixture of Distributions Provides statistical tools for Bayesian estimation of finite mixtures of distributions, mainly mixtures of Gamma, Normal and t-distributions. The package implements recent improvements in the Bayesian literature for the finite mixture of distributions, including Mohammadi et al. (2013) <doi:10.1007/s00180-012-0323-3> and Mohammadi and Salehi-Rad (2012) <doi:10.1080/03610918.2011.588358>.
464 Probability Distributions BMT The BMT Distribution Density, distribution, quantile function, random number generation for the BMT (Bezier-Montenegro-Torres) distribution. Torres-Jimenez C.J. and Montenegro-Diaz A.M. (2017) <arXiv:1709.05534>. Moments, descriptive measures and parameter conversion for different parameterizations of the BMT distribution. Fit of the BMT distribution to non-censored data by maximum likelihood, moment matching, quantile matching, maximum goodness-of-fit, also known as minimum distance, maximum product of spacing, also called maximum spacing, and minimum quantile distance, which can also be called maximum quantile goodness-of-fit. Fit of univariate distributions for non-censored data using maximum product of spacing estimation and minimum quantile distance estimation is also included.
465 Probability Distributions bridgedist An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003) An implementation of the bridge distribution with logit-link in R. In Wang and Louis (2003) <doi:10.1093/biomet/90.4.765>, such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. Such is not the case if the random intercept distribution was Gaussian.
466 Probability Distributions cbinom Continuous Analog of a Binomial Distribution Implementation of the d/p/q/r family of functions for a continuous analog to the standard discrete binomial with continuous size parameter and continuous support with x in [0, size + 1], following Ilienko (2013) <arXiv:1303.5990>.
467 Probability Distributions CDVine Statistical Inference of C- And D-Vine Copulas Functions for statistical inference of canonical vine (C-vine) and D-vine copulas. Tools for bivariate exploratory data analysis and for bivariate as well as vine copula selection are provided. Models can be estimated either sequentially or by joint maximum likelihood estimation. Sampling algorithms and plotting methods are also included. Data is assumed to lie in the unit hypercube (so-called copula data).
468 Probability Distributions cmvnorm The Complex Multivariate Gaussian Distribution Various utilities for the complex multivariate Gaussian distribution.
469 Probability Distributions coga Convolution of Gamma Distributions Evaluation of the density and distribution function of the convolution of gamma distributions in R. Two related exact methods and one approximate method are implemented with efficient algorithms and C++ code. A quick guide to choosing the correct method and to the usage of this package is given in the package vignette.
470 Probability Distributions CompGLM Conway-Maxwell-Poisson GLM and Distribution Functions A function (which uses a similar interface to the ‘glm’ function) for the fitting of a Conway-Maxwell-Poisson GLM. There are also various methods for analysis of the model fit. The package also contains functions for the Conway-Maxwell-Poisson distribution in a similar interface to functions ‘dpois’, ‘ppois’ and ‘rpois’. The functions are generally quick, since the workhorse functions are written in C++ (thanks to the Rcpp package).
471 Probability Distributions CompLognormal Functions for actuarial scientists Computes the probability density function, cumulative distribution function, quantile function, and random numbers of any composite model based on the lognormal distribution.
472 Probability Distributions compoisson Conway-Maxwell-Poisson Distribution Provides routines for density and moments of the Conway-Maxwell-Poisson distribution as well as functions for fitting the COM-Poisson model for over/under-dispersed count data.
473 Probability Distributions Compositional Compositional Data Analysis Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. The standard textbook for such data is John Aitchison’s (1986) “The statistical analysis of compositional data”. Relevant papers include a) Tsagris M.T., Preston S. and Wood A.T.A. (2011) A data-based power transformation for compositional data. Fourth International Workshop on Compositional Data Analysis. b) Tsagris M. (2014). The k-NN algorithm for compositional data: a revised approach with and without zero values present. Journal of Data Science, 12(3): 519-534. c) Tsagris M. (2015). Regression analysis with compositional data containing zero values. Chilean Journal of Statistics, 6(2): 47-57. d) Tsagris M., Preston S. and Wood A.T.A. (2016). Improved supervised classification for compositional data using the alpha-transformation. Journal of Classification, 33(2): 243-261. e) Tsagris M., Preston S. and Wood A.T.A. (2017). Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation, 87(2): 406-422. f) Tsagris M. and Stewart C. (2018). A Dirichlet regression model for compositional data with zeros. Lobachevskii Journal of Mathematics, 39(3): 398-412. Further, we include functions for percentages (or proportions).
474 Probability Distributions compositions Compositional Data Analysis Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.
475 Probability Distributions Compounding Computing Continuous Distributions Computing continuous distributions obtained by compounding a continuous and a discrete distribution.
476 Probability Distributions CompQuadForm Distribution Function of Quadratic Forms in Normal Variables Computes the distribution function of quadratic forms in normal variables using Imhof’s method, Davies’s algorithm, Farebrother’s algorithm or Liu et al.’s algorithm.
477 Probability Distributions condMVNorm Conditional Multivariate Normal Distribution Computes conditional multivariate normal probabilities, random deviates and densities.
478 Probability Distributions copBasic General Bivariate Copula Theory and Many Utility Functions Extensive functions for bivariate copula (bicopula) computations and related operations for bicopula theory. The lower, upper, product, and select other bicopula are implemented along with operations including the diagonal, survival copula, dual of a copula, co-copula, and numerical bicopula density. Level sets, horizontal and vertical sections are supported. Numerical derivatives and inverses of a bicopula are provided through which simulation is implemented. Bicopula composition, convex combination, and products also are provided. Support extends to the Kendall Function as well as the Lmoments thereof. Kendall Tau, Spearman Rho and Footrule, Gini Gamma, Blomqvist Beta, Hoeffding Phi, Schweizer-Wolff Sigma, tail dependency, tail order, skewness, and bivariate Lmoments are implemented, and positive/negative quadrant dependency, left (right) increasing (decreasing) are available. Other features include Kullback-Leibler divergence, Vuong procedure, spectral measure, and Lcomoments for inference, maximum likelihood, and AIC, BIC, and RMSE for goodness-of-fit.
479 Probability Distributions copula (core) Multivariate Dependence with Copulas Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
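A minimal sketch: simulate from a bivariate Gaussian copula and refit it by maximum likelihood (the parameter values are hypothetical):
    library(copula)
    nc <- normalCopula(param = 0.7, dim = 2)
    u  <- rCopula(500, nc)   # pseudo-observations in the unit square
    fitCopula(normalCopula(dim = 2), data = u, method = "ml")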
480 Probability Distributions csn Closed Skew-Normal Distribution Provides functions for computing the density and the log-likelihood function of closed-skew normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.
481 Probability Distributions Davies The Davies Quantile Function Various utilities for the Davies distribution.
482 Probability Distributions degreenet Models for Skewed Count Distributions Relevant to Networks Likelihood-based inference for skewed count distributions used in network modeling. “degreenet” is a part of the “statnet” suite of packages for network analysis.
483 Probability Distributions Delaporte Statistical Functions for the Delaporte Distribution Provides probability mass, distribution, quantile, random-variate generation, and method-of-moments parameter-estimation functions for the Delaporte distribution. The Delaporte is a discrete probability distribution which can be considered the convolution of a negative binomial distribution with a Poisson distribution. Alternatively, it can be considered a counting distribution with both Poisson and negative binomial components. It has been studied in actuarial science as a frequency distribution which has more variability than the Poisson, but less than the negative binomial.
484 Probability Distributions dirmult Estimation in Dirichlet-Multinomial distribution Estimate parameters in Dirichlet-Multinomial and compute profile log-likelihoods.
485 Probability Distributions disclap Discrete Laplace Exponential Family Discrete Laplace exponential family for models such as a generalized linear model.
486 Probability Distributions DiscreteInverseWeibull Discrete Inverse Weibull Distribution Probability mass function, distribution function, quantile function, random generation and parameter estimation for the discrete inverse Weibull distribution.
487 Probability Distributions DiscreteLaplace Discrete Laplace Distributions Probability mass function, distribution function, quantile function, random generation and estimation for the skew discrete Laplace distributions.
488 Probability Distributions DiscreteWeibull Discrete Weibull Distributions (Type 1 and 3) Probability mass function, distribution function, quantile function, random generation and parameter estimation for the type I and III discrete Weibull distributions.
489 Probability Distributions distcrete Discrete Distribution Approximations Creates discretised versions of continuous distribution functions by mapping continuous values to an underlying discrete grid, based on a (uniform) frequency of discretisation, a valid discretisation point, and an integration range. For a review of discretisation methods, see Chakraborty (2015) <doi:10.1186/s40488-015-0028-6>.
490 Probability Distributions distr (core) Object Oriented Implementation of Distributions S4-classes and methods for distributions.
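A minimal sketch of the distr S4 idiom: a distribution is an object, and d/p/q/r act as accessors returning the usual density, cdf, quantile and sampling functions:
    library(distr)
    D <- Norm(mean = 2, sd = 0.5)  # an S4 distribution object
    d(D)(2)      # density at 2
    p(D)(2)      # cdf at 2
    q(D)(0.5)    # median
    r(D)(3)      # three random draws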
491 Probability Distributions distrDoc Documentation for ‘distr’ Family of R Packages Provides documentation in form of a common vignette to packages ‘distr’, ‘distrEx’, ‘distrMod’, ‘distrSim’, ‘distrTEst’, ‘distrTeach’, and ‘distrEllipse’.
492 Probability Distributions distrEllipse S4 Classes for Elliptically Contoured Distributions Distribution (S4-)classes for elliptically contoured distributions (based on package ‘distr’).
493 Probability Distributions distrEx Extensions of Package ‘distr’ Extends package ‘distr’ by functionals, distances, and conditional distributions.
494 Probability Distributions DistributionUtils Distribution Utilities Utilities are provided which are of use in the packages I have developed for dealing with distributions. Currently these packages are GeneralizedHyperbolic, VarianceGamma, and SkewHyperbolic and NormalLaplace. Each of these packages requires DistributionUtils. Functionality includes sample skewness and kurtosis, log-histogram, tail plots, moments by integration, changing the point about which a moment is calculated, functions for testing distributions using inversion tests and the Massart inequality. Also includes an implementation of the incomplete Bessel K function.
495 Probability Distributions distrMod Object Oriented Implementation of Probability Models Implements S4 classes for probability models based on packages ‘distr’ and ‘distrEx’.
496 Probability Distributions distrSim Simulation Classes Based on Package ‘distr’ S4-classes for setting up a coherent framework for simulation within the distr family of packages.
497 Probability Distributions distrTeach Extensions of Package ‘distr’ for Teaching Stochastics/Statistics in Secondary School Provides flexible examples of LLN and CLT for teaching purposes in secondary school.
498 Probability Distributions distrTEst Estimation and Testing Classes Based on Package ‘distr’ Evaluation (S4-)classes based on package distr for evaluating procedures (estimators/tests) at data/simulation in a unified way.
499 Probability Distributions dng Distributions and Gradients Provides density, distribution function, quantile function and random generation for the split normal and split-t distributions, and computes their mean, variance, skewness and kurtosis for the two distributions (Li, F, Villani, M. and Kohn, R. (2010) <doi:10.1016/j.jspi.2010.04.031>).
500 Probability Distributions e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
501 Probability Distributions ecd Elliptic Lambda Distribution and Option Pricing Model The elliptic lambda distribution and lambda option pricing model have evolved into a framework of stable-law inspired distributions, such as the extended stable lambda distribution for asset return, the stable count distribution for volatility, and the Lihn-Laplace process as a leptokurtic extension of the Wiener process. This package contains functions for the computation of density, probability, quantile, random variables, fitting procedures, option prices, and volatility smile. It also comes with sample financial data and plotting routines.
502 Probability Distributions emdbook Support Functions and Data for “Ecological Models and Data” Auxiliary functions and data sets for “Ecological Models and Data”, a book presenting maximum likelihood estimation and related topics for ecologists (ISBN 978-0-691-12522-0).
503 Probability Distributions emg Exponentially Modified Gaussian (EMG) Distribution Provides basic distribution functions for the exponentially modified Gaussian (EMG) distribution, the convolution of a Gaussian and an exponential distribution.
504 Probability Distributions EnvStats Package for Environmental Statistics, Including US EPA Guidance Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <http://www.springer.com/book/9781461484554>).
505 Probability Distributions evd Functions for Extreme Value Distributions Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.
506 Probability Distributions evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the bayesian analysis of extreme value models, using MCMC methods.
507 Probability Distributions evir Extreme Values in R Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, and GEV/GPD distributions.
508 Probability Distributions evmix Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation The usual distribution functions, maximum likelihood inference and model diagnostics for univariate stationary extreme value mixture models are provided. Kernel density estimation including various boundary corrected kernel density estimation methods and a wide choice of kernels, with cross-validation likelihood based bandwidth estimator. Reasonable consistency with the base functions in the ‘evd’ package is provided, so that users can safely interchange most code.
509 Probability Distributions extraDistr Additional Univariate and Multivariate Distributions Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.
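A minimal sketch: extraDistr follows base R’s d/p/q/r naming scheme, so its distributions drop into existing code (the Laplace and truncated normal calls below are illustrative):
    library(extraDistr)
    dlaplace(0, mu = 0, sigma = 1)       # Laplace density at 0
    plaplace(1, mu = 0, sigma = 1)       # Laplace cdf at 1
    rtnorm(5, mean = 0, sd = 1, a = 0)   # truncated normal, left-truncated at 0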
510 Probability Distributions extremefit Estimation of Extreme Conditional Quantiles and Probabilities Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.
511 Probability Distributions FAdist Distributions that are Sometimes Used in Hydrology Probability distributions that are sometimes useful in hydrology.
512 Probability Distributions FatTailsR Kiener Distributions and Fat Tails in Finance Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate distribution parameters, quantiles, value-at-risk and expected shortfall with high accuracy. Includes power hyperbolas and power hyperbolic functions.
513 Probability Distributions fBasics Rmetrics - Markets and Basic Statistics Provides a collection of functions to explore and investigate basic properties of financial returns and related quantities. The covered fields include techniques of exploratory data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.
514 Probability Distributions fCopulae (core) Rmetrics - Bivariate Dependence Structures with Copulae Provides a collection of functions to manage, investigate and analyze bivariate financial returns by copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.
515 Probability Distributions fExtremes Rmetrics - Modelling Extreme Events in Finance Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data pre-processing, (ii) exploratory data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.
516 Probability Distributions fgac Generalized Archimedean Copula Bivariate data fitting involves two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm is implemented for seven families of copulas (generalized Archimedean copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).
517 Probability Distributions fitdistrplus Help to Fit a Parametric Distribution to Non-Censored or Censored Data Extends the fitdistr() function (of the MASS package) with several functions to help fit a parametric distribution to non-censored or censored data. Censored data may contain left-censored, right-censored and interval-censored values, with several lower and upper bounds. In addition to maximum likelihood estimation (MLE), the package provides moment matching (MME), quantile matching (QME) and maximum goodness-of-fit estimation (MGE) methods (available only for non-censored data). Weighted versions of MLE, MME and QME are available. See e.g. Casella & Berger (2002), Statistical Inference, Pacific Grove.
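To make the MLE/MME workflow above concrete, here is a minimal sketch with fitdistrplus on simulated, non-censored data; the gamma model and sample size are assumptions chosen for illustration.

```r
# Hedged sketch: fit a gamma by MLE and by moment matching, then compare fits.
library(fitdistrplus)

set.seed(1)
x <- rgamma(500, shape = 2, rate = 0.5)          # illustrative data
fit_mle <- fitdist(x, "gamma")                   # maximum likelihood (default)
fit_mme <- fitdist(x, "gamma", method = "mme")   # moment matching
summary(fit_mle)
gofstat(list(fit_mle, fit_mme))                  # goodness-of-fit statistics
```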
518 Probability Distributions fitteR Fit Hundreds of Theoretical Distributions to Empirical Data Systematic fit of hundreds of theoretical univariate distributions to empirical data via maximum likelihood estimation. Fits are reported and summarized in a data.frame, a csv file or a ‘shiny’ app (the latter with additional features such as visual representation of fits). All output formats provide assessment of goodness-of-fit by the following methods: Kolmogorov-Smirnov test, Shapiro-Wilk test, Anderson-Darling test.
519 Probability Distributions flexsurv Flexible Parametric Survival and Multi-State Models Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models.
520 Probability Distributions FMStable Finite Moment Stable Distributions This package implements some basic procedures for dealing with log maximally skew stable distributions, which are also called finite moment log stable distributions.
521 Probability Distributions fpow Computing the noncentrality parameter of the noncentral F distribution Returns the noncentrality parameter of the noncentral F distribution given the probabilities of type I and type II error and the degrees of freedom of the numerator and the denominator. It may be useful for computing minimal detectable differences for general ANOVA models. This program is documented in the paper by A. Baharev and S. Kemeny, “On the computation of the noncentral F and noncentral beta distribution”, Statistics and Computing, 2008, 18 (3), 333-340.
522 Probability Distributions frmqa The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.
523 Probability Distributions fromo Fast Robust Moments Fast, numerically robust computation of weighted moments via ‘Rcpp’. Supports computation on vectors and matrices, and monoidal append of moments. Moments and cumulants over running fixed-length windows can be computed, as well as over time-based windows. Moment computations use a generalization of Welford’s method, as described by Bennett et al. (2009) <doi:10.1109/CLUSTR.2009.5289161>.
524 Probability Distributions gambin Fit the Gambin Model to Species Abundance Distributions Fits unimodal and multimodal gambin distributions to species-abundance distributions from ecological data, as in Matthews et al. (2014) <doi:10.1111/ecog.00861>. ‘gambin’ is short for ‘gamma-binomial’. The main function is fit_abundances(), which estimates the ‘alpha’ parameter(s) of the gambin distribution using maximum likelihood. Functions are also provided to generate the gambin distribution and for calculating likelihood statistics.
525 Probability Distributions gamlss.dist (core) Distributions for Generalized Additive Models for Location Scale and Shape A set of distributions which can be used for modelling the response variables in Generalized Additive Models for Location Scale and Shape, Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The distributions can be continuous, discrete or mixed. Extra distributions can be created by transforming any continuous distribution defined on the real line to a distribution defined on the range 0 to infinity or 0 to 1, using a “log” or a “logit” transformation, respectively.
526 Probability Distributions gamlss.mx Fitting Mixture Distributions with GAMLSS The main purpose of this package is to allow fitting of mixture distributions with GAMLSS models.
527 Probability Distributions gaussDiff Difference measures for multivariate Gaussian probability density functions A collection of difference measures for multivariate Gaussian probability density functions, such as the Euclidean mean, the Mahalanobis distance, the Kullback-Leibler divergence, the J-Coefficient, the Minkowski L2-distance, the Chi-square divergence and the Hellinger Coefficient.
528 Probability Distributions gb Generalized Lambda Distribution and Generalized Bootstrapping A collection of algorithms and functions for fitting data to a generalized lambda distribution via moment matching methods, and for generalized bootstrapping.
529 Probability Distributions GB2 Generalized Beta Distribution of the Second Kind: Properties, Likelihood, Estimation Package GB2 explores the Generalized Beta distribution of the second kind. Density, cumulative distribution function, quantiles and moments of the distributions are given. Functions for the full log-likelihood, the profile log-likelihood and the scores are provided. Formulas for various indicators of inequality and poverty under the GB2 are implemented. The GB2 is fitted by the methods of maximum pseudo-likelihood estimation using the full and profile log-likelihood, and non-linear least squares estimation of the model parameters. Various plots for the visualization and analysis of the results are provided. Variance estimation of the parameters is provided for the method of maximum pseudo-likelihood estimation. A mixture distribution based on the compounding property of the GB2 is presented (denoted as “compound” in the documentation). This mixture distribution is based on the discretization of the distribution of the underlying random scale parameter. The discretization can be left or right tail. Density, cumulative distribution function, moments and quantiles for the mixture distribution are provided. The compound mixture distribution is fitted using the method of maximum pseudo-likelihood estimation. The fit can also incorporate the use of auxiliary information. In this new version of the package, the mixture case is complemented with new functions for variance estimation by linearization and comparative density plots.
530 Probability Distributions GenBinomApps Clopper-Pearson Confidence Interval and Generalized Binomial Distribution Density, distribution function, quantile function and random generation for the Generalized Binomial Distribution. Functions to compute the Clopper-Pearson Confidence Interval and the required sample size. Enhanced model for burn-in studies, where failures are tackled by countermeasures.
531 Probability Distributions gendist Generated Probability Distribution Models Computes the probability density function (pdf), cumulative distribution function (cdf), quantile function (qf) and generates random values (rg) for the following general models: mixture models, composite models, folded models, skewed symmetric models and arc tan models.
532 Probability Distributions GeneralizedHyperbolic The Generalized Hyperbolic Distribution Functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, normal inverse Gaussian distribution and generalized inverse Gaussian distribution, including fitting of these distributions to data. Linear models with hyperbolic errors may be fitted using hyperblmFit.
533 Probability Distributions GenOrd Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions A Gaussian copula based procedure for generating samples from discrete random variables with a prescribed correlation matrix and marginal distributions.
534 Probability Distributions geoR Analysis of Geostatistical Data Geostatistical analysis including traditional, likelihood-based and Bayesian methods.
535 Probability Distributions ghyp A Package on Generalized Hyperbolic Distribution and Its Special Cases Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distributions). In particular, it contains fitting procedures, an AIC-based model selection routine, functions for the computation of density, quantile, probability, random variates and expected shortfall, some portfolio optimization and plotting routines, and the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.
536 Probability Distributions GIGrvg Random Variate Generator for the GIG Distribution Generator and density function for the Generalized Inverse Gaussian (GIG) distribution.
537 Probability Distributions gk g-and-k and g-and-h Distribution Functions Functions for the g-and-k and generalised g-and-h distributions.
538 Probability Distributions gld Estimation and Use of the Generalised (Tukey) Lambda Distribution The generalised lambda distribution, or Tukey lambda distribution, provides a wide variety of shapes with one functional form. This package provides random numbers, quantiles, probabilities, densities and density quantiles for four different parameterisations of the distribution. It provides the density function, distribution function, and Quantile-Quantile plots. It implements a variety of estimation methods for the distribution, including diagnostic plots. Estimation methods include the starship (all 4 parameterisations) and a number of methods for only the FKML parameterisation. These include maximum likelihood, maximum product of spacings, Titterington’s method, Moments, L-Moments, Trimmed L-Moments and Distributional Least Absolutes.
539 Probability Distributions GLDEX Fitting Single and Mixture of Generalised Lambda Distributions (RS and FMKL) using Various Methods The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data, using weighted and unweighted discretised approaches based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L-moment matching are also provided. Diagnostics on goodness of fit can be done via Q-Q plots, KS-resample tests and by comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution.
540 Probability Distributions glogis Fitting and Testing Generalized Logistic Distributions Tools for the generalized logistic distribution (Type I, also known as skew-logistic distribution), encompassing basic distribution functions (p, q, d, r, score), maximum likelihood estimation, and structural change methods.
541 Probability Distributions greybox Toolbox for Model Building and Forecasting Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard error distributions, selection based on cross-validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes, so several methods are provided for producing forecasts from these models and for visualising them.
542 Probability Distributions GSM Gamma Shape Mixture Implementation of a Bayesian approach for estimating a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient and only requires the specification of a prior distribution for a single parameter.
543 Probability Distributions gumbel The Gumbel-Hougaard Copula Provides probability functions (cumulative distribution and density functions), simulation function (Gumbel copula multivariate simulation) and estimation functions (Maximum Likelihood Estimation, Inference For Margins, Moment Based Estimation and Canonical Maximum Likelihood).
544 Probability Distributions HAC Estimation, Simulation and Visualization of Hierarchical Archimedean Copulae (HAC) Provides estimation of the structure and parameters, sampling methods and structural plots for Hierarchical Archimedean Copulae (HAC).
545 Probability Distributions hermite Generalized Hermite Distribution Probability functions and other utilities for the generalized Hermite distribution.
546 Probability Distributions HI Simulation from distributions supported by nested hyperplanes Simulation from distributions supported by nested hyperplanes, using the algorithm described in Petris & Tardella, “A geometric approach to transdimensional Markov chain Monte Carlo”, Canadian Journal of Statistics, v.31, n.4, (2003). Also provides random-direction multivariate Adaptive Rejection Metropolis Sampling.
547 Probability Distributions HistogramTools Utility Functions for R Histograms Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.
548 Probability Distributions hyper2 The Hyperdirichlet Distribution, Mark 2 A suite of routines for the hyperdirichlet distribution; supersedes the ‘hyperdirichlet’ package for most purposes.
549 Probability Distributions HyperbolicDist The hyperbolic distribution This package provides functions for the hyperbolic and related distributions. Density, distribution and quantile functions and random number generation are provided for the hyperbolic distribution, the generalized hyperbolic distribution, the generalized inverse Gaussian distribution and the skew-Laplace distribution. Additional functionality is provided for the hyperbolic distribution, including fitting of the hyperbolic to data.
550 Probability Distributions ihs Inverse Hyperbolic Sine Distribution Density, distribution function, quantile function and random generation for the inverse hyperbolic sine distribution. This package also provides a function that can fit data to the inverse hyperbolic sine distribution using maximum likelihood estimation.
551 Probability Distributions kdist K-Distribution and Weibull Paper Density, distribution function, quantile function and random generation for the K-distribution. A plotting function that plots data on Weibull paper, and another function to draw additional lines. See T. Lamont-Smith (2018), submitted to J. R. Stat. Soc., for results from the package.
552 Probability Distributions kernelboot Smoothed Bootstrap and Random Generation from Kernel Densities Smoothed bootstrap and functions for random generation from univariate and multivariate kernel densities. It does not estimate kernel densities.
553 Probability Distributions kolmim An Improved Evaluation of Kolmogorov’s Distribution Provides an alternative, more efficient evaluation of extreme probabilities of Kolmogorov’s goodness-of-fit measure, Dn, when compared to the original implementation of Wang, Marsaglia, and Tsang. These probabilities are used in Kolmogorov-Smirnov tests when comparing two samples.
554 Probability Distributions KScorrect Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests Implements the Lilliefors-corrected Kolmogorov-Smirnov test for use in goodness-of-fit tests, suitable when population parameters are unknown and must be estimated by sample statistics. P-values are estimated by simulation. Can be used with a variety of continuous distributions, including normal, lognormal, univariate mixtures of normals, uniform, loguniform, exponential, gamma, and Weibull distributions. Functions to generate random numbers and calculate density, distribution, and quantile functions are provided for use with the log uniform and mixture distributions.
555 Probability Distributions LambertW Probabilistic Models to Analyze and Gaussianize Heavy-Tailed, Skewed Data Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a non-linearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/print results nicely. Probably the most important function is ‘Gaussianize’, which works similarly to ‘scale’, but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x ‘MyFavoriteDistribution’ and use it in their analysis right away.
556 Probability Distributions LaplacesDemon Complete Environment for Bayesian Inference Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview). The README describes the history of the package development process.
557 Probability Distributions LearnBayes Functions for Learning Bayesian Inference A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
558 Probability Distributions lhs Latin Hypercube Samples Provides a number of methods for creating and augmenting Latin Hypercube Samples.
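A minimal sketch of lhs usage, assuming 10 design points in 3 dimensions (an arbitrary illustrative choice):

```r
# Each column of the result is a stratified sample of U(0,1).
library(lhs)

set.seed(1)
X <- randomLHS(10, 3)   # 10 x 3 Latin hypercube sample
head(X)
```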
559 Probability Distributions LIHNPSD Poisson Subordinated Distribution A Poisson Subordinated Distribution to capture major leptokurtic features in log-return time series of financial data.
560 Probability Distributions lmom L-Moments Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
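A hedged sketch of the L-moment workflow: compute sample L-moments with samlmu() and pass them to one of the pel* parameter-estimation routines; the exponential input data are an assumption for illustration.

```r
library(lmom)

set.seed(1)
x <- rexp(100)    # illustrative positive sample
lm <- samlmu(x)   # sample L-moments (l_1, l_2, t_3, t_4)
pelgev(lm)        # GEV parameters estimated from the L-moments
```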
561 Probability Distributions lmomco (core) L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances-covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with “L” (LMs), “TL” (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L,TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3-p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3-p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.
562 Probability Distributions Lmoments L-Moments and Quantile Mixtures Contains functions to estimate L-moments and trimmed L-moments from the data. Also contains functions to estimate the parameters of the normal polynomial quantile mixture and the Cauchy polynomial quantile mixture from L-moments and trimmed L-moments.
563 Probability Distributions logitnorm Functions for the Logitnormal Distribution Density, distribution, quantile and random generation function for the logitnormal distribution. Estimation of the mode and the first two moments. Estimation of distribution parameters.
564 Probability Distributions loglognorm Double log normal distribution functions d, p, q, r functions for the double log normal distribution.
565 Probability Distributions marg Approximate Marginal Inference for Regression-Scale Models Likelihood inference based on higher order approximations for linear nonnormal regression models.
566 Probability Distributions MASS Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
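MASS also supplies the fitdistr() function that several packages in this section extend; a minimal sketch, with the lognormal target being an illustrative assumption:

```r
library(MASS)

set.seed(1)
x <- rlnorm(300, meanlog = 0, sdlog = 0.5)
fitdistr(x, "lognormal")   # ML estimates with standard errors
```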
567 Probability Distributions mbbefd Maxwell Boltzmann Bose Einstein Fermi Dirac Distribution and Destruction Rate Modelling Distributions that are typically used for exposure rating in general insurance, in particular to price reinsurance contracts. The vignettes show code snippets to fit the distribution to empirical data.
568 Probability Distributions mc2d Tools for Two-Dimensional Monte-Carlo Simulations A complete framework to build and study Two-Dimensional Monte-Carlo simulations, aka Second-Order Monte-Carlo simulations. Also includes various distributions (pert, triangular, Bernoulli, empirical discrete and continuous).
569 Probability Distributions mclust Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
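A short sketch of model-based clustering with mclust on the built-in faithful data set; the covariance model and number of components are chosen automatically by BIC, so no tuning assumptions are needed here.

```r
library(mclust)

fit <- Mclust(faithful)       # Gaussian mixture fitted by EM, model chosen by BIC
summary(fit)                  # chosen model and number of components
plot(fit, what = "density")   # fitted mixture density
```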
570 Probability Distributions MCMCpack Markov Chain Monte Carlo (MCMC) Package Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
571 Probability Distributions mded Measuring the Difference Between Two Empirical Distributions Provides a function for measuring the difference between two independent or non-independent empirical distributions and returning a significance level of the difference.
572 Probability Distributions MEPDF Creation of Empirical Density Functions Based on Multivariate Data Based on the input data, an n-dimensional cube with sub-cells of user-specified side length is created. The number of sample points which fall in each sub-cube is counted, and with the cell volume and overall sample size an empirical probability can be computed. A number of cubes of higher resolution can be superimposed. The basic method stems from J.L. Bentley, “Multidimensional Divide and Conquer” (1980) <doi:10.1145/358841.358850>. Furthermore, a simple kernel density estimation method is made available, as well as an expansion of Bentley’s method which offers a kernel approach for the grid method.
573 Probability Distributions mgpd mgpd: Functions for multivariate generalized Pareto distribution (MGPD of Type II) Extends distribution and density functions to parametric multivariate generalized Pareto distributions (MGPD of Type II), and provides fitting functions which calculate maximum likelihood estimates for bivariate and trivariate models. (Help pages are a work in progress.)
574 Probability Distributions minimax Minimax distribution family The minimax family of distributions is a two-parameter family like the beta family, but computationally much more tractable.
575 Probability Distributions MitISEM Mixture of Student t Distributions using Importance Sampling and Expectation Maximization Flexible multivariate function approximation using an adapted mixture of Student t distributions. The mixture of t distributions is obtained using an importance-sampling weighted expectation-maximization algorithm.
576 Probability Distributions MittagLeffleR Mittag-Leffler Family of Distributions Implements the Mittag-Leffler function, distribution, random variate generation, and estimation. Based on the Laplace-Inversion algorithm by Garrappa, R. (2015) <doi:10.1137/140971191>.
577 Probability Distributions MixedTS Mixed Tempered Stable Distribution Provides detailed functions for the univariate Mixed Tempered Stable distribution.
578 Probability Distributions mixtools Tools for Analyzing Finite Mixture Models Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
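A hedged sketch of a two-component normal mixture fitted by EM with mixtools; the simulation parameters are assumptions for illustration.

```r
library(mixtools)

set.seed(1)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 0.8))
fit <- normalmixEM(x, k = 2)    # EM for a 2-component normal mixture
fit$lambda                      # estimated mixing weights
fit$mu                          # estimated component means
fit$sigma                       # estimated component standard deviations
```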
579 Probability Distributions MM The Multiplicative Multinomial Distribution Various utilities for the Multiplicative Multinomial distribution.
580 Probability Distributions mnormpow Multivariate Normal Distributions with Power Integrand Computes the integral of f(x)*x_i^k over a product of intervals, where f is the density of a Gaussian law. This is a small alteration of the mnormt code from A. Genz and A. Azzalini.
581 Probability Distributions mnormt (core) The Multivariate Normal and t Distributions Functions are provided for computing the density and the distribution function of multivariate normal and “t” random variables, and for generating random vectors sampled from these distributions. Probabilities are computed via non-Monte Carlo methods; different routines are used in the case d=1, d=2, d>2, if d denotes the number of dimensions.
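A minimal sketch of mnormt's density, probability and sampling functions; the mean vector and covariance matrix are illustrative assumptions.

```r
library(mnormt)

mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
rmnorm(5, mean = mu, varcov = Sigma)         # random vectors
dmnorm(c(0, 0), mean = mu, varcov = Sigma)   # density at the origin
pmnorm(c(1, 1), mean = mu, varcov = Sigma)   # P(X1 <= 1, X2 <= 1)
```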
582 Probability Distributions modeest Mode Estimation Provides estimators of the mode of univariate data or univariate distributions.
583 Probability Distributions moments Moments, cumulants, skewness, kurtosis and related tests Functions to calculate: moments, Pearson’s kurtosis, Geary’s kurtosis and skewness; tests related to them (Anscombe-Glynn, D’Agostino, Bonett-Seier).
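A minimal sketch of the descriptive statistics and one of the associated tests from moments; the gamma sample is an illustrative assumption.

```r
library(moments)

set.seed(1)
x <- rgamma(500, shape = 2)   # deliberately skewed data
skewness(x)
kurtosis(x)
anscombe.test(x)              # Anscombe-Glynn test of kurtosis
```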
584 Probability Distributions movMF Mixtures of von Mises-Fisher Distributions Fit and simulate mixtures of von Mises-Fisher distributions.
585 Probability Distributions msm Multi-State Markov and Hidden Markov Models in Continuous Time Functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. Designed for processes observed at arbitrary times in continuous time (panel data) but some other observation schemes are supported. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.
586 Probability Distributions MultiRNG Multivariate Pseudo-Random Number Generation Pseudo-random number generation for 11 multivariate distributions: Normal, t, Uniform, Bernoulli, Hypergeometric, Beta (Dirichlet), Multinomial, Dirichlet-Multinomial, Laplace, Wishart, and Inverted Wishart. The details of the method are explained in Demirtas (2004) <doi:10.22237/jmasm/1099268340>.
587 Probability Distributions mvprpb Orthant Probability of the Multivariate Normal Distribution Computes orthant probabilities of the multivariate normal distribution.
588 Probability Distributions mvrtn Mean and Variance of Truncated Normal Distribution Mean, variance, and random variates for left/right truncated normal distributions.
589 Probability Distributions mvtnorm (core) Multivariate Normal and t Distributions Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
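A sketch of a bivariate orthant probability and random generation with mvtnorm; the correlation of 0.6 is an assumption chosen for illustration.

```r
library(mvtnorm)

sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
pmvnorm(lower = c(0, 0), upper = c(Inf, Inf),
        mean = c(0, 0), sigma = sigma)       # P(X1 > 0, X2 > 0)
rmvnorm(3, mean = c(0, 0), sigma = sigma)    # random deviates
```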
590 Probability Distributions nCDunnett Noncentral Dunnett’s Test Distribution Computes the noncentral Dunnett’s test distribution (pdf, cdf and quantile) and generates random numbers.
591 Probability Distributions nCopula Hierarchical Archimedean Copulas Constructed with Multivariate Compound Distributions Construct and manipulate hierarchical Archimedean copulas with multivariate compound distributions. The model used is the one of Cossette et al. (2017) <doi:10.1016/j.insmatheco.2017.06.001>.
592 Probability Distributions Newdistns Computes Pdf, Cdf, Quantile and Random Numbers, Measures of Inference for 19 General Families of Distributions Computes the probability density function, cumulative distribution function, quantile function, random numbers and measures of inference for the following general families of distributions (each family defined in terms of an arbitrary cdf G): Marshall Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, and exponentiated exponential Poisson G distributions.
593 Probability Distributions nor1mix Normal (1-d) Mixture Models (S3 Classes and Methods) One-dimensional normal mixture model classes for, e.g., density estimation or clustering-algorithm research and teaching; provides the widely used Marron-Wand densities. Efficient random number generation and graphics; now fits to data by ML (maximum likelihood) or EM estimation.
594 Probability Distributions NormalGamma Normal-gamma convolution model The functions in this package compute the density of the sum of a Gaussian and a gamma random variable, estimate the parameters, and correct the noise effect in a gamma-signal and Gaussian-noise model. This package has been used to implement the background correction method for Illumina microarray data presented in Plancade S., Rozenholc Y. and Lund E., “Generalization of the normal-exponential model: exploration of a more accurate parameterization for the signal distribution on Illumina BeadArrays”, BMC Bioinformatics 2012, 13(329).
595 Probability Distributions NormalLaplace The Normal Laplace Distribution Functions for the normal Laplace distribution. The package is under development and provides only limited functionality. Density, distribution and quantile functions, random number generation, and moments are provided.
596 Probability Distributions normalp Routines for Exponential Power Distribution Collection of utilities for the Exponential Power distribution, also known as the General Error Distribution (see Mineo, A.M. and Ruggieri, M. (2005), “A Software Tool for the Exponential Power Distribution: The normalp Package”, Journal of Statistical Software, Vol. 12, Issue 4).
597 Probability Distributions ORDER2PARENT Estimate parent distributions with data of several order statistics This package uses B-spline based nonparametric smooth estimators to estimate parent distributions given observations on multiple order statistics.
598 Probability Distributions OrdNor Concurrent Generation of Ordinal and Normal Data with Given Correlation Matrix and Marginal Distributions Implementation of a procedure for generating samples from a mixed distribution of ordinal and normal random variables with pre-specified correlation matrix and marginal distributions.
599 Probability Distributions ParetoPosStable Computing, Fitting and Validating the PPS Distribution Statistical functions to describe a Pareto Positive Stable (PPS) distribution and fit it to real data. Graphical and statistical tools to validate the fits are included.
600 Probability Distributions pbv Probabilities for Bivariate Normal Distribution Computes probabilities of the bivariate normal distribution in a vectorized R function (Drezner & Wesolowsky, 1990, <doi:10.1080/00949659008811236>).
601 Probability Distributions PDQutils PDQ Functions via Gram Charlier, Edgeworth, and Cornish Fisher Approximations A collection of tools for approximating the ‘PDQ’ functions (respectively, the cumulative distribution, density, and quantile) of probability distributions via classical expansions involving moments and cumulants.
602 Probability Distributions PearsonDS (core) Pearson Distribution System Implementation of the Pearson distribution system, including full support for the (d,p,q,r)-family of functions for probability distributions and fitting via method of moments and maximum likelihood method.
603 Probability Distributions PhaseType Inference for Phase-type Distributions Functions to perform Bayesian inference on absorption time data for Phase-type distributions. Plans to expand this to include frequentist inference and simulation tools.
604 Probability Distributions pmultinom One-Sided Multinomial Probabilities Implements multinomial CDF (P(N1<=n1, …, Nk<=nk)) and tail probabilities (P(N1>n1, …, Nk>nk)), as well as probabilities with both constraints (P(l1<N1<=u1, …, lk<Nk<=uk)). Uses a method suggested by Bruce Levin (1981) <doi:10.1214/aos/1176345593>.
605 Probability Distributions poibin The Poisson Binomial Distribution Implementation of both the exact and approximation methods for computing the cdf of the Poisson binomial distribution. It also provides the pmf, quantile function, and random number generation for the Poisson binomial distribution.
606 Probability Distributions poilog Poisson lognormal and bivariate Poisson lognormal distribution Functions for obtaining the density, random deviates and maximum likelihood estimates of the Poisson lognormal distribution and the bivariate Poisson lognormal distribution.
607 Probability Distributions poistweedie Poisson-Tweedie exponential family models Simulation of Poisson-Tweedie models.
608 Probability Distributions polyaAeppli Implementation of the Polya-Aeppli distribution Functions for evaluating the mass density, cumulative distribution function, quantile function and random variate generation for the Polya-Aeppli distribution, also known as the geometric compound Poisson distribution.
609 Probability Distributions poweRlaw Analysis of Heavy Tailed Distributions An implementation of maximum likelihood estimators for a variety of heavy tailed distributions, including both the discrete and continuous power law distributions. Additionally, a goodness-of-fit based approach is used to estimate the lower cut-off for the scaling region.
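A hedged sketch of the poweRlaw workflow using the package's bundled 'moby' word-frequency data: build a discrete power-law object, then estimate the lower cut-off and scaling exponent.

```r
library(poweRlaw)

data("moby", package = "poweRlaw")
m   <- displ$new(moby)      # discrete power-law model object
est <- estimate_xmin(m)     # xmin and alpha via KS-distance minimisation
m$setXmin(est)
m$pars                      # fitted scaling exponent
```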
610 Probability Distributions probhat Generalized Kernel Smoothing Computes nonparametric probability distributions (probability density functions, cumulative distribution functions and quantile functions) using kernel smoothing. Supports univariate, multivariate and conditional distributions, and weighted data (possibly useful mixed with fuzzy clustering or frequency data). Also, supports empirical continuous cumulative distribution functions and their inverses, and random number generation.
611 Probability Distributions qmap Statistical Transformations for Post-Processing Climate Model Output Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.
612 Probability Distributions QRM Provides R-Language Code to Examine Quantitative Risk Management Concepts Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.
613 Probability Distributions qrmtools Tools for Quantitative Risk Management Functions and data sets for reproducing selected results from the book “Quantitative Risk Management: Concepts, Techniques and Tools”. Furthermore, new developments and auxiliary functions for Quantitative Risk Management practice.
614 Probability Distributions randaes Random number generator based on AES cipher The deterministic part of the Fortuna cryptographic pseudorandom number generator, described by Schneier & Ferguson in “Practical Cryptography”.
615 Probability Distributions random True Random Numbers using RANDOM.ORG The true random number service provided by the RANDOM.ORG website created by Mads Haahr samples atmospheric noise via radio tuned to an unused broadcasting frequency together with a skew correction algorithm due to John von Neumann. More background is available in the included vignette based on an essay by Mads Haahr. In its current form, the package offers functions to retrieve random integers, randomized sequences and random strings.
616 Probability Distributions randtoolbox Toolbox for Pseudo and Quasi Random Number Generation and Random Generator Tests Provides (1) pseudo random generators - general linear congruential generators, multiple recursive generators and generalized feedback shift register (SF-Mersenne Twister algorithm and WELL generators); (2) quasi random generators - the Torus algorithm, the Sobol sequence, the Halton sequence (including the Van der Corput sequence); and (3) some generator tests - the gap test, the serial test, the poker test. See e.g. Gentle (2003) <doi:10.1007/b97336>. The package can be provided without the rngWELL dependency on demand. See the Distributions task view for types and tests of random number generators. This version is in memoriam of Diethelm and Barbara Wuertz.
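A minimal sketch of the quasi-random generators mentioned above; the sequence lengths and dimensions are arbitrary illustrative choices.

```r
library(randtoolbox)

sobol(5, dim = 2)    # Sobol low-discrepancy sequence
halton(5, dim = 2)   # Halton sequence
torus(5)             # Torus algorithm (one-dimensional by default)
```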
617 Probability Distributions RDieHarder R Interface to the ‘DieHarder’ RNG Test Suite The ‘RDieHarder’ package provides an R interface to the ‘DieHarder’ suite of random number generators and tests that was developed by Robert G. Brown and David Bauer, extending earlier work by George Marsaglia and others. The ‘DieHarder’ library is included, but if a version is already installed it will be used instead.
618 Probability Distributions ReIns Functions from “Reinsurance: Actuarial and Statistical Aspects” Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels <http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470772689.html>.
619 Probability Distributions reliaR (core) Package for some probability distributions A collection of utilities for some reliability models/probability distributions.
620 Probability Distributions Renext Renewal Method for Extreme Values Extrapolation Peaks Over Threshold (POT) or ‘methode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a Maximum-Likelihood framework.
621 Probability Distributions retimes Reaction Time Analysis Reaction time analysis by maximum likelihood.
622 Probability Distributions revdbayes Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. Also provided are d,p,q,r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.
623 Probability Distributions rlecuyer R Interface to RNG with Multiple Streams Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.
624 Probability Distributions RMKdiscrete Sundry Discrete Probability Distributions Sundry discrete probability distributions and helper functions.
625 Probability Distributions RMTstat Distributions, Statistics and Tests derived from Random Matrix Theory Functions for working with the Tracy-Widom laws and other distributions related to the eigenvalues of large Wishart matrices. The tables for computing the Tracy-Widom densities and distribution functions were computed by Momar Dieng’s MATLAB package “RMLab” (formerly available on his homepage at http://math.arizona.edu/~momar/research.htm ). This package is part of a collaboration between Iain Johnstone, Zongming Ma, Patrick Perry, and Morteza Shahram. It will soon be replaced by a package with more accuracy and built-in support for relevant statistical tests.
626 Probability Distributions rngwell19937 Random number generator WELL19937a with 53 or 32 bit output Long-period linear random number generator WELL19937a by F. Panneton, P. L’Ecuyer and M. Matsumoto. The initialization algorithm allows seeding the generator with a numeric vector of arbitrary length and uses MRG32k5a by P. L’Ecuyer to achieve good quality of the initialization. The output function may be set to provide numbers from the interval (0,1) with 53 (the default) or 32 random bits. WELL19937a is of a similar type to Mersenne Twister and has the same period. WELL19937a is slightly slower than Mersenne Twister, but has better equidistribution and “bit-mixing” properties and recovers faster from states with prevailing zeros. All WELL generators with orders 512, 1024, 19937 and 44497 can be found in the randtoolbox package.
627 Probability Distributions rstream Streams of Random Numbers Unified object oriented interface for multiple independent streams of random numbers from different sources.
628 Probability Distributions RTDE Robust Tail Dependence Estimation Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: ‘Robust and bias-corrected estimation of the coefficient of tail dependence’ and ‘Robust and bias-corrected estimation of probabilities of extreme failure sets’. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).
629 Probability Distributions rtdists Response Time Distributions Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model (Ratcliff & McKoon, 2008, <doi:10.1162/neco.2008.12-06-420>) based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA; Brown & Heathcote, 2008, <doi:10.1016/j.cogpsych.2007.12.002>) with different distributions underlying the drift rate.
630 Probability Distributions Runuran R Interface to the ‘UNU.RAN’ Random Variate Generators Interface to the ‘UNU.RAN’ library for Universal Non-Uniform RANdom variate generators. It allows building non-uniform random number generators for quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles for a couple of distributions.
631 Probability Distributions rust Ratio-of-Uniforms Simulation with Transformation Uses the generalized ratio-of-uniforms (RU) method to simulate from univariate and (low-dimensional) multivariate continuous distributions. The user specifies the log-density, up to an additive constant. The RU algorithm is applied after relocation of the mode of the density to zero, and the user can choose a tuning parameter r. For details see Wakefield, Gelfand and Smith (1991) <doi:10.1007/BF01889987>, Efficient generation of random variates via the ratio-of-uniforms method, Statistics and Computing (1991) 1, 129-133. A Box-Cox variable transformation can be used to make the input density suitable for the RU method and to improve efficiency. In the multivariate case rotation of axes can also be used to improve efficiency. From version 1.2.0 the ‘Rcpp’ package <https://cran.r-project.org/package=Rcpp> can be used to improve efficiency.
632 Probability Distributions s20x Functions for University of Auckland Course STATS 201/208 Data Analysis A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors.
633 Probability Distributions sadists Some Additional Distributions Provides the density, distribution, quantile and generation functions of some obscure probability distributions, including the doubly non-central t, F, Beta, and Eta distributions; the lambda-prime and K-prime; the upsilon distribution; the (weighted) sum of non-central chi-squares to a power; the (weighted) sum of log non-central chi-squares; the product of non-central chi-squares to powers; the product of doubly non-central F variables; the product of independent normals.
634 Probability Distributions SCI Standardized Climate Indices Such as SPI, SRI or SPEI Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).
635 Probability Distributions setRNG Set (Normal) Random Number Generator and Seed SetRNG provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.
636 Probability Distributions sfsmisc Utilities from ‘Seminar fuer Statistik’ ETH Zurich Useful utilities [‘goodies’] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990’s. For graphics, have pretty (Log-scale) axes, an enhanced Tukey-Anscombe plot, combining histogram and boxplot, 2d-residual plots, a ‘tachoPlot()’, pretty arrows, etc. For robustness, have a robust F test and robust range(). For system support, notably on Linux, provides ‘Sys.*()’ functions with more access to system and CPU information. Finally, miscellaneous utilities such as simple efficient prime numbers, integer codes, Duplicated(), toLatex.numeric() and is.whole().
637 Probability Distributions sgt Skewed Generalized T Distribution Tree Density, distribution function, quantile function and random generation for the skewed generalized t distribution. This package also provides a function that can fit data to the skewed generalized t distribution using maximum likelihood estimation.
638 Probability Distributions skellam Densities and Sampling for the Skellam Distribution Functions for the Skellam distribution, including: density (pmf), cdf, quantiles and regression.
639 Probability Distributions SkewHyperbolic The Skew Hyperbolic Student t-Distribution Functions are provided for the density function, distribution function, quantiles and random number generation for the skew hyperbolic t-distribution. There are also functions that fit the distribution to data. There are functions for the mean, variance, skewness, kurtosis and mode of a given distribution and to calculate moments of any order about any centre. To assess goodness of fit, there are functions to generate a Q-Q plot, a P-P plot and a tail plot.
640 Probability Distributions skewt The Skewed Student-t Distribution Density, distribution function, quantile function and random generation for the skewed t distribution of Fernandez and Steel.
641 Probability Distributions sld Estimation and Use of the Quantile-Based Skew Logistic Distribution The skew logistic distribution is a quantile-defined generalisation of the logistic distribution (van Staden and King 2015). Provides random numbers, quantiles, probabilities, densities and density quantiles for the distribution. It provides Quantile-Quantile plots and method of L-Moments estimation (including asymptotic standard errors) for the distribution.
642 Probability Distributions smoothmest Smoothed M-estimators for 1-dimensional location Some M-estimators for 1-dimensional location (the bisquare estimator, ML for the Cauchy distribution, and the estimators obtained by applying the smoothing principle introduced in Hampel, Hennig and Ronchetti (2011) to these, to the Huber M-estimator, and to the median; the main function is smoothm), plus the Pitman estimator.
643 Probability Distributions SMR Externally Studentized Midrange Distribution Computes the studentized midrange distribution (pdf, cdf and quantile) and generates random numbers.
644 Probability Distributions sn The Skew-Normal and Related Distributions Such as the Skew-t Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t family, and provide related statistical methods for data fitting and model diagnostics, in the univariate and the multivariate case.
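A hedged sketch of basic skew-normal usage with sn; the xi/omega/alpha values are illustrative, and the intercept-only selm() fit assumes the data vector is in scope.

```r
library(sn)

set.seed(1)
x <- rsn(200, xi = 0, omega = 1, alpha = 5)   # skew-normal sample
dsn(0, xi = 0, omega = 1, alpha = 5)          # density at 0
fit <- selm(x ~ 1, family = "SN")             # ML fit of the skew-normal
summary(fit)
```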
645 Probability Distributions sparseMVN Multivariate Normal Functions for Sparse Covariance and Precision Matrices Computes multivariate normal (MVN) densities, and samples from MVN distributions, when the covariance or precision matrix is sparse.
646 Probability Distributions spd Semi Parametric Distribution The Semi Parametric Piecewise Distribution blends the Generalized Pareto Distribution for the tails with a kernel based interior.
647 Probability Distributions stabledist Stable Distribution Functions Density, Probability and Quantile functions, and random number generation for (skew) stable distributions, using the parametrizations of Nolan.
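A minimal sketch of the stable distribution functions; the alpha and beta values are illustrative assumptions (densities are computed numerically, so dstable can be slow).

```r
library(stabledist)

set.seed(1)
rstable(5, alpha = 1.7, beta = 0.5)   # random variates
dstable(0, alpha = 1.7, beta = 0.5)   # density at 0
pstable(1, alpha = 1.7, beta = 0.5)   # distribution function at 1
```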
648 Probability Distributions STAR Spike Train Analysis with R Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.
649 Probability Distributions statmod Statistical Modeling A collection of algorithms and functions to aid statistical modeling. Includes growth curve comparisons, limiting dilution analysis (aka ELDA), mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Includes advanced generalized linear model functions that implement secure convergence, dispersion modeling and Tweedie power-law families.
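As a sketch of the Gauss quadrature utility in statmod, the snippet below approximates E[X^2] for X ~ N(0,1) with Gauss-Hermite nodes; the integrand and node count are illustrative choices.

```r
library(statmod)

gq <- gauss.quad(20, kind = "hermite")   # nodes/weights for weight function exp(-x^2)
# Change of variables for N(0,1): E[g(X)] = (1/sqrt(pi)) * sum w_i * g(sqrt(2) * x_i)
sum(gq$weights * (sqrt(2) * gq$nodes)^2) / sqrt(pi)   # should be close to 1
```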
650 Probability Distributions SuppDists Supplementary Distributions Ten distributions supplementing those built into R. Inverse Gauss, Kruskal-Wallis, Kendall’s Tau, Friedman’s chi squared, Spearman’s rho, maximum F ratio, the Pearson product moment correlation coefficient, Johnson distributions, normal scores and generalized hypergeometric distributions. In addition two random number generators of George Marsaglia are included.
651 Probability Distributions symmoments Symbolic central and noncentral moments of the multivariate normal distribution Symbolic central and non-central moments of the multivariate normal distribution. Computes a standard representation, LaTeX code, and values at specified mean and covariance matrices.
652 Probability Distributions tmvtnorm Truncated Multivariate Normal and Student t Distribution Random number generation for the truncated multivariate normal and Student t distribution. Computes probabilities, quantiles and densities, including one-dimensional and bivariate marginal densities. Computes first and second moments (i.e. mean and covariance matrix) for the double-truncated multinormal case.
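A hedged sketch of sampling and moment computation for a doubly truncated bivariate normal with tmvtnorm; the truncation bounds and covariance are assumptions.

```r
library(tmvtnorm)

sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
x <- rtmvnorm(100, mean = c(0, 0), sigma = sigma,
              lower = c(-1, -1), upper = c(2, 2))   # truncated MVN sample
mtmvnorm(mean = c(0, 0), sigma = sigma,
         lower = c(-1, -1), upper = c(2, 2))        # truncated mean and covariance
```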
653 Probability Distributions tolerance Statistical Tolerance Intervals and Regions Statistical tolerance limits provide the limits between which we can expect to find a specified proportion of a sampled population with a given level of confidence. This package provides functions for estimating tolerance limits (intervals) for various univariate distributions (binomial, Cauchy, discrete Pareto, exponential, two-parameter exponential, extreme value, hypergeometric, Laplace, logistic, negative binomial, negative hypergeometric, normal, Pareto, Poisson-Lindley, Poisson, uniform, and Zipf-Mandelbrot), Bayesian normal tolerance limits, multivariate normal tolerance regions, nonparametric tolerance intervals, tolerance bands for regression settings (linear regression, nonlinear regression, nonparametric regression, and multivariate regression), and analysis of variance tolerance intervals. Visualizations are also available for most of these settings.
654 Probability Distributions trapezoid The Trapezoidal Distribution The trapezoid package provides dtrapezoid, ptrapezoid, qtrapezoid, and rtrapezoid functions for the trapezoidal distribution.
655 Probability Distributions triangle Provides the Standard Distribution Functions for the Triangle Distribution Provides the “r, q, p, and d” distribution functions for the triangle distribution.
656 Probability Distributions truncnorm Truncated Normal Distribution Density, probability, quantile and random number generation functions for the truncated normal distribution.
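A minimal sketch of truncnorm's functions; the truncation to [0, Inf) and the N(1, 2^2) parent are illustrative assumptions.

```r
library(truncnorm)

rtruncnorm(5, a = 0, b = Inf, mean = 1, sd = 2)   # truncated-normal variates
dtruncnorm(1, a = 0, b = Inf, mean = 1, sd = 2)   # truncated density at 1
etruncnorm(a = 0, b = Inf, mean = 1, sd = 2)      # mean of the truncated distribution
```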
657 Probability Distributions TSA Time Series Analysis Contains R functions and datasets detailed in the book “Time Series Analysis with Applications in R (second edition)” by Jonathan Cryer and Kung-Sik Chan.
658 Probability Distributions tsallisqexp Tsallis q-Exp Distribution Tsallis distribution, also known as the q-exponential family distribution. Provides d, p, q, r distribution functions as well as fitting and testing functions. Project initiated by Paul Higbie and based on Cosma Shalizi's code.
659 Probability Distributions TTmoment Sampling and Calculating the First and Second Moments for the Doubly Truncated Multivariate t Distribution Computing the first two moments of the truncated multivariate t (TMVT) distribution under double truncation. Applying the slice sampling algorithm to generate random variates from the TMVT distribution.
660 Probability Distributions tweedie Evaluation of Tweedie Exponential Family Models Maximum likelihood computations for Tweedie families, including the series expansion (Dunn and Smyth, 2005; <doi:10.1007/s11222-005-4070-y>) and the Fourier inversion (Dunn and Smyth, 2008; <doi:10.1007/s11222-007-9039-6>), and related methods.
661 Probability Distributions UnivRNG Univariate Pseudo-Random Number Generation Pseudo-random number generation of 17 univariate distributions.
662 Probability Distributions VarianceGamma The Variance Gamma Distribution Provides functions for the variance gamma distribution. Density, distribution and quantile functions. Functions for random number generation and fitting of the variance gamma to data. Also, functions for computing moments of the variance gamma distribution of any order about any location. In addition, there are functions for checking the validity of parameters and to interchange different sets of parameterizations for the variance gamma distribution.
663 Probability Distributions VGAM (core) Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
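As a small illustration of the VGLM interface (a sketch on the built-in iris data; 'multinomial' is one of the many VGAM family functions):

    library(VGAM)
    # multinomial logit as a VGLM; the last response level is the reference
    fit <- vglm(Species ~ Sepal.Length + Sepal.Width,
                family = multinomial, data = iris)
    coef(fit, matrix = TRUE)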
664 Probability Distributions VineCopula Statistical Inference of Vine Copulas Provides tools for the statistical analysis of vine copula models. The package includes tools for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. Tools for estimation, selection and exploratory data analysis of bivariate copula models are also provided.
665 Probability Distributions vines Multivariate Dependence Modeling with Vines Implementation of the vine graphical model for building high-dimensional probability distributions as a factorization of bivariate copulas and marginal density functions. This package provides S4 classes for vines (C-vines and D-vines) and methods for inference, goodness-of-fit tests, density/distribution function evaluation, and simulation.
666 Probability Distributions vistributions Visualize Probability Distributions Visualize and compute percentiles/probabilities of normal, t, f, chi square and binomial distributions.
667 Probability Distributions visualize Graph Probability Distributions with User Supplied Parameters and Statistics Graphs the pdf or pmf and highlights what area or probability is present in user defined locations. Visualize is able to provide lower tail, bounded, upper tail, and two tail calculations. Supports strict and equal to inequalities. Also provided on the graph is the mean and variance of the distribution.
668 Probability Distributions Wrapped Computes Pdf, Cdf, Quantile, Random Numbers and Provides Estimation for any Univariate Wrapped Distributions Computes the pdf, cdf, quantile and random numbers for any wrapped G distribution. Computes maximum likelihood estimates of the parameters, standard errors, 95 percent confidence intervals, the value of the Cramer-von Mises statistic, the value of the Anderson-Darling statistic, the value of the Kolmogorov-Smirnov test statistic and its p-value, the values of the Akaike, Consistent Akaike, Bayesian and Hannan-Quinn information criteria, the minimum value of the negative log-likelihood function, and the convergence status when the wrapped distribution is fitted to some data.
669 Probability Distributions zipfextR Zipf Extended Distributions Implementation of three extensions of the Zipf distribution: the Marshall-Olkin Extended Zipf (MOEZipf) of Perez-Casany, M., & Casellas, A. (2013) <arXiv:1304.4540>, the Zipf-Poisson Extreme (Zipf-PE) and the Zipf-Poisson Stopped Sum (Zipf-PSS) distributions. In log-log scale, the first two extensions allow for top-concavity and top-convexity while the third one only allows for top-concavity. All the extensions maintain the linearity associated with the Zipf model in the tail.
670 Probability Distributions zipfR Statistical Models for Word Frequency Distributions Statistical models and utilities for the analysis of word frequency distributions. The utilities include functions for loading, manipulating and visualizing word frequency data and vocabulary growth curves. The package also implements several statistical models for the distribution of word frequencies in a population. (The name of this package derives from the most famous word frequency distribution, Zipf’s law.)
671 Econometrics AER (core) Applied Econometrics with R Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)
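A minimal two-stage least squares sketch with AER's ivreg(), on simulated data in which x is endogenous and z is the instrument:

    library(AER)
    set.seed(42)
    z <- rnorm(200); u <- rnorm(200)
    x <- z + u + rnorm(200)  # x is correlated with the error term u
    y <- 1 + 2 * x + u
    fit <- ivreg(y ~ x | z)  # regressors | instruments
    summary(fit, diagnostics = TRUE)  # weak-instrument and Wu-Hausman tests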
672 Econometrics aod Analysis of Overdispersed Data Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).
673 Econometrics apt Asymmetric Price Transmission Assesses asymmetric price transmission between two time series. Several functions are available for linear and nonlinear threshold cointegration, as well as for symmetric and asymmetric error correction models. A graphical user interface is also included for the major functions in the package, so users can work with them more intuitively.
674 Econometrics bayesm Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
675 Econometrics betareg Beta Regression Beta regression for modeling beta-distributed dependent variables, e.g., rates and proportions. In addition to maximum likelihood regression (for both mean and precision of a beta-distributed response), bias-corrected and bias-reduced estimation as well as finite mixture models and recursive partitioning for beta regressions are provided.
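The two-part formula models the mean and the precision separately; a sketch using the GasolineYield data shipped with the package:

    library(betareg)
    data("GasolineYield", package = "betareg")
    # mean depends on batch and temp; precision depends on temp
    fit <- betareg(yield ~ batch + temp | temp, data = GasolineYield)
    summary(fit)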
676 Econometrics bife Binary Choice Models with Fixed Effects Estimates fixed effects binary choice models (logit and probit) with potentially many individual fixed effects and computes average partial effects. Incidental parameter bias can be reduced with a bias-correction proposed by Hahn and Newey (2004) <doi:10.1111/j.1468-0262.2004.00533.x>.
677 Econometrics BMA Bayesian Model Averaging Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (cox regression).
678 Econometrics BMS Bayesian Model Averaging Library Bayesian model averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, flexible and hyper-g priors) and five kinds of model priors; model sampling is by enumeration or various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, and coefficient and predictive densities. Plotting functions are available for posterior model size, MCMC convergence, predictive and coefficient densities, best-models representation, and BMA comparison.
679 Econometrics boot Bootstrap Functions (Originally by Angelo Canty for S) Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
680 Econometrics bootstrap Functions for the Book “An Introduction to the Bootstrap” Software (bootstrap, cross-validation, jackknife) and data for the book “An Introduction to the Bootstrap” by B. Efron and R. Tibshirani, 1993, Chapman and Hall. This package is primarily provided for projects already based on it, and for support of the book. New projects should preferentially use the recommended package “boot”.
681 Econometrics brglm Bias Reduction in Binomial-Response Generalized Linear Models Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as ‘glm’. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.
682 Econometrics CADFtest A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests Implements Hansen's (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).
683 Econometrics car (core) Companion to Applied Regression Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.
684 Econometrics CDNmoney Components of Canadian Monetary and Credit Aggregates Components of Canadian Credit Aggregates and Monetary Aggregates with continuity adjustments.
685 Econometrics censReg Censored Regression (Tobit) Models Maximum Likelihood estimation of censored regression (Tobit) models with cross-sectional and panel data.
686 Econometrics clubSandwich Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <doi:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling's T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg (from package 'AER'), plm() (from package 'plm'), gls() and lme() (from 'nlme'), robu() (from 'robumeta'), and rma.uni() and rma.mv() (from 'metafor').
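A brief sketch of the CR2 estimator and the accompanying Satterthwaite-corrected tests, on simulated clustered data:

    library(clubSandwich)
    set.seed(7)
    d <- data.frame(id = rep(1:20, each = 5), x = rnorm(100))
    d$y <- 0.5 * d$x + rnorm(20)[d$id] + rnorm(100)  # cluster-level noise
    fit <- lm(y ~ x, data = d)
    vcovCR(fit, cluster = d$id, type = "CR2")     # bias-reduced linearization
    coef_test(fit, vcov = "CR2", cluster = d$id)  # Satterthwaite df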
687 Econometrics clusterSEs Calculate Cluster-Robust p-Values and Confidence Intervals Calculate p-values and confidence intervals using cluster-adjusted t-statistics (based on Ibragimov and Muller (2010) <doi:10.1198/jbes.2009.08046>), pairs cluster bootstrapped t-statistics, and wild cluster bootstrapped t-statistics (the latter two techniques based on Cameron, Gelbach, and Miller (2008) <doi:10.1162/rest.90.3.414>). Procedures are included for use with GLM, ivreg, plm (pooling or fixed effects), and mlogit models.
688 Econometrics crch Censored Regression with Conditional Heteroscedasticity Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of models, the intercepts depend on the thresholds that define the intervals.
689 Econometrics decompr Global-Value-Chain Decomposition Two global-value-chain decompositions are implemented. Firstly, the Wang-Wei-Zhu (Wang, Wei, and Zhu, 2013) algorithm splits bilateral gross exports into 16 value-added components. Secondly, the Leontief decomposition (default) derives the value added origin of exports by country and industry, which is also based on Wang, Wei, and Zhu (Wang, Z., S.-J. Wei, and K. Zhu. 2013. “Quantifying International Production Sharing at the Bilateral and Sector Levels.”).
690 Econometrics dlsem Distributed-Lag Linear Structural Equation Models Inference functionalities for distributed-lag linear structural equation models (DLSEMs). DLSEMs are Markovian structural causal models where each factor of the joint probability distribution is a distributed-lag linear regression model (Magrini, 2018) <doi:10.2478/bile-2018-0012>. DLSEMs account for temporal delays in the dependence relationships among the variables and allow dynamic causal inference to be performed by assessing causal effects at different time lags. Endpoint-constrained quadratic, quadratic decreasing and gamma lag shapes are available.
691 Econometrics dynlm Dynamic Linear Regression Dynamic linear models and time series regression.
692 Econometrics Ecdat Data Sets for Econometrics Data sets for econometrics.
693 Econometrics effects Effect Displays for Linear, Generalized Linear, and Other Models Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.
694 Econometrics erer Empirical Research in Economics with R Functions, datasets, and sample code related to the book 'Empirical Research in Economics: Growing up with R' by Dr. Changyou Sun are included. Marginal effects for binary or ordered choice models can be calculated. Static and dynamic Almost Ideal Demand System (AIDS) models can be estimated. A typical event analysis in finance can be conducted with several functions included.
695 Econometrics estimatr Fast Estimators for Design-Based Inference Fast procedures for a small set of commonly used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and the regression estimator that improves the precision of experimental estimates by interacting treatment with centered pre-treatment covariates, introduced by Lin (2013) <doi:10.1214/12-AOAS583>.
696 Econometrics expsmooth Data Sets from “Forecasting with Exponential Smoothing” Data sets from the book “Forecasting with exponential smoothing: the state space approach” by Hyndman, Koehler, Ord and Snyder (Springer, 2008).
697 Econometrics ExtremeBounds Extreme Bounds Analysis (EBA) An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer’s and Sala-i-Martin’s versions of EBA, and allows users to customize all aspects of the analysis.
698 Econometrics feisr Estimating Fixed Effects Individual Slope Models Provides the function feis() to estimate fixed effects individual slope (FEIS) models. The FEIS model constitutes a more general version of the often-used fixed effects (FE) panel model, as implemented in the package ‘plm’ by Croissant and Millo (2008) <doi:10.18637/jss.v027.i02>. In FEIS models, data are not only person “demeaned” like in conventional FE models, but “detrended” by the predicted individual slope of each person or group. Estimation is performed by applying least squares lm() to the transformed data. For more details on FEIS models see Bruederl and Ludwig (2015, ISBN:1446252442); Frees (2001) <doi:10.2307/3316008>; Polachek and Kim (1994) <doi:10.1016/0304-4076(94)90075-2>; Wooldridge (2010, ISBN:0262294354). To test consistency of conventional FE and random effects estimators against heterogeneous slopes, the package also provides the functions feistest() for an artificial regression test and bsfeistest() for a bootstrapped version of the Hausman test.
699 Econometrics fma Data Sets from “Forecasting: Methods and Applications” by Makridakis, Wheelwright & Hyndman (1998) All data sets from “Forecasting: methods and applications” by Makridakis, Wheelwright & Hyndman (Wiley, 3rd ed., 1998).
700 Econometrics forecast (core) Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
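For instance, a minimal sketch on the built-in AirPassengers series:

    library(forecast)
    fit <- auto.arima(AirPassengers)  # automatic ARIMA order selection
    fc <- forecast(fit, h = 24)       # 24-month-ahead forecasts
    plot(fc)
    ets(AirPassengers)                # exponential smoothing state space model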
701 Econometrics frm Regression Analysis of Fractional Responses Estimation and specification analysis of one- and two-part fractional regression models and calculation of partial effects.
702 Econometrics frontier Stochastic Frontier Analysis Maximum Likelihood Estimation of Stochastic Frontier Production and Cost Functions. Two specifications are available: the error components specification with time-varying efficiencies (Battese and Coelli, 1992) and a model specification in which the firm effects are directly influenced by a number of variables (Battese and Coelli, 1995).
703 Econometrics fxregime Exchange Rate Regime Analysis Exchange rate regression and structural change tools for estimating, testing, dating, and monitoring (de facto) exchange rate regimes.
704 Econometrics gam Generalized Additive Models Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).
705 Econometrics gamlss Generalised Additive Models for Location Scale and Shape Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.
706 Econometrics geepack Generalized Estimating Equation Package Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses.
707 Econometrics gets General-to-Specific (GETS) Modelling and Indicator Saturation Methods Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.
708 Econometrics glmx Generalized Linear Models Extended Extended techniques for generalized linear models (GLMs), especially for binary responses, including parametric links and heteroskedastic latent variables.
709 Econometrics gmm Generalized Method of Moments and Generalized Empirical Likelihood A complete suite for estimating models based on moment conditions. It includes the two-step generalized method of moments (Hansen 1982; <doi:10.2307/1912775>), the iterated GMM and the continuous updated estimator (Hansen, Eaton and Yaron 1996; <doi:10.2307/1392442>), and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).
710 Econometrics gmnl Multinomial Logit Models with Random Parameters An implementation of the maximum simulated likelihood method for the estimation of multinomial logit models with random coefficients. Specifically, it allows estimating models with continuous heterogeneity such as the mixed multinomial logit and the generalized multinomial logit. It also allows estimating models with discrete heterogeneity such as the latent class and the mixed-mixed multinomial logit model.
711 Econometrics gvc Global Value Chains Tools Several tools for Global Value Chain (‘GVC’) analysis are implemented.
712 Econometrics Hmisc Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
713 Econometrics ineq Measuring Inequality, Concentration, and Poverty Inequality, concentration, and poverty measures. Lorenz curves (empirical and theoretical).
714 Econometrics intReg Interval Regression Estimating interval regression models. Supports both common and observation-specific boundaries.
715 Econometrics ivfixed Instrumental fixed effect panel data model Fits an instrumental least squares dummy variable model.
716 Econometrics ivpack Instrumental Variable Estimation This package contains functions for carrying out instrumental variable estimation of causal effects and power analyses for instrumental variable studies.
717 Econometrics ivpanel Instrumental Panel Data Models Fit the instrumental panel data models: the fixed effects, random effects and between models.
718 Econometrics ivprobit Instrumental Variables Probit Model Compute the instrumental variables probit model using the Amemiya’s Generalized Least Squares estimators (Amemiya, Takeshi, (1978) <doi:10.2307/1911443>).
719 Econometrics LARF Local Average Response Functions for Instrumental Variable Estimation of Treatment Effects Provides instrumental variable estimation of treatment effects when both the endogenous treatment and its instrument are binary. Applicable to both binary and continuous outcomes.
720 Econometrics lavaan Latent Variable Analysis Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
721 Econometrics lfe Linear Group Fixed Effects Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models which uses factors with many levels as pure control variables. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multi-way clustered standard errors, as well as limited mobility bias correction.
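A sketch of felm()'s multi-part formula on simulated data; the four parts are regressors | fixed effects | IV specification | cluster variable:

    library(lfe)
    set.seed(3)
    d <- data.frame(firm = factor(rep(1:50, each = 8)),
                    year = factor(rep(1:8, times = 50)),
                    x = rnorm(400))
    d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firm] + rnorm(8)[d$year] + rnorm(400)
    # firm and year fixed effects, no instruments, SEs clustered by firm
    fit <- felm(y ~ x | firm + year | 0 | firm, data = d)
    summary(fit)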
722 Econometrics LinRegInteractive Interactive Interpretation of Linear Regression Models Interactive visualization of effects, response functions and marginal effects for different kinds of regression models. In this version linear regression models, generalized linear models, generalized additive models and linear mixed-effects models are supported. Major features are the interactive approach and the handling of the effects of categorical covariates: if two or more factors are used as covariates every combination of the levels of each factor is treated separately. The automatic calculation of marginal effects and a number of possibilities to customize the graphical output are useful features as well.
723 Econometrics lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
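For example, using the sleepstudy and cbpp data shipped with lme4:

    library(lme4)
    # random intercept and slope for Days within Subject
    fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(fit)
    # binomial GLMM with a random intercept per herd
    gfit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                  data = cbpp, family = binomial)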
724 Econometrics lmtest (core) Testing Linear Regression Models A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.
725 Econometrics margins Marginal Effects for Model Objects An R port of Stata’s ‘margins’ command, which can be used to calculate marginal (or partial) effects from model objects.
726 Econometrics MASS Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
727 Econometrics matchingMarkets Analysis of Stable Matchings Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
728 Econometrics Matrix Sparse and Dense Matrix Classes and Methods A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using ‘LAPACK’ and ‘SuiteSparse’ libraries.
729 Econometrics Mcomp Data from the M-Competitions The 1001 time series from the M-competition (Makridakis et al. 1982) <doi:10.1002/for.3980010202> and the 3003 time series from the IJF-M3 competition (Makridakis and Hibon, 2000) <doi:10.1016/S0169-2070(00)00057-1>.
730 Econometrics meboot Maximum Entropy Bootstrap for Time Series Maximum entropy density based dependent data bootstrap. An algorithm is provided to create a population of time series (ensemble) without assuming stationarity. The reference paper (Vinod, H.D., 2004) explains how the algorithm satisfies the ergodic theorem and the central limit theorem.
731 Econometrics mgcv Mixed GAM Computation Vehicle with Automatic Smoothness Estimation Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
732 Econometrics micEcon Microeconomic Analysis and Modelling Various tools for microeconomic analysis and microeconomic modelling, e.g. estimating quadratic, Cobb-Douglas and Translog functions, calculating partial derivatives and elasticities of these functions, and calculating Hessian matrices, checking curvature and preparing restrictions for imposing monotonicity of Translog functions.
733 Econometrics micEconAids Demand Analysis with the Almost Ideal Demand System (AIDS) Functions and tools for analysing consumer demand with the Almost Ideal Demand System (AIDS) suggested by Deaton and Muellbauer (1980).
734 Econometrics micEconCES Analysis with the Constant Elasticity of Substitution (CES) function Tools for economic analysis and economic modelling with a Constant Elasticity of Substitution (CES) function.
735 Econometrics micEconSNQP Symmetric Normalized Quadratic Profit Function Production analysis with the Symmetric Normalized Quadratic (SNQ) profit function.
736 Econometrics midasr Mixed Data Sampling Regression Methods and tools for mixed frequency time series data analysis. Allows estimation, model selection and forecasting for MIDAS regressions.
737 Econometrics mlogit Multinomial Logit Models Maximum Likelihood estimation of random utility discrete choice models (logit and probit).
738 Econometrics MNP R Package for Fitting the Multinomial Probit Model Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.
739 Econometrics multiwayvcov Multi-Way Standard Error Clustering Exports two functions implementing multi-way clustering using the method suggested by Cameron, Gelbach, & Miller (2011) and cluster (or block) bootstrapping for estimating variance-covariance matrices. Normal one and two-way clustering matches the results of other common statistical packages. Missing values are handled transparently and rudimentary parallelization support is provided.
740 Econometrics mvProbit Multivariate Probit Models Tools for estimating multivariate probit models, calculating conditional and unconditional expectations, and calculating marginal effects on conditional and unconditional expectations.
741 Econometrics nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
742 Econometrics nnet Feed-Forward Neural Networks and Multinomial Log-Linear Models Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
743 Econometrics nonnest2 Tests of Non-Nested Models Testing non-nested models via theory supplied by Vuong (1989) <doi:10.2307/1912557>. Includes tests of model distinguishability and of model fit that can be applied to both nested and non-nested models. Also includes functionality to obtain confidence intervals associated with AIC and BIC. This material is partially based on work supported by the National Science Foundation under Grant Number SES-1061334.
744 Econometrics np Nonparametric Kernel Smoothing Methods for Mixed Data Types Nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <http://www.nserc-crsng.gc.ca>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <http://www.sshrc-crsh.gc.ca>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <http://www.sharcnet.ca>).
745 Econometrics nse Numerical Standard Errors Computation in R Collection of functions designed to calculate numerical standard error (NSE) of univariate time series as described in Ardia et al. (2018) <doi:10.2139/ssrn.2741587> and Ardia and Bluteau (2017) <doi:10.21105/joss.00172>.
746 Econometrics ordinal Regression Models for Ordinal Data Implementation of cumulative link (mixed) models also known as ordered regression models, proportional odds models, proportional hazards models for grouped survival times and ordered logit/probit/… models. Estimation is via maximum likelihood and mixed models are fitted with the Laplace approximation and adaptive Gauss-Hermite quadrature. Multiple random effect terms are allowed and they may be nested, crossed or partially nested/crossed. Restrictions of symmetry and equidistance can be imposed on the thresholds (cut-points/intercepts). Standard model methods are available (summary, anova, drop-methods, step, confint, predict etc.) in addition to profile methods and slice methods for visualizing the likelihood function and checking convergence.
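A minimal sketch using the wine data shipped with the package:

    library(ordinal)
    # cumulative link (proportional odds) model for an ordered rating
    fit <- clm(rating ~ temp + contact, data = wine)
    summary(fit)
    # mixed-effects version with a random intercept per judge
    clmm(rating ~ temp + contact + (1 | judge), data = wine)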
747 Econometrics OrthoPanels Dynamic Panel Models with Orthogonal Reparameterization of Fixed Effects Implements the orthogonal reparameterization approach recommended by Lancaster (2002) to estimate dynamic panel models with fixed effects (and optionally: panel specific intercepts). The approach uses a likelihood-based estimator and produces estimates that are asymptotically unbiased as N goes to infinity, with a T as low as 2.
748 Econometrics pampe Implementation of the Panel Data Approach Method for Program Evaluation Implements the Panel Data Approach Method for program evaluation as developed in Hsiao, Ching and Ki Wan (2012). pampe estimates the effect of an intervention by comparing the evolution of the outcome for a unit affected by an intervention or treatment to its estimated evolution had it not been affected by the intervention.
749 Econometrics panelAR Estimation of Linear AR(1) Panel Data Models with Cross-Sectional Heteroskedasticity and/or Correlation The package estimates linear models on panel data structures in the presence of AR(1)-type autocorrelation as well as panel heteroskedasticity and/or contemporaneous correlation. First, AR(1)-type autocorrelation is addressed via a two-step Prais-Winsten feasible generalized least squares (FGLS) procedure, where the autocorrelation coefficients may be panel-specific. A number of common estimators for the autocorrelation coefficient are supported. In case of panel heteroskedasticity, one can choose to use a sandwich-type robust standard error estimator with OLS or a panel weighted least squares estimator after the two-step Prais-Winsten estimator. Alternatively, if panels are both heteroskedastic and contemporaneously correlated, the package supports panel-corrected standard errors (PCSEs) as well as the Parks-Kmenta FGLS estimator.
750 Econometrics Paneldata Linear models for panel data Linear models for panel data: the fixed effects model and the random effects model.
751 Econometrics panelvar Panel Vector Autoregression Extends two generalized method of moments (GMM) estimators to panel vector autoregression (PVAR) models with p lags of endogenous variables, predetermined and strictly exogenous variables. This general PVAR model contains the first-difference GMM estimator of Holtz-Eakin et al. (1988) <doi:10.2307/1913103> and Arellano and Bond (1991) <doi:10.2307/2297968>, and the system GMM estimator of Blundell and Bond (1998) <doi:10.1016/S0304-4076(98)00009-8>. Also provides specification tests (Hansen overidentification test, lag selection criterion and stability test of the PVAR polynomial) and classical structural analysis for PVAR models, such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis, and forecast error variance decompositions.
752 Econometrics PANICr PANIC Tests of Nonstationarity A methodology that makes use of the factor structure of large dimensional panels to understand the nature of nonstationarity inherent in data. This is referred to as PANIC, Panel Analysis of Nonstationarity in Idiosyncratic and Common Components. PANIC (2004) <doi:10.1111/j.1468-0262.2004.00528.x> includes valid pooling methods that allow panel tests to be constructed. PANIC (2004) can detect whether the nonstationarity in a series is pervasive, or variable specific, or both. PANIC (2010) <doi:10.1017/s0266466609990478> includes two new tests on the idiosyncratic component that estimate the pooled autoregressive coefficient and sample moment, respectively. The PANIC model approximates the number of factors based on Bai and Ng (2002) <doi:10.1111/1468-0262.00273>.
753 Econometrics pco Panel Cointegration Tests Computation of the Pedroni (1999) panel cointegration test statistics. Reported are the empirical and the standardized values.
754 Econometrics pcse Panel-Corrected Standard Error Estimation in R A function to estimate panel-corrected standard errors. Data may contain balanced or unbalanced panels.
755 Econometrics pder Panel Data Econometrics with R Data sets for the Panel Data Econometrics with R book.
756 Econometrics pdR Threshold Model and Unit Root Tests in Cross-section and Time Series Data Threshold model, panel version of Hylleberg et al. (1990) <doi:10.1016/0304-4076(90)90080-D> seasonal unit root tests, and panel unit root test of Chang (2002) <doi:10.1016/S0304-4076(02)00095-7>.
757 Econometrics pglm Panel Generalized Linear Models Estimation of panel models for glm-like models: this includes binomial models (logit and probit), count models (Poisson and negbin), and ordered models (logit and probit).
758 Econometrics phtt Panel Data Analysis with Heterogeneous Time Trends The package provides estimation procedures for panel data with large dimensions n, T, and general forms of unobservable heterogeneous effects. Particularly, the estimation procedures are those of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. The method of Bai (2009) assumes that the factors are stationary, whereas the method of Kneip et al. (2012) allows the factors to be non-stationary. Additionally, the ‘phtt’ package provides a wide range of dimensionality criteria in order to estimate the number of the unobserved factors simultaneously with the remaining model parameters.
759 Econometrics plm (core) Linear Models for Panel Data A set of estimators and tests for panel data econometrics.
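For instance, a sketch on the Grunfeld data shipped with plm:

    library(plm)
    data("Grunfeld", package = "plm")
    fe <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "within")  # fixed effects
    re <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "random")  # random effects
    phtest(fe, re)  # Hausman test of fixed vs. random effects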
760 Econometrics pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
761 Econometrics psidR Build Panel Data Sets from PSID Raw Data Makes it easy to build panel data in wide format from Panel Study of Income Dynamics (‘PSID’) delivered raw data. Downloads data directly from the PSID server using the ‘SAScii’ package. ‘psidR’ takes care of merging data from each wave onto a cross-period index file, so that individuals can be followed over time. The user must specify which years they are interested in, and the ‘PSID’ variable names (e.g. ER21003) for each year (they differ in each year). The package offers helper functions to retrieve variable names from different waves. There are different panel data designs and sample subsetting criteria implemented (“SRC”, “SEO”, “immigrant” and “latino” samples).
762 Econometrics PSTR Panel Smooth Transition Regression Modelling Provides Panel Smooth Transition Regression (PSTR) modelling. The modelling procedure consists of three stages: specification, estimation and evaluation. The package offers tools to help users conduct model specification tests, estimate PSTR models, and evaluate them. The tests implemented in the package allow for cluster-dependency and are heteroskedasticity-consistent. Wild bootstrap and wild cluster bootstrap tests are also implemented. Parallel computation is available as an option in some functions, especially the bootstrap tests, so the package is well suited to jobs run on many cores of high-performance computing servers.
763 Econometrics pwt Penn World Table (Versions 5.6, 6.x, 7.x) The Penn World Table provides purchasing power parity and national income accounts converted to international prices for 189 countries for some or all of the years 1950-2010.
764 Econometrics pwt8 Penn World Table (Version 8.x) The Penn World Table 8.x provides information on relative levels of income, output, inputs, and productivity for 167 countries between 1950 and 2011.
765 Econometrics pwt9 Penn World Table (Version 9.x) The Penn World Table 9.x provides information on relative levels of income, output, inputs, and productivity for 182 countries between 1950 and 2014.
766 Econometrics quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
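A short sketch estimating conditional quartiles on the engel data shipped with quantreg:

    library(quantreg)
    data("engel", package = "quantreg")
    fit <- rq(foodexp ~ income, tau = c(0.25, 0.5, 0.75), data = engel)
    summary(fit)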
767 Econometrics Rchoice Discrete Choice (Binary, Poisson and Ordered) Models with Random Parameters An implementation of the simulated maximum likelihood method for the estimation of Binary (Probit and Logit), Ordered (Probit and Logit) and Poisson models with random parameters for cross-sectional and longitudinal data.
768 Econometrics rdd Regression Discontinuity Estimation Provides the tools to undertake estimation in Regression Discontinuity Designs. Both sharp and fuzzy designs are supported. Estimation is accomplished using local linear regression. A provided function utilizes the Imbens-Kalyanaraman optimal bandwidth calculation. A function is also included to test the assumption of no-sorting effects.
769 Econometrics rddapp Regression Discontinuity Design Application Estimation of both single- and multiple-assignment Regression Discontinuity Designs (RDDs). Provides both parametric (global) and non-parametric (local) estimation choices for both sharp and fuzzy designs, along with power analysis and assumption checks. Introductions to the underlying logic and analysis of RDDs are in Thistlethwaite, D. L., Campbell, D. T. (1960) <doi:10.1037/h0044319> and Lee, D. S., Lemieux, T. (2010) <doi:10.1257/jel.48.2.281>.
770 Econometrics rddtools Toolbox for Regression Discontinuity Design (‘RDD’) Set of functions for Regression Discontinuity Design (‘RDD’), for data visualisation, estimation and testing.
771 Econometrics rdlocrand Local Randomization Methods for RD Designs The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. Under the local randomization approach, RD designs can be interpreted as randomized experiments inside a window around the cutoff. This package provides tools to perform randomization inference for RD designs under local randomization: rdrandinf() to perform hypothesis testing using randomization inference, rdwinselect() to select a window around the cutoff in which randomization is likely to hold, rdsensitivity() to assess the sensitivity of the results to different window lengths and null hypotheses and rdrbounds() to construct Rosenbaum bounds for sensitivity to unobserved confounders.
772 Econometrics rdmulti Analysis of RD Designs with Multiple Cutoffs or Scores The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The ‘rdmulti’ package provides tools to analyze RD designs with multiple cutoffs or scores: rdmc() estimates pooled and cutoff specific effects for multi-cutoff designs, rdmcplot() draws RD plots for multi-cutoff designs and rdms() estimates effects in cumulative cutoffs or multi-score designs. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://sites.google.com/site/rdpackages/rdmulti/Cattaneo-Titiunik-VazquezBare_2018_rdmulti.pdf> for further methodological details.
773 Econometrics rdpower Power Calculations for RD Designs The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The ‘rdpower’ package provides tools to perform power and sample size calculations in RD designs: rdpower() calculates the power of an RD design and rdsampsi() calculates the required sample size to achieve a desired power. See Cattaneo, Titiunik and Vazquez-Bare (2018) <https://sites.google.com/site/rdpackages/rdpower/Cattaneo-Titiunik-VazquezBare_2018_Stata.pdf> for further methodological details.
774 Econometrics rdrobust Robust Data-Driven Statistical Inference in Regression-Discontinuity Designs Regression-discontinuity (RD) designs are quasi-experimental research designs popular in social, behavioral and natural sciences. The RD design is usually employed to study the (local) causal effect of a treatment, intervention or policy. This package provides tools for data-driven graphical and analytical statistical inference in RD designs: rdrobust() to construct local-polynomial point estimators and robust confidence intervals for average treatment effects at the cutoff in Sharp, Fuzzy and Kink RD settings, rdbwselect() to perform bandwidth selection for the different procedures implemented, and rdplot() to conduct exploratory data analysis (RD plots).
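A minimal sharp-RD sketch on simulated data with a cutoff at zero:

    library(rdrobust)
    set.seed(9)
    x <- runif(1000, -1, 1)                        # running variable
    y <- 3 + 2 * x + 1.5 * (x >= 0) + rnorm(1000)  # jump of 1.5 at the cutoff
    summary(rdrobust(y, x, c = 0))  # robust bias-corrected estimate and CI
    rdplot(y, x, c = 0)             # exploratory RD plot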
775 Econometrics reldist Relative Distribution Methods Tools for the comparison of distributions. This includes nonparametric estimation of the relative distribution PDF and CDF and numerical summaries as described in “Relative Distribution Methods in the Social Sciences” by Mark S. Handcock and Martina Morris, Springer-Verlag, 1999, ISBN 0387987789.
776 Econometrics REndo Fitting Linear Models with Endogenous Regressors using Latent Instrumental Variables Fits linear models with endogenous regressors using latent instrumental variable approaches. The methods included in the package are Lewbel’s (1997) <doi:10.2307/2171884> higher moments approach as well as Lewbel’s (2012) <doi:10.1080/07350015.2012.643126> heteroscedasticity approach, Park and Gupta’s (2012) <doi:10.1287/mksc.1120.0718> joint estimation method that uses Gaussian copula, and Kim and Frees’s (2007) <doi:10.1007/s11336-007-9008-1> multilevel generalized method of moments approach that deals with endogeneity in a multilevel setting. These are statistical techniques to address the endogeneity problem where no external instrumental variables are needed. Note that with version 2.0.0 sweeping changes were introduced which greatly improve functionality and usability but break backwards compatibility.
777 Econometrics rms Regression Modeling Strategies Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. ‘rms’ is a collection of functions that assist with and streamline modeling. It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses, and implements penalized maximum likelihood estimation for logistic and ordinary linear models. ‘rms’ works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.
778 Econometrics RSGHB Functions for Hierarchical Bayesian Estimation: A Flexible Approach Functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train’s Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train’s original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html> See Train’s chapter on HB in Discrete Choice with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>; and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.
779 Econometrics rUnemploymentData Data and Functions for USA State and County Unemployment Data Contains data and visualization functions for USA unemployment data. Data comes from the US Bureau of Labor Statistics (BLS). State data is in ?df_state_unemployment and covers 2000-2013. County data is in ?df_county_unemployment and covers 1990-2013. Choropleth maps of the data can be generated with ?state_unemployment_choropleth() and ?county_unemployment_choropleth() respectively.
780 Econometrics sampleSelection Sample Selection Models Two-step and maximum likelihood estimation of Heckman-type sample selection models: standard sample selection models (Tobit-2), endogenous switching regression models (Tobit-5), sample selection models with binary dependent outcome variable, interval regression with sample selection (only ML estimation), and endogenous treatment effects models.
781 Econometrics sandwich (core) Robust Covariance Matrix Estimators Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
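For example, sandwich's estimators are typically combined with coeftest() from 'lmtest'; a sketch on the built-in cars data:

    library(sandwich)
    library(lmtest)
    fit <- lm(dist ~ speed, data = cars)
    V <- vcovHC(fit, type = "HC3")  # heteroskedasticity-consistent covariance
    coeftest(fit, vcov = V)         # robust t tests
    NeweyWest(fit)                  # HAC covariance for time-series regressions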
782 Econometrics segmented Regression Models with Break-Points / Change-Points Estimation Given a regression model, segmented ‘updates’ the model by adding one or more segmented (i.e., piece-wise linear) relationships. Several variables with multiple breakpoints are allowed.
783 Econometrics sem Structural Equation Models Functions for fitting general linear structural equation models (with observed and latent variables) using the RAM approach, and for fitting structural equations in observed-variable models by two-stage least squares.
784 Econometrics SemiParSampleSel Semi-Parametric Sample Selection Modelling with Continuous or Discrete Response Routine for fitting continuous or discrete response copula sample selection models with semi-parametric predictors, including linear and nonlinear effects.
785 Econometrics semsfa Semiparametric Estimation of Stochastic Frontier Models Semiparametric Estimation of Stochastic Frontier Models following a two step procedure: in the first step semiparametric or nonparametric regression techniques are used to relax parametric restrictions of the functional form representing technology and in the second step variance parameters are obtained by pseudolikelihood estimators or by method of moments.
786 Econometrics sfa Stochastic Frontier Analysis Stochastic Frontier Analysis introduced by Aigner, Lovell and Schmidt (1976) and Battese and Coelli (1992, 1995).
787 Econometrics simpleboot Simple Bootstrap Routines Simple bootstrap routines.
788 Econometrics SparseM Sparse Linear Algebra Some basic linear algebra functionality for sparse matrices is provided: including Cholesky decomposition and backsolving as well as standard R subsetting and Kronecker products.
789 Econometrics spatialprobit Spatial Probit Models Bayesian Estimation of Spatial Probit and Tobit Models.
790 Econometrics spatialreg Spatial Regression Analysis A collection of all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) contained up to now in ‘spdep’, ‘sphet’ and ‘spse’. These model fitting functions include maximum likelihood methods for cross-sectional models proposed by ‘Cliff’ and ‘Ord’ (1973, ISBN:0850860369) and (1981, ISBN:0850860814), fitting methods initially described by ‘Ord’ (1975) <doi:10.1080/01621459.1975.10480272>. The models are further described by ‘Anselin’ (1988) <doi:10.1007/978-94-015-7799-1>. Spatial two stage least squares and spatial general method of moment models initially proposed by ‘Kelejian’ and ‘Prucha’ (1998) <doi:10.1023/A:1007707430416> and (1999) <doi:10.1111/1468-2354.00027> are provided. Impact methods and MCMC fitting methods proposed by ‘LeSage’ and ‘Pace’ (2009) <doi:10.1201/9781420064254> are implemented for the family of cross-sectional spatial regression models. Methods for fitting the log determinant term in maximum likelihood and MCMC fitting are compared by ‘Bivand et al.’ (2013) <doi:10.1111/gean.12008>, and model fitting methods by ‘Bivand’ and ‘Piras’ (2015) <doi:10.18637/jss.v063.i18>; both of these articles include extensive lists of references. ‘spatialreg’ >= 1.1-* correspond to ‘spdep’ >= 1.1-1, in which the model fitting functions are deprecated and pass through to ‘spatialreg’, but will mask those in ‘spatialreg’. From versions 1.2-*, the functions will be made defunct in ‘spdep’.
791 Econometrics spfrontier Spatial Stochastic Frontier Models A set of tools for estimation of various spatial specifications of stochastic frontier models.
792 Econometrics sphet Estimation of Spatial Autoregressive Models with and without Heteroscedasticity Generalized Method of Moment estimation of Cliff-Ord-type spatial autoregressive models with and without Heteroscedasticity.
793 Econometrics splm Econometric Models for Spatial Panel Data ML and GM estimation and diagnostic testing of econometric models for spatial panel data.
794 Econometrics ssfa Spatial Stochastic Frontier Analysis Spatial Stochastic Frontier Analysis (SSFA) is an original method for controlling the spatial heterogeneity in Stochastic Frontier Analysis (SFA) models, for cross-sectional data, by splitting the inefficiency term into three terms: the first one related to spatial peculiarities of the territory in which each single unit operates, the second one related to the specific production features and the third one representing the error term.
795 Econometrics strucchange Testing, Monitoring, and Dating Structural Changes Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
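As an illustration of the strucchange workflow, the following minimal sketch dates level shifts in the base-R Nile series; the data set and settings are illustrative choices, not part of the package description.

    library(strucchange)
    # Date level shifts in annual Nile flows (a base-R dataset)
    bp <- breakpoints(Nile ~ 1)              # pure level-shift model
    summary(bp)                              # number of breaks chosen via BIC
    confint(bp)                              # confidence intervals for break dates
    plot(efp(Nile ~ 1, type = "OLS-CUSUM"))  # fluctuation-test plot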
796 Econometrics survival Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
797 Econometrics systemfit Estimating Systems of Simultaneous Equations Econometric estimation of simultaneous systems of linear and nonlinear equations using Ordinary Least Squares (OLS), Weighted Least Squares (WLS), Seemingly Unrelated Regressions (SUR), Two-Stage Least Squares (2SLS), Weighted Two-Stage Least Squares (W2SLS), and Three-Stage Least Squares (3SLS).
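A minimal SUR sketch, using the Kmenta data and the two-equation demand/supply system from the systemfit documentation:

    library(systemfit)
    data("Kmenta")
    eqDemand <- consump ~ price + income
    eqSupply <- consump ~ price + farmPrice + trend
    # Seemingly unrelated regressions over the two-equation system
    fit <- systemfit(list(demand = eqDemand, supply = eqSupply),
                     method = "SUR", data = Kmenta)
    summary(fit)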
798 Econometrics truncreg Truncated Gaussian Regression Models Estimation of models for truncated Gaussian variables by maximum likelihood.
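A minimal sketch on simulated data, assuming the default left truncation at zero; the data-generating process is purely illustrative.

    library(truncreg)
    set.seed(1)
    x <- rnorm(500)
    y <- 1 + 2 * x + rnorm(500)
    d <- subset(data.frame(x, y), y > 0)  # sample observed only when y > 0
    # Maximum likelihood fit accounting for the truncation point
    fit <- truncreg(y ~ x, data = d, point = 0, direction = "left")
    summary(fit)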
799 Econometrics tsDyn Nonlinear Time Series Models with Regime Switching Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).
800 Econometrics tseries (core) Time Series Analysis and Computational Finance Time series analysis and computational finance.
801 Econometrics tsfa Time Series Factor Analysis Extraction of Factors from Multivariate Time Series. See ?00tsfa-Intro for more details.
802 Econometrics urca (core) Unit Root and Cointegration Tests for Time Series Data Unit root and cointegration tests encountered in applied econometric analysis are implemented.
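For example, an augmented Dickey-Fuller unit root test on a base-R series (the series and options are illustrative):

    library(urca)
    # ADF test with trend and an AIC-chosen lag length (up to 12 lags)
    adf <- ur.df(log(AirPassengers), type = "trend", lags = 12, selectlags = "AIC")
    summary(adf)  # test statistic alongside critical values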
803 Econometrics vars VAR Modelling Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.
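A short sketch of the typical vars workflow, using the Canada macro data shipped with the package:

    library(vars)
    data("Canada")
    sel <- VARselect(Canada, lag.max = 5, type = "const")     # lag selection
    fit <- VAR(Canada, p = sel$selection["AIC(n)"], type = "const")
    plot(irf(fit, n.ahead = 10))                              # impulse responses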
804 Econometrics VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
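As a minimal VGLM sketch, a multinomial logit on the base-R iris data (the covariates are an illustrative choice):

    library(VGAM)
    # Multinomial logit fitted as a vector GLM via Fisher scoring
    fit <- vglm(Species ~ Sepal.Length + Sepal.Width,
                family = multinomial, data = iris)
    summary(fit)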
805 Econometrics wahc Autocorrelation and Heteroskedasticity Correction in Fixed Effect Panel Data Model Fit the fixed effect panel data model with heteroskedasticity and autocorrelation correction.
806 Econometrics wbstats Programmatic Access to Data and Statistics from the World Bank API Tools for searching and downloading data and statistics from the World Bank Data API (<http://data.worldbank.org/developers/api-overview>) and the World Bank Data Catalog API (<http://data.worldbank.org/developers/data-catalog-api>).
807 Econometrics wooldridge 111 Data Sets from “Introductory Econometrics: A Modern Approach, 6e” by Jeffrey M. Wooldridge Students learning both econometrics and R may find the introduction to both challenging. However, if the text is “Introductory Econometrics: A Modern Approach” by Jeffrey M. Wooldridge, they are in luck! The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have all been compressed to a fraction of their original size and are well documented. Documentation files contain the page numbers of the text where each set is used, the original source, time of publication, and notes suggesting ideas for further exploratory data analysis and research. If one needs to brush up on model syntax, a vignette contains R solutions to examples from each chapter of the text. Data sets are from the 6th edition (Wooldridge 2016, ISBN-13: 978-1-305-27010-7), and are backwards compatible with all versions of the text.
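For instance, loading one of the text's data sets and running a standard log-wage regression (the specification mirrors a common textbook example; exact chapter solutions are in the package vignette):

    library(wooldridge)
    data("wage1")  # a data set from the Wooldridge text
    fit <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
    summary(fit)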
808 Econometrics xts eXtensible Time Series Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
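A minimal sketch of constructing an xts object and using its period-aggregation helpers (the data are simulated):

    library(xts)
    # Daily series indexed by Date, then aggregated to monthly means
    x <- xts(rnorm(90), order.by = as.Date("2020-01-01") + 0:89)
    head(apply.monthly(x, mean))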
809 Econometrics Zelig Everyone’s Statistical Software A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.
810 Econometrics zoo (core) S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
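A short sketch of zoo's handling of irregular series (the dates and values are illustrative):

    library(zoo)
    z1 <- zoo(c(1, 3, 5), as.Date(c("2020-01-01", "2020-01-03", "2020-01-07")))
    z2 <- zoo(c(2, 4), as.Date(c("2020-01-02", "2020-01-07")))
    m <- merge(z1, z2)  # union of the two irregular indexes
    na.approx(m)        # linear interpolation of interior gaps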
811 Econometrics zTree Functions to Import Data from ‘z-Tree’ into R Read ‘.xls’ and ‘.sbj’ files which are written by the Microsoft Windows program ‘z-Tree’. The latter is software for developing and carrying out economic experiments (see <http://www.ztree.uzh.ch/> for more information).
812 Analysis of Ecological and Environmental Data ade4 (core) Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
813 Analysis of Ecological and Environmental Data amap Another Multidimensional Analysis Package Tools for clustering and principal component analysis (with robust methods and parallelized functions).
814 Analysis of Ecological and Environmental Data analogue Analogue and Weighted Averaging Methods for Palaeoecology Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.
815 Analysis of Ecological and Environmental Data aod Analysis of Overdispersed Data Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).
816 Analysis of Ecological and Environmental Data ape Analyses of Phylogenetics and Evolution Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
817 Analysis of Ecological and Environmental Data aqp Algorithms for Quantitative Pedology The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps/>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.
818 Analysis of Ecological and Environmental Data BiodiversityR Package for Community Ecology and Suitability Analysis Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.
819 Analysis of Ecological and Environmental Data boussinesq Analytic Solutions for (ground-water) Boussinesq Equation A collection of R functions implementing published analytic solutions for the one-dimensional Boussinesq equation (ground-water). In particular, the function “beq.lin” is the analytic solution of the linearized form of the Boussinesq equation between two different head-based boundary (Dirichlet) conditions; “beq.song” is the non-linear power-series analytic solution for the motion of a wetting front over dry bedrock (Song et al., 2007; see the complete reference in the function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcome.
820 Analysis of Ecological and Environmental Data bReeze Functions for Wind Resource Assessment A collection of functions to analyse, visualize and interpret wind data and to calculate the potential energy production of wind turbines.
821 Analysis of Ecological and Environmental Data CircStats Circular Statistics, from “Topics in Circular Statistics” (2001) Circular Statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
822 Analysis of Ecological and Environmental Data circular Circular Statistics Circular statistics, from “Topics in Circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
823 Analysis of Ecological and Environmental Data cluster (core) “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al. Methods for cluster analysis, much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
824 Analysis of Ecological and Environmental Data cocorresp Co-Correspondence Analysis Methods Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.
825 Analysis of Ecological and Environmental Data Distance Distance Sampling Detection Function and Abundance Estimation A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided.
826 Analysis of Ecological and Environmental Data diveMove Dive Analysis and Calibration Utilities to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data. Miscellaneous functions for handling location data are also provided.
827 Analysis of Ecological and Environmental Data dse Dynamic Systems Estimation (Time Series Package) Tools for multivariate, linear, time-invariant, time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.
828 Analysis of Ecological and Environmental Data DSpat Spatial Modelling for Distance Sampling Data Fits inhomogeneous Poisson process spatial models to line transect sampling data and provides estimates of abundance within a region.
829 Analysis of Ecological and Environmental Data dyn Time Series Regression Time series regression. The dyn class interfaces the ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.
830 Analysis of Ecological and Environmental Data dynatopmodel Implementation of the Dynamic TOPMODEL Hydrological Model A native R implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes preprocessing and utility routines as well as routines for displaying outputs.
831 Analysis of Ecological and Environmental Data dynlm Dynamic Linear Regression Dynamic linear models and time series regression.
832 Analysis of Ecological and Environmental Data e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
833 Analysis of Ecological and Environmental Data earth Multivariate Adaptive Regression Splines Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines” <doi:10.1214/aos/1176347963>. (The term “MARS” is trademarked and thus not used in the name of the package.)
834 Analysis of Ecological and Environmental Data eco Ecological Inference in 2x2 Tables Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models.
835 Analysis of Ecological and Environmental Data ecodist Dissimilarity-Based Functions for Ecological Analysis Dissimilarity-based analysis functions including ordination and Mantel test functions, intended for use with spatial and community data.
836 Analysis of Ecological and Environmental Data EcoHydRology A Community Modeling Foundation for Eco-Hydrology Provides a flexible foundation on which scientists, engineers, and policy makers can base teaching exercises, as well as for more applied use in modeling complex eco-hydrological interactions.
837 Analysis of Ecological and Environmental Data EnvStats Package for Environmental Statistics, Including US EPA Guidance Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <http://www.springer.com/book/9781461484554>).
838 Analysis of Ecological and Environmental Data equivalence Provides Tests and Graphics for Assessing Tests of Equivalence Provides statistical tests and graphics for assessing tests of equivalence. Such tests have similarity as the alternative hypothesis instead of the null. Sample data sets are included.
839 Analysis of Ecological and Environmental Data evd Functions for Extreme Value Distributions Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.
840 Analysis of Ecological and Environmental Data evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the bayesian analysis of extreme value models, using MCMC methods.
841 Analysis of Ecological and Environmental Data evir Extreme Values in R Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, GEV/GPD distributions.
842 Analysis of Ecological and Environmental Data extRemes Extreme Value Analysis Functions for performing extreme value analysis.
843 Analysis of Ecological and Environmental Data fast Implementation of the Fourier Amplitude Sensitivity Test (FAST) The Fourier Amplitude Sensitivity Test (FAST) is a method to determine global sensitivities of a model on parameter changes with relatively few model runs. This package implements this sensitivity analysis method.
844 Analysis of Ecological and Environmental Data FD Measuring functional diversity (FD) from multiple traits, and other tools for functional ecology FD is a package to compute different multidimensional FD indices. It implements a distance-based framework to measure FD that allows any number and type of functional traits, and can also consider species relative abundances. It also contains other useful tools for functional ecology.
845 Analysis of Ecological and Environmental Data flexmix Flexible Mixture Modeling A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
846 Analysis of Ecological and Environmental Data forecast Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
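A minimal forecasting sketch on a base-R series (the series choice is illustrative):

    library(forecast)
    fit <- auto.arima(AirPassengers)  # automatic ARIMA order selection
    plot(forecast(fit, h = 12))       # 12-step-ahead forecast with intervals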
847 Analysis of Ecological and Environmental Data fso Fuzzy Set Ordination Fuzzy set ordination is a multivariate analysis used in ecology to relate the composition of samples to possible explanatory variables. While differing in theory and method, in practice, the use is similar to ‘constrained ordination.’ The package contains plotting and summary functions as well as the analyses.
848 Analysis of Ecological and Environmental Data gam Generalized Additive Models Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).
849 Analysis of Ecological and Environmental Data gamair Data for “GAMs: An Introduction with R” Data sets and scripts used in the book “Generalized Additive Models: An Introduction with R”, Wood (2006) CRC.
850 Analysis of Ecological and Environmental Data hydroGOF Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented towards use during the calibration, validation, and application of hydrological models. Missing values in observed and/or simulated values can be removed before computations. Comments / questions / collaboration of any kind are very welcome.
851 Analysis of Ecological and Environmental Data HydroMe R codes for estimating water retention and infiltration model parameters using experimental data This package is version 2 of the HydroMe v.1 package. It estimates the parameters of infiltration and water retention models by curve-fitting methods. The models considered are those commonly used in soil science. It adds new models for the water retention characteristic curve and fixes errors present in HydroMe v.1.
852 Analysis of Ecological and Environmental Data hydroPSO Particle Swarm Optimisation, with Focus on Environmental Models State-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement for the ‘optim’ R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model’s own input and output files, without requiring access to the model’s source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine to different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with “long” execution time. Bug reports/comments/questions are very welcome (in English, Spanish or Italian). See Zambrano-Bigiarini and Rojas (2013) <doi:10.1016/j.envsoft.2013.01.004> for more details.
853 Analysis of Ecological and Environmental Data hydroTSM Time Series Management, Analysis and Interpolation for Hydrological Modelling S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented towards hydrological modelling tasks. The focus has been on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcome, in particular datasets that can be included in this package for academic purposes.
854 Analysis of Ecological and Environmental Data Interpol.T Hourly interpolation of multiple temperature daily series Hourly interpolation of daily minimum and maximum temperature series. Carries out interpolation on multiple series at once. Requires some hourly series for calibration (alternatively, a default calibration table can be used).
855 Analysis of Ecological and Environmental Data ipred Improved Predictors Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
856 Analysis of Ecological and Environmental Data ismev An Introduction to Statistical Modeling of Extreme Values Functions to support the computations carried out in ‘An Introduction to Statistical Modeling of Extreme Values’ by Stuart Coles. The functions may be divided into the following groups: maxima/minima, order statistics, peaks over thresholds and point processes.
857 Analysis of Ecological and Environmental Data labdsv (core) Ordination and Multivariate Analysis for Ecology A variety of ordination and community analyses useful in analysis of data sets in community ecology. Includes many of the common ordination methods, with graphical routines to facilitate their interpretation, as well as several novel analyses.
858 Analysis of Ecological and Environmental Data latticeDensity Density Estimation and Nonparametric Regression on Irregular Regions Functions that compute the lattice-based density estimator of Barry and McIntyre, which accounts for point processes in two-dimensional regions with irregular boundaries and holes. The package also implements two-dimensional non-parametric regression for similar regions.
859 Analysis of Ecological and Environmental Data lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
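A minimal mixed-model sketch on the sleepstudy data shipped with lme4:

    library(lme4)
    # Random intercept and slope for each subject
    fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(fit)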
860 Analysis of Ecological and Environmental Data maptree Mapping, pruning, and graphing tree models Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.
861 Analysis of Ecological and Environmental Data marked Mark-Recapture Analysis for Survival and Abundance Estimation Functions for fitting various models to capture-recapture data including mixed-effects Cormack-Jolly-Seber (CJS) and multistate models and the multi-variate state model structure for survival estimation and POPAN structured Jolly-Seber models for abundance estimation. There are also Hidden Markov model (HMM) implementations of CJS and multistate models with and without state uncertainty and a simulation capability for HMM models.
862 Analysis of Ecological and Environmental Data MASS (core) Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
863 Analysis of Ecological and Environmental Data mclust Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
864 Analysis of Ecological and Environmental Data mda Mixture and Flexible Discriminant Analysis Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, …
865 Analysis of Ecological and Environmental Data mefa Multivariate Data Handling in Ecology and Biogeography A framework package aimed at providing a standardized computational environment for specialist work via object classes that represent data coded by samples, taxa and segments (i.e. subpopulations, repeated measures). It supports easy processing of the data along with cross tabulation and relational data tables for samples and taxa. An object of class ‘mefa’ is a project-specific compendium of the data and can easily be used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of ‘mefa’ objects. Reports can be generated in plain text or LaTeX format. The vignette contains worked examples.
866 Analysis of Ecological and Environmental Data metacom Analysis of the ‘Elements of Metacommunity Structure’ Functions to analyze coherence, boundary clumping, and turnover following the pattern-based metacommunity analysis of Leibold and Mikkelson 2002 <doi:10.1034/j.1600-0706.2002.970210.x>. The package also includes functions to visualize ecological networks, and to calculate modularity as a replacement to boundary clumping.
867 Analysis of Ecological and Environmental Data mgcv (core) Mixed GAM Computation Vehicle with Automatic Smoothness Estimation Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
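A minimal GAM sketch with REML smoothness selection (the data are simulated for illustration):

    library(mgcv)
    set.seed(1)
    d <- data.frame(x = runif(200))
    d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)
    fit <- gam(y ~ s(x), data = d, method = "REML")  # penalized smooth of x
    plot(fit)  # estimated smooth with a confidence band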
868 Analysis of Ecological and Environmental Data mrds Mark-Recapture Distance Sampling Animal abundance estimation via conventional, multiple covariate and mark-recapture distance sampling (CDS/MCDS/MRDS). Detection function fitting is performed via maximum likelihood. Also included are diagnostics and plotting for fitted detection functions. Abundance estimation is via a Horvitz-Thompson-like estimator.
869 Analysis of Ecological and Environmental Data nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
870 Analysis of Ecological and Environmental Data nsRFA Non-Supervised Regional Frequency Analysis A collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.
871 Analysis of Ecological and Environmental Data oce Analysis of Oceanographic Data Supports the analysis of Oceanographic data, including ‘ADCP’ measurements, measurements made with ‘argo’ floats, ‘CTD’ measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the ‘UNESCO’ or ‘TEOS-10’ equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature.
872 Analysis of Ecological and Environmental Data openair Tools for the Analysis of Air Pollution Data Tools to analyse, interpret and understand air pollution data. Data are typically hourly time series and both monitoring data and dispersion model output can be analysed. Many functions can also be applied to other data, including meteorological and traffic data.
873 Analysis of Ecological and Environmental Data ouch Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.
874 Analysis of Ecological and Environmental Data party A Laboratory for Recursive Partytioning A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.
875 Analysis of Ecological and Environmental Data pastecs Package for Analysis of Space-Time Ecological Series Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.
876 Analysis of Ecological and Environmental Data pgirmess Spatial Analysis and Data Mining for Field Ecologists Set of tools for reading, writing and transforming spatial and seasonal data in ecology, model selection and specific statistical tests. It includes functions to discretize polylines into regular point intervals, link observations to those points, compute geographical coordinates at regular intervals between waypoints, read subsets of big rasters, compute zonal statistics or table of categories within polygons or circular buffers from raster. The package also provides miscellaneous functions for model selection, spatial statistics, geometries, writing data.frame with Chinese characters, and some other functions for field ecologists.
877 Analysis of Ecological and Environmental Data popbio Construction and Analysis of Matrix Population Models Construct and analyze projection matrix models from a demography study of marked individuals classified by age or stage. The package covers methods described in Matrix Population Models by Caswell (2001) and Quantitative Conservation Biology by Morris and Doak (2002).
878 Analysis of Ecological and Environmental Data prabclus Functions for Clustering of Presence-Absence, Abundance and Multilocus Genetic Data Distance-based parametric bootstrap tests for clustering with spatial neighborhood information; some distance measures; clustering of presence-absence, abundance and multilocus genetic data for species delimitation; and nearest-neighbor based noise detection. Try package?prabclus for an overview.
879 Analysis of Ecological and Environmental Data primer Functions and data for A Primer of Ecology with R Functions are primarily functions for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data are for examples.
880 Analysis of Ecological and Environmental Data pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
881 Analysis of Ecological and Environmental Data pvclust Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides AU (approximately unbiased) p-value as well as BP (bootstrap probability) value for each cluster in a dendrogram.
882 Analysis of Ecological and Environmental Data qualV Qualitative Validation Methods Qualitative methods for the validation of dynamic models. It contains (i) an orthogonal set of deviance measures for absolute, relative and ordinal scale and (ii) approaches accounting for time shifts. The first approach transforms time to take time delays and speed differences into account. The second divides the time series into interval units according to their main features and finds the longest common subsequence (LCS) using a dynamic programming algorithm.
883 Analysis of Ecological and Environmental Data quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
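For instance, median and upper-quartile regressions on the base-R stackloss data (an illustrative choice):

    library(quantreg)
    fit <- rq(stack.loss ~ Air.Flow + Water.Temp, data = stackloss,
              tau = c(0.5, 0.75))
    summary(fit)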
884 Analysis of Ecological and Environmental Data quantregGrowth Growth Charts via Regression Quantiles Fits non-crossing regression quantiles as a function of linear covariates and multiple smooth terms via B-splines with L1-norm difference penalties. Monotonicity constraints on the fitted curves are allowed. See Muggeo, Sciandra, Tomasello and Calvo (2013) <doi:10.1007/s10651-012-0232-1> and <doi:10.13140/RG.2.2.12924.85122> for some code examples.
885 Analysis of Ecological and Environmental Data randomForest Breiman and Cutler’s Random Forests for Classification and Regression Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <doi:10.1023/A:1010933404324>.
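A minimal classification-forest sketch on the base-R iris data:

    library(randomForest)
    set.seed(1)
    fit <- randomForest(Species ~ ., data = iris, importance = TRUE)
    print(fit)       # out-of-bag error estimate
    importance(fit)  # permutation and Gini importance measures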
886 Analysis of Ecological and Environmental Data Rcapture Loglinear Models for Capture-Recapture Experiments Estimation of abundance and other demographic parameters for closed populations, open populations and the robust design in capture-recapture experiments using loglinear models.
887 Analysis of Ecological and Environmental Data RMark R Code for Mark Analysis An interface to the software package MARK that constructs input files for MARK and extracts the output. MARK was developed by Gary White and is freely available at <http://www.phidot.org/software/mark/downloads/> but is not open source.
888 Analysis of Ecological and Environmental Data RMAWGEN Multi-Site Auto-Regressive Weather GENerator S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector AutoRegressive (VAR) models. The weather generator model is saved as an object and is calibrated on daily instrumental “Gaussianized” time series through the ‘vars’ package tools. Once this model is obtained, it can be used for weather generation and adapted to work with several monthly climatic time series.
889 Analysis of Ecological and Environmental Data rpart Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
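A minimal tree sketch on the kyphosis data shipped with rpart:

    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
    printcp(fit)          # complexity table used for pruning decisions
    plot(fit); text(fit)  # quick sketch of the fitted tree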
890 Analysis of Ecological and Environmental Data rtop Interpolation of Data with Variable Spatial Support Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.
891 Analysis of Ecological and Environmental Data seacarb Seawater Carbonate Chemistry Calculates parameters of the seawater carbonate system and assists the design of ocean acidification perturbation experiments.
892 Analysis of Ecological and Environmental Data seas Seasonal Analysis and Graphics, Especially for Climatology Capable of deriving seasonal statistics, such as “normals”, and analysis of seasonal data, such as departures. This package also has graphics capabilities for representing seasonal data, including boxplots for seasonal parameters, and bars for summed normals. There are many specific functions related to climatology, including precipitation normals, temperature normals, cumulative precipitation departures and precipitation interarrivals. However, this package is designed to represent any time-varying parameter with a discernible seasonal signal, such as found in hydrology and ecology.
893 Analysis of Ecological and Environmental Data secr Spatially Explicit Capture-Recapture Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.
894 Analysis of Ecological and Environmental Data segmented Regression Models with Break-Points / Change-Points Estimation Given a regression model, segmented ‘updates’ the model by adding one or more segmented (i.e., piece-wise linear) relationships. Several variables with multiple breakpoints are allowed.
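A minimal sketch of updating a linear fit with one estimated breakpoint (the data are simulated with a known slope change at x = 50):

    library(segmented)
    set.seed(1)
    x <- 1:100
    y <- 2 + 0.5 * pmin(x, 50) + 0.1 * pmax(x - 50, 0) + rnorm(100)
    fit0 <- lm(y ~ x)                    # plain linear fit
    fit1 <- segmented(fit0, seg.Z = ~x)  # adds one estimated breakpoint in x
    summary(fit1)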
895 Analysis of Ecological and Environmental Data sensitivity Global Sensitivity Analysis of Model Outputs A collection of functions for factor screening, global sensitivity analysis and reliability sensitivity analysis. Most of the functions have to be applied to models with scalar output, but several functions support multi-dimensional outputs.
896 Analysis of Ecological and Environmental Data simba A Collection of functions for similarity analysis of vegetation data Besides functions for the calculation of similarity and multiple-plot similarity measures with binary data (for instance presence/absence species data), the package contains some simple wrapper functions for reshaping species lists into matrices and vice versa, some other functions for further processing of similarity data (Mantel-like permutation procedures), and other useful tools for vegetation analysis.
897 Analysis of Ecological and Environmental Data simecol Simulation of Ecological (and Other) Dynamic Systems An object oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and re-usability of code.
898 Analysis of Ecological and Environmental Data siplab Spatial Individual-Plant Modelling A platform for experimenting with spatially explicit individual-based vegetation models.
899 Analysis of Ecological and Environmental Data soiltexture Functions for Soil Texture Plot, Classification and Transformation “The Soil Texture Wizard” is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams, texture ternary plots) and to classify and transform soil texture data. These functions allow plotting of virtually any soil texture triangle (classification) onto any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().
900 Analysis of Ecological and Environmental Data SPACECAP A Program to Estimate Animal Abundance and Density using Bayesian Spatially-Explicit Capture-Recapture Models SPACECAP is a user-friendly software package for estimating animal densities using closed-model capture-recapture sampling based on photographic captures, using Bayesian spatially-explicit capture-recapture models. This approach offers advantages such as substantially addressing the problems posed by individual heterogeneity in capture probabilities in conventional capture-recapture analyses. It also offers non-asymptotic inferences, which are more appropriate for the small samples of capture data typical of photo-capture studies.
901 Analysis of Ecological and Environmental Data SpatialExtremes Modelling Spatial Extremes Tools for the statistical modelling of spatial extremes using max-stable processes, copula or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with the extreme value theory) are available, such as the use of (spatial) copula and Bayesian hierarchical models assuming the so-called conditional assumptions. The latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) <doi:10.1214/11-STS376>, Padoan et al. (2010) <doi:10.1198/jasa.2009.tm08577>, Dombry et al. (2013) <doi:10.1093/biomet/ass067>.
902 Analysis of Ecological and Environmental Data StreamMetabolism Calculate Single Station Metabolism from Diurnal Oxygen Curves Provides functions to calculate Gross Primary Productivity, Net Ecosystem Production, and Ecosystem Respiration from single-station diurnal oxygen curves.
903 Analysis of Ecological and Environmental Data strucchange Testing, Monitoring, and Dating Structural Changes Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
904 Analysis of Ecological and Environmental Data surveillance Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hohle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g, epidemics across farms or networks, as a multivariate point process as proposed by Hohle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.
905 Analysis of Ecological and Environmental Data tiger TIme series of Grouped ERrors Temporally resolved groups of typical differences (errors) between two time series are determined and visualized.
906 Analysis of Ecological and Environmental Data topmodel Implementation of the Hydrological Model TOPMODEL in R Set of hydrological functions including an R implementation of the hydrological model TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. From version 0.7.0, the package has been in maintenance mode.
907 Analysis of Ecological and Environmental Data tseries Time Series Analysis and Computational Finance Time series analysis and computational finance.
908 Analysis of Ecological and Environmental Data unmarked Models for Data from Unmarked Animals Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates.
909 Analysis of Ecological and Environmental Data untb Ecological Drift under the UNTB Hubbell’s Unified Neutral Theory of Biodiversity.
910 Analysis of Ecological and Environmental Data vegan (core) Community Ecology Package Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
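A minimal community-ecology sketch using the dune vegetation data shipped with vegan:

    library(vegan)
    data(dune)
    diversity(dune, index = "shannon")   # per-site Shannon diversity
    ord <- metaMDS(dune, trace = FALSE)  # NMDS on Bray-Curtis distances
    plot(ord, type = "t")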
911 Analysis of Ecological and Environmental Data vegetarian Jost Diversity Measures for Community Data This package computes diversity for community data sets using the methods outlined by Jost (2006, 2007). While there are differing opinions on the ideal way to calculate diversity (e.g. Magurran 2004), this method offers the advantage of providing diversity numbers equivalents, independent alpha and beta diversities, and the ability to incorporate ‘order’ (q) as a continuous measure of the importance of rare species in the metrics. The functions provided in this package largely correspond with the equations offered by Jost in the cited papers. The package computes alpha diversities, beta diversities, gamma diversities, and similarity indices. Confidence intervals for diversity measures are calculated using a bootstrap method described by Chao et al. (2008). For datasets with many samples (sites, plots), sim.table creates tables of all pairwise comparisons possible, and for grouped samples sim.groups calculates pairwise combinations of within- and between-group comparisons.
912 Analysis of Ecological and Environmental Data VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
913 Analysis of Ecological and Environmental Data wasim Visualisation and analysis of output files of the hydrological model WASIM Helpful tools for data processing and visualisation of results of the hydrological model WASIM-ETH.
914 Analysis of Ecological and Environmental Data zoo S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
915 Design of Experiments (DoE) & Analysis of Experimental Data acebayes Optimal Bayesian Experimental Design using the ACE Algorithm Optimal Bayesian experimental design using the approximate coordinate exchange (ACE) algorithm.
916 Design of Experiments (DoE) & Analysis of Experimental Data agricolae (core) Statistical Procedures for Agricultural Research The original idea was presented in the thesis “A statistical analysis tool for agricultural research”, submitted for the degree of Master of Science at the National Engineering University (UNI), Lima, Peru. Some experimental data for the examples come from CIP and other research. agricolae offers extensive functionality on experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, alpha, cyclic, complete block, Latin square, Graeco-Latin square, augmented block, factorial, split-plot and strip-plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures, several non-parametric comparison tests, biodiversity indexes and consensus clustering.
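For example, generating a randomized complete block layout (treatment labels and seed are illustrative):

    library(agricolae)
    trt <- c("A", "B", "C", "D")
    # Randomized complete block design with 3 blocks; field book in $book
    layout <- design.rcbd(trt, r = 3, seed = 42)
    head(layout$book)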
917 Design of Experiments (DoE) & Analysis of Experimental Data agridat Agricultural Datasets Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.
918 Design of Experiments (DoE) & Analysis of Experimental Data AlgDesign (core) Algorithmic Experimental Design Algorithmic experimental designs. Calculates exact and approximate-theory experimental designs for the D-, A- and I-criteria. Very large designs may be created. Experimental designs may be blocked, or blocked designs created from a candidate list, using several criteria. The blocking can be done when whole and within-plot factors interact.
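A minimal sketch of a D-optimal exact design found with optFederov(); the factor names and run count are illustrative:

    library(AlgDesign)
    # 8-run D-optimal design for a quadratic model in two 3-level
    # factors, chosen from the full-factorial candidate set
    cand <- gen.factorial(levels = 3, nVars = 2, varNames = c("A", "B"))
    des <- optFederov(~ quad(A, B), data = cand, nTrials = 8, criterion = "D")
    des$design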
919 Design of Experiments (DoE) & Analysis of Experimental Data ALTopt Optimal Experimental Designs for Accelerated Life Testing Creates optimal (D, U and I) designs for accelerated life testing with right censoring or interval censoring. It uses a generalized linear model (GLM) approach to derive the asymptotic variance-covariance matrix of the regression coefficients. The failure time distribution is assumed to follow a Weibull distribution with a known shape parameter, and log-linear link functions are used to model the relationship between failure time parameters and stress variables. The acceleration model may have multiple stress factors, although most ALTs involve only two or fewer. The ALTopt package also provides several plotting functions, including contour plots, Fraction of Use Space (FUS) plots and Variance Dispersion graphs of Use Space (VDUS) plots.
920 Design of Experiments (DoE) & Analysis of Experimental Data asd Simulations for Adaptive Seamless Designs Runs simulations for adaptive seamless designs, with and without early outcomes, for treatment-selection and subpopulation-type designs.
921 Design of Experiments (DoE) & Analysis of Experimental Data BatchExperiments Statistical Experiments on Batch Computing Clusters Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.
922 Design of Experiments (DoE) & Analysis of Experimental Data BayesMAMS Designing Bayesian Multi-Arm Multi-Stage Studies Calculating Bayesian sample sizes for multi-arm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.
923 Design of Experiments (DoE) & Analysis of Experimental Data bcrm Bayesian Continual Reassessment Method for Phase I Dose-Escalation Trials Implements a wide variety of one and two-parameter Bayesian CRM designs. The program can run interactively, allowing the user to enter outcomes after each cohort has been recruited, or via simulation to assess operating characteristics.
924 Design of Experiments (DoE) & Analysis of Experimental Data BHH2 Useful Functions for Box, Hunter and Hunter II Functions and data sets reproducing some examples in Box, Hunter and Hunter II. Useful for statistical design of experiments, especially factorial experiments.
925 Design of Experiments (DoE) & Analysis of Experimental Data binseqtest Exact Binary Sequential Designs and Analysis For a series of binary responses, creates stopping boundaries with exact results after stopping, and allows updating for missing assessments.
926 Design of Experiments (DoE) & Analysis of Experimental Data bioOED Sensitivity Analysis and Optimum Experiment Design for Microbial Inactivation Extends the bioinactivation package with functions for Sensitivity Analysis and Optimum Experiment Design.
927 Design of Experiments (DoE) & Analysis of Experimental Data blocksdesign Nested and Crossed Block Designs for Factorial, Fractional Factorial and Unstructured Treatment Sets Constructs D-optimal or near D-optimal nested and crossed block designs for unstructured or general factorial treatment designs. The treatment design, if required, is found from a model-matrix design formula and can be built up sequentially. The block design is found from a defined set of block factors and is conditional on the defined treatment design. The block factors are added in sequence and each added block factor is optimized conditional on all previously added block factors. The block design can have repeated nesting down to any required depth of nesting, with either simple nested blocks or a crossed block design at each level of nesting. Outputs include a table showing the allocation of treatments to blocks and tables showing the achieved D-efficiency factors for each block and treatment design.
928 Design of Experiments (DoE) & Analysis of Experimental Data blockTools Block, Assign, and Diagnose Potential Interference in Randomized Experiments Blocks units into experimental blocks, with one unit per treatment condition, by creating a measure of multivariate distance between all possible pairs of units. Maximum, minimum, or an allowable range of differences between units on one variable can be set. Randomly assign units to treatment conditions. Diagnose potential interference between units assigned to different treatment conditions. Write outputs to .tex and .csv files.
929 Design of Experiments (DoE) & Analysis of Experimental Data BOIN Bayesian Optimal INterval (BOIN) Design for Single-Agent and Drug- Combination Phase I Clinical Trials The Bayesian optimal interval (BOIN) design is a novel phase I clinical trial design for finding the maximum tolerated dose (MTD). It can be used to design both single-agent and drug-combination trials. The BOIN design is motivated by the top priority and concern of clinicians when testing a new drug, which is to effectively treat patients and minimize the chance of exposing them to subtherapeutic or overly toxic doses. The prominent advantage of the BOIN design is that it achieves simplicity and superior performance at the same time. The BOIN design is algorithm-based and can be implemented in a simple way similar to the traditional 3+3 design. The BOIN design yields an average performance that is comparable to that of the continual reassessment method (CRM, one of the best model-based designs) in terms of selecting the MTD, but has a substantially lower risk of assigning patients to subtherapeutic or overly toxic doses.
930 Design of Experiments (DoE) & Analysis of Experimental Data BsMD Bayes Screening and Model Discrimination Bayes screening and model discrimination follow-up designs.
931 Design of Experiments (DoE) & Analysis of Experimental Data choiceDes Design Functions for Choice Studies Design functions for DCMs and other types of choice studies (including MaxDiff and other tradeoffs).
932 Design of Experiments (DoE) & Analysis of Experimental Data CombinS Construction Methods of some Series of PBIB Designs Constructs series of partially balanced incomplete block (PBIB) designs based on the combinatory method (S) introduced in Imane Rezgui et al. (2014) <doi:10.3844/jmssp.2014.45.48>, and gives their associated U-type designs.
933 Design of Experiments (DoE) & Analysis of Experimental Data conf.design (core) Construction of Factorial Designs This small package contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs.
934 Design of Experiments (DoE) & Analysis of Experimental Data crmPack Object-Oriented Implementation of CRM Designs Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules.
935 Design of Experiments (DoE) & Analysis of Experimental Data crossdes (core) Construction of Crossover Designs Contains functions for the construction of carryover-balanced crossover designs. In addition, it contains functions to check given designs for balance.
936 Design of Experiments (DoE) & Analysis of Experimental Data Crossover Analysis and Search of Crossover Designs Provides different crossover designs from combinatorial or search algorithms, as well as from the literature, and a GUI to access them.
937 Design of Experiments (DoE) & Analysis of Experimental Data dae Functions Useful in the Design and ANOVA of Experiments The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. There is a vignette describing how to use the Design functions for randomizing and assessing designs available in the file ‘daeDesignNotes.pdf’. The ANOVA functions facilitate the extraction of information when the ‘Error’ function has been used in the call to ‘aov’.
938 Design of Experiments (DoE) & Analysis of Experimental Data daewr Design and Analysis of Experiments with R Contains data frames and functions used in the book “Design and Analysis of Experiments with R”.
939 Design of Experiments (DoE) & Analysis of Experimental Data designGG Computational tool for designing genetical genomics experiments The package provides R scripts for designing genetical genomics experiments.
940 Design of Experiments (DoE) & Analysis of Experimental Data designGLMM Finding Optimal Block Designs for a Generalised Linear Mixed Model Use simulated annealing to find optimal designs for Poisson regression models with blocks.
941 Design of Experiments (DoE) & Analysis of Experimental Data designmatch Matched Samples that are Balanced and Representative by Design Includes functions for the construction of matched samples that are balanced and representative by design. Among others, these functions can be used for matching in observational studies with treated and control units, with cases and controls, in related settings with instrumental variables, and in discontinuity designs. Also, they can be used for the design of randomized experiments, for example, for matching before randomization. By default, ‘designmatch’ uses the ‘GLPK’ optimization solver, but its performance is greatly enhanced by the ‘Gurobi’ optimization solver and its associated R interface. For their installation, please follow the instructions at <http://user.gurobi.com/download/gurobi-optimizer> and <http://www.gurobi.com/documentation/7.0/refman/r_api_overview.html>. We have also included directions in the gurobi_installation file in the inst folder.
942 Design of Experiments (DoE) & Analysis of Experimental Data desirability Function Optimization and Ranking via Desirability Functions S3 classes for multivariate optimization using the desirability function by Derringer and Suich (1980).
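A minimal sketch combining two desirability functions; the response ranges and target are illustrative:

    library(desirability)
    # Maximize y1 over [70, 100]; hit a target of 50 for y2 within [45, 55]
    d1 <- dMax(70, 100)
    d2 <- dTarget(45, 50, 55)
    dAll <- dOverall(d1, d2)
    # Overall desirability (geometric mean) at a candidate point
    predict(dAll, data.frame(y1 = 85, y2 = 48))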
943 Design of Experiments (DoE) & Analysis of Experimental Data desplot Plotting Field Plans for Agricultural Experiments A function for plotting maps of agricultural field experiments that are laid out in grids.
944 Design of Experiments (DoE) & Analysis of Experimental Data dfcomb Phase I/II Adaptive Dose-Finding Design for Combination Studies Phase I/II adaptive dose-finding design for combination studies where toxicity rates are assumed to increase with the dose of both agents.
945 Design of Experiments (DoE) & Analysis of Experimental Data dfcrm Dose-Finding by the Continual Reassessment Method Provides functions to run the CRM and TITE-CRM in phase I trials and calibration tools for trial planning purposes.
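A minimal sketch of a CRM update with crm(); the skeleton, target rate and patient outcomes below are illustrative:

    library(dfcrm)
    # Prior toxicity skeleton for 6 dose levels, 20% target rate
    prior <- c(0.05, 0.10, 0.20, 0.35, 0.50, 0.70)
    tox   <- c(0, 0, 1, 0, 1)   # observed toxicity indicators
    level <- c(3, 4, 4, 3, 3)   # assigned dose levels
    fit <- crm(prior, target = 0.2, tox = tox, level = level)
    fit$mtd   # model-recommended dose for the next patient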
946 Design of Experiments (DoE) & Analysis of Experimental Data dfmta Phase I/II Adaptive Dose-Finding Design for MTA Phase I/II adaptive dose-finding design for single-agent Molecularly Targeted Agent (MTA), according to the paper “Phase I/II Dose-Finding Design for Molecularly Targeted Agent: Plateau Determination using Adaptive Randomization”, Riviere Marie-Karelle et al. (2016) <doi:10.1177/0962280216631763>.
947 Design of Experiments (DoE) & Analysis of Experimental Data dfpk Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials Statistical methods involving PK measures are provided for use in the dose-allocation process during Phase I clinical trials. These methods, proposed by Ursino et al. (2017) <doi:10.1002/bimj.201600084>, incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models and hierarchical models. This package provides functions to generate data from several scenarios and functions to run simulations whose objective is to determine the maximum tolerated dose (MTD).
948 Design of Experiments (DoE) & Analysis of Experimental Data DiceDesign Designs of Computer Experiments Space-Filling Designs and Uniformity Criteria.
949 Design of Experiments (DoE) & Analysis of Experimental Data DiceEval Construction and Evaluation of Metamodels Estimation, validation and prediction of models of different types: linear models, additive models, MARS, PolyMARS and kriging.
950 Design of Experiments (DoE) & Analysis of Experimental Data DiceKriging Kriging Methods for Computer Experiments Estimation, validation and prediction of kriging models. Important functions: km, print.km, plot.km, predict.km.
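A minimal sketch of the km()/predict.km() workflow; the design size and test function are illustrative:

    library(DiceKriging)
    # Fit a kriging metamodel with constant trend to 7 design points
    design <- data.frame(x = seq(0, 1, length.out = 7))
    y <- sin(2 * pi * design$x)
    m <- km(~1, design = design, response = y)
    # Prediction with kriging standard errors at a new input
    predict(m, newdata = data.frame(x = 0.35), type = "UK")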
951 Design of Experiments (DoE) & Analysis of Experimental Data DiceView Plot Methods for Computer Experiments Design and Surrogate View 2D/3D sections or contours of computer experiments designs, surrogates or test functions.
952 Design of Experiments (DoE) & Analysis of Experimental Data docopulae Optimal Designs for Copula Models A direct approach to optimal designs for copula models based on the Fisher information. Provides flexible functions for building joint PDFs, evaluating the Fisher information and finding optimal designs. It includes an extensible solution to summation and integration called ‘nint’, functions for transforming, plotting and comparing designs, as well as a set of tools for common low-level tasks.
953 Design of Experiments (DoE) & Analysis of Experimental Data DoE.base (core) Full Factorials, Orthogonal Arrays and Base Utilities for DoE Packages Creates full factorial experimental designs and designs based on orthogonal arrays for (industrial) experiments. Provides diverse quality criteria. Provides utility functions for the class design, which is also used by other packages for designed experiments.
954 Design of Experiments (DoE) & Analysis of Experimental Data DoE.MIParray Creation of Arrays by Mixed Integer Programming ‘CRAN’ package ‘DoE.base’ and non-‘CRAN’ packages ‘gurobi’ and ‘Rmosek’ are enhanced with functionality for the creation of optimized arrays for experimentation, where optimization is in terms of generalized minimum aberration. It is also possible to optimally extend existing arrays to larger run size. Optimization requires the availability of at least one of the commercial products ‘Gurobi’ or ‘Mosek’ (free academic licenses available for both). For installing ‘Gurobi’ and its R package ‘gurobi’, follow instructions at <http://www.gurobi.com/downloads/gurobi-optimizer> and <http://www.gurobi.com/documentation/7.5/refman/r_api_overview.html> (or higher version). For installing ‘Mosek’ and its R package ‘Rmosek’, follow instructions at <https://www.mosek.com/downloads/> and <http://docs.mosek.com/8.1/rmosek/install-interface.html>, or use the functionality in the stump CRAN R package ‘Rmosek’.
955 Design of Experiments (DoE) & Analysis of Experimental Data DoE.wrapper (core) Wrapper Package for Design of Experiments Functionality Various kinds of designs for (industrial) experiments can be created. The package uses, and sometimes enhances, design generation routines from other packages. So far, response surface designs from package rsm, latin hypercube samples from packages lhs and DiceDesign, and D-optimal designs from package AlgDesign have been implemented.
956 Design of Experiments (DoE) & Analysis of Experimental Data DoseFinding Planning and Analyzing Dose Finding Experiments The DoseFinding package provides functions for the design and analysis of dose-finding experiments (with focus on pharmaceutical Phase II clinical trials). It provides functions for: multiple contrast tests, fitting non-linear dose-response models (using Bayesian and non-Bayesian estimation), calculating optimal designs and an implementation of the MCPMod methodology.
957 Design of Experiments (DoE) & Analysis of Experimental Data dynaTree Dynamic Trees for Learning and Design Inference by sequential Monte Carlo for dynamic tree regression and classification models with hooks provided for sequential design and optimization, fully online learning with drift, variable selection, and sensitivity analysis of inputs. Illustrative examples from the original dynamic trees paper are facilitated by demos in the package; see demo(package=“dynaTree”).
958 Design of Experiments (DoE) & Analysis of Experimental Data easypower Sample Size Estimation for Experimental Designs Power analysis is used in the estimation of sample sizes for experimental designs. Most programs and R packages will only output the highest recommended sample size to the user. Often the user input can be complicated and computing multiple power analyses for different treatment comparisons can be time consuming. This package simplifies the user input and allows the user to view all of the sample size recommendations or just the ones they want to see. The calculations used to calculate the recommended sample sizes are from the ‘pwr’ package.
959 Design of Experiments (DoE) & Analysis of Experimental Data edesign Maximum Entropy Sampling An implementation of maximum entropy sampling for spatial data is provided. An exact branch-and-bound algorithm as well as greedy and dual greedy heuristics are included.
960 Design of Experiments (DoE) & Analysis of Experimental Data EngrExpt Data sets from “Introductory Statistics for Engineering Experimentation” Datasets from Nelson, Coffin and Copeland “Introductory Statistics for Engineering Experimentation” (Elsevier, 2003) with sample code.
961 Design of Experiments (DoE) & Analysis of Experimental Data experiment R Package for Designing and Analyzing Randomized Experiments Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.
962 Design of Experiments (DoE) & Analysis of Experimental Data ez Easy Analysis and Visualization of Factorial Experiments Facilitates easy analysis of factorial experiments, including purely within-Ss designs (a.k.a. “repeated measures”), purely between-Ss designs, and mixed within-and-between-Ss designs. The functions in this package aim to provide simple, intuitive and consistent specification of data analysis and visualization. Visualization functions also include design visualization for pre-analysis data auditing, and correlation matrix visualization. Finally, this package includes functions for non-parametric analysis, including permutation tests and bootstrap resampling. The bootstrap function obtains predictions either by cell means or by more advanced/powerful mixed effects models, yielding predictions and confidence intervals that may be easily visualized at any level of the experiment’s design.
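A minimal sketch of ezANOVA() on simulated repeated-measures data; all names and values are illustrative:

    library(ez)
    # One-way repeated-measures ANOVA: 10 subjects x 3 conditions
    set.seed(1)
    d <- data.frame(
      id   = factor(rep(1:10, each = 3)),
      cond = factor(rep(c("a", "b", "c"), times = 10)),
      y    = rnorm(30)
    )
    ezANOVA(data = d, dv = y, wid = id, within = cond)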
963 Design of Experiments (DoE) & Analysis of Experimental Data FMC Factorial Experiments with Minimum Level Changes Generate cost effective minimally changed run sequences for symmetrical as well as asymmetrical factorial designs.
964 Design of Experiments (DoE) & Analysis of Experimental Data FrF2 (core) Fractional Factorial Designs with 2-Level Factors Regular and non-regular Fractional Factorial 2-level designs can be created. Furthermore, analysis tools for Fractional Factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
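A minimal sketch of creating and inspecting a regular fraction; the run and factor counts are illustrative:

    library(FrF2)
    # Regular 2^(5-2) fractional factorial: 8 runs, 5 factors
    plan <- FrF2(nruns = 8, nfactors = 5)
    summary(plan)   # design generators and alias structure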
965 Design of Experiments (DoE) & Analysis of Experimental Data FrF2.catlg128 Catalogues of resolution IV 128 run 2-level fractional factorials up to 33 factors that do have 5-letter words This package provides catalogues of resolution IV regular fractional factorial designs in 128 runs for up to 33 2-level factors. The catalogues are complete, excluding resolution IV designs without 5-letter words, because these do not add value for a search for clear designs. The previous package version 1.0 with complete catalogues up to 24 runs (24 runs and a namespace added later) can be downloaded from the author's website.
966 Design of Experiments (DoE) & Analysis of Experimental Data GAD GAD: Analysis of variance from general principles This package analyses complex ANOVA models with any combination of orthogonal/nested and fixed/random factors, as described by Underwood (1997). There are two restrictions: (i) data must be balanced; (ii) fixed nested factors are not allowed. Homogeneity of variances is checked using Cochran's C test, and ‘a posteriori’ comparisons of means are done using the Student-Newman-Keuls (SNK) procedure.
967 Design of Experiments (DoE) & Analysis of Experimental Data geospt Geostatistical Analysis and Design of Optimal Spatial Sampling Networks Estimation of the variogram through trimmed mean, radial basis functions (optimization, prediction and cross-validation), summary statistics from cross-validation, pocket plot, and design of optimal sampling networks through sequential and simultaneous points methods.
968 Design of Experiments (DoE) & Analysis of Experimental Data granova Graphical Analysis of Variance This small collection of functions provides what we call elemental graphics for display of anova results. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular anova methods. The two main functions are granova.1w (a graphic for one way anova) and granova.2w (a corresponding graphic for two way anova). These functions were written to display data for any number of groups, regardless of their sizes (however, very large data sets or numbers of groups can be problematic). For these two functions a specialized approach is used to construct data-based contrast vectors for which anova data are displayed. The result is that the graphics use straight lines, and when appropriate flat surfaces, to facilitate clear interpretations while being faithful to the standard effect tests in anova. The graphic results are complementary to standard summary tables for these two basic kinds of analysis of variance; numerical summary results of analyses are also provided as side effects. Two additional functions are granova.ds (for comparing two dependent samples), and granova.contr (which provides graphic displays for a priori contrasts). All functions provide relevant numerical results to supplement the graphic displays of anova data. The graphics based on these functions should be especially helpful for learning how the methods have been applied to answer the question(s) posed. This means they can be particularly helpful for students and non-statistician analysts. But these methods should be quite generally helpful for work-a-day applications of all kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of non-linear transformations of data. In the case of granova.1w and granova.ds especially, several arguments are provided to facilitate flexibility in the construction of graphics that accommodate diverse features of data, according to their corresponding display requirements. See the help files for individual functions.
969 Design of Experiments (DoE) & Analysis of Experimental Data GroupSeq A GUI-Based Program to Compute Probabilities Regarding Group Sequential Designs A graphical user interface to compute group sequential designs based on normally distributed test statistics, particularly critical boundaries, power, drift, and confidence intervals of such designs. All computations are based on the alpha spending approach by Lan-DeMets with various alpha spending functions being available to choose among.
970 Design of Experiments (DoE) & Analysis of Experimental Data gsbDesign Group Sequential Bayes Design Group sequential operating characteristics for clinical, Bayesian two-arm trials with known sigma and normal endpoints.
971 Design of Experiments (DoE) & Analysis of Experimental Data gsDesign Group Sequential Design Derives group sequential designs and describes their properties.
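A minimal sketch of a three-look group sequential design; the spending function and error rates are illustrative choices:

    library(gsDesign)
    # Two-sided symmetric design, Lan-DeMets O'Brien-Fleming-like
    # spending, one-sided alpha 0.025, 90% power
    x <- gsDesign(k = 3, test.type = 2, alpha = 0.025, beta = 0.1,
                  sfu = sfLDOF)
    x$upper$bound   # efficacy boundaries at each analysis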
972 Design of Experiments (DoE) & Analysis of Experimental Data gset Group Sequential Design in Equivalence Studies Calculates equivalence and futility boundaries based on the exact bivariate t test statistics for group sequential designs in studies with equivalence hypotheses.
973 Design of Experiments (DoE) & Analysis of Experimental Data hiPOD hierarchical Pooled Optimal Design Based on hierarchical modeling, this package provides a few practical functions to find and present the optimal designs for a pooled NGS design.
974 Design of Experiments (DoE) & Analysis of Experimental Data ibd Incomplete Block Designs A collection of several utility functions related to binary incomplete block designs. The package contains functions to generate A- and D-efficient binary incomplete block designs with given numbers of treatments, number of blocks and block size. It also contains a function to generate an incomplete block design with a specified concurrence matrix. There are functions to generate balanced treatment incomplete block designs and incomplete block designs for test versus control treatment comparisons with a specified concurrence matrix. The package also allows performing analysis of variance of data and computing estimated marginal means of factors from experiments using a connected incomplete block design. Tests of hypotheses for treatment contrasts in an incomplete block design setup are supported.
975 Design of Experiments (DoE) & Analysis of Experimental Data ICAOD Optimal Designs for Nonlinear Models Finds optimal designs for nonlinear models using a metaheuristic algorithm called the imperialist competitive algorithm (ICA). See, for details, Masoudi et al. (2017) <doi:10.1016/j.csda.2016.06.014> and Masoudi et al. (2019) <doi:10.1080/10618600.2019.1601097>.
976 Design of Experiments (DoE) & Analysis of Experimental Data idefix Efficient Designs for Discrete Choice Experiments Generates efficient designs for discrete choice experiments based on the multinomial logit model, and individually adapted designs for the mixed multinomial logit model. The generated designs can be presented on screen and choice data can be gathered using a shiny application. Crabbe M, Akinc D and Vandebroek M (2014) <doi:10.1016/j.trb.2013.11.008>.
977 Design of Experiments (DoE) & Analysis of Experimental Data JMdesign Joint Modeling of Longitudinal and Survival Data - Power Calculation Performs power calculations for joint modeling of longitudinal and survival data with k-th order trajectories when the variance-covariance matrix, Sigma_theta, is unknown.
978 Design of Experiments (DoE) & Analysis of Experimental Data LDOD Finding Locally D-optimal Designs for some Nonlinear and Generalized Linear Models This package provides functions for finding locally D-optimal designs for logistic, negative binomial, Poisson, Michaelis-Menten, exponential, log-linear, Emax, Richards, Weibull and inverse quadratic regression models, and also functions for automatically constructing the Fisher information matrix and Frechet derivative from some input variables, without user interference.
979 Design of Experiments (DoE) & Analysis of Experimental Data lhs Latin Hypercube Samples Provides a number of methods for creating and augmenting Latin Hypercube Samples.
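A minimal sketch of creating and augmenting a Latin hypercube sample; the sizes are illustrative:

    library(lhs)
    set.seed(7)
    X  <- maximinLHS(10, 3)   # 10 points in [0,1]^3, maximin criterion
    X2 <- augmentLHS(X, 5)    # add 5 more space-filling points
    dim(X2)                   # 15 x 3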
980 Design of Experiments (DoE) & Analysis of Experimental Data MAMS Designing Multi-Arm Multi-Stage Studies Designing multi-arm multi-stage studies with (asymptotically) normal endpoints and known variance.
981 Design of Experiments (DoE) & Analysis of Experimental Data MaxPro Maximum Projection Designs Generate maximum projection (MaxPro) designs for quantitative and/or qualitative factors. Details of the MaxPro criterion can be found in: (1) Joseph, Gul, and Ba. (2015) “Maximum Projection Designs for Computer Experiments”, Biometrika, 102, 371-380, and (2) Joseph, Gul, and Ba. (2018) “Designing Computer Experiments with Multiple Types of Factors: The MaxPro Approach”, Journal of Quality Technology, to appear.
982 Design of Experiments (DoE) & Analysis of Experimental Data MBHdesign Spatial Designs for Ecological and Environmental Surveys Provides spatially balanced designs from a set of (contiguous) potential sampling locations in a study region. Accommodates, without detrimental effects on spatial balance, sites that the researcher wishes to include in the survey for reasons other than the current randomisation (legacy sites).
983 Design of Experiments (DoE) & Analysis of Experimental Data minimalRSD Minimally Changed CCD and BBD Generates central composite designs (CCD) with full as well as fractional factorial points (half replicate) and Box-Behnken designs (BBD) with minimally changed run sequences.
984 Design of Experiments (DoE) & Analysis of Experimental Data minimaxdesign Minimax and Minimax Projection Designs Provides two main functions, mMcPSO() and miniMaxPro(), which generate minimax designs and minimax projection designs using a hybrid clustering/particle swarm optimization (PSO) algorithm. These designs can be used in a variety of settings, e.g., as space-filling designs for computer experiments or sensor allocation designs. A detailed description of the two designs and the employed algorithms can be found in Mak and Joseph (2017) <doi:10.1080/10618600.2017.1302881>.
985 Design of Experiments (DoE) & Analysis of Experimental Data mixexp Design and Analysis of Mixture Experiments Functions for creating designs for mixture experiments, making ternary contour plots, and making mixture effect plots.
986 Design of Experiments (DoE) & Analysis of Experimental Data mkssd Efficient multi-level k-circulant supersaturated designs Generates efficient balanced non-aliased multi-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design that has chi-square efficiency greater than a user-specified efficiency level (mef). It also displays the progress of the generation of an efficient multi-level k-circulant design through a progress bar. A progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.
987 Design of Experiments (DoE) & Analysis of Experimental Data mxkssd Efficient mixed-level k-circulant supersaturated designs Generates efficient balanced mixed-level k-circulant supersaturated designs by interchanging the elements of the generator vector. The package tries to generate a supersaturated design that has EfNOD efficiency greater than a user-specified efficiency level (mef). It also displays the progress of the generation of an efficient mixed-level k-circulant design through a progress bar. A progress of 100% means that one full round of interchange is completed. More than one full round (typically 4-5 rounds) of interchange may be required for larger designs.
988 Design of Experiments (DoE) & Analysis of Experimental Data OBsMD Objective Bayesian Model Discrimination in Follow-Up Designs Implements the objective Bayesian methodology proposed by Consonni and Deldossi to choose the optimal experiment that best discriminates between competing models. G. Consonni, L. Deldossi (2014), Objective Bayesian Model Discrimination in Follow-up Experimental Designs, Test. <doi:10.1007/s11749-015-0461-3>.
989 Design of Experiments (DoE) & Analysis of Experimental Data odr Optimal Design and Statistical Power of Multilevel Randomized Trials Calculate the optimal sample allocation that minimizes the variance of treatment effect in multilevel randomized trials under fixed budget and cost structure, perform power analyses with and without accommodating costs and budget. The references for proposed methods are: (1) Shen, Z. (in progress). Using optimal sample allocation to improve statistical precision and design efficiency for multilevel randomized trials. (unpublished doctoral dissertation). University of Cincinnati, Cincinnati, OH. (2) Shen, Z., & Kelcey, B. (revise & resubmit). Optimal sample allocation accounts for the full variation of sampling costs in cluster-randomized trials. Journal of Educational and Behavioral Statistics. (3) Shen, Z., & Kelcey, B. (2018, April). Optimal design of cluster randomized trials under condition- and unit-specific cost structures. Roundtable discussion presented at American Educational Research Association (AERA) annual conference. (4) Champely., S. (2018). pwr: Basic functions for power analysis (Version 1.2-2) [Software]. Available from <https://CRAN.R-project.org/package=pwr>.
990 Design of Experiments (DoE) & Analysis of Experimental Data OPDOE Optimal Design of Experiments Several functions related to experimental design are implemented here; see “Optimal Experimental Design with R” by Rasch D. et al. (ISBN 9781439816974).
991 Design of Experiments (DoE) & Analysis of Experimental Data optbdmaeAT Optimal Block Designs for Two-Colour cDNA Microarray Experiments Computes A-, MV-, D- and E-optimal or near-optimal block designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all possible elementary treatment contrasts. The algorithms used in this package are based on the treatment exchange and array exchange algorithms of Debusho, Gemechu and Haines (2016, unpublished). The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.
992 Design of Experiments (DoE) & Analysis of Experimental Data optDesignSlopeInt Optimal Designs for Estimating the Slope Divided by the Intercept Compute optimal experimental designs that measure the slope divided by the intercept.
993 Design of Experiments (DoE) & Analysis of Experimental Data OptGS Near-Optimal and Balanced Group-Sequential Designs for Clinical Trials with Continuous Outcomes Functions to find near-optimal multi-stage designs for continuous outcomes.
994 Design of Experiments (DoE) & Analysis of Experimental Data OptimalDesign Algorithms for D-, A-, and IV-Optimal Designs Algorithms for D-, A- and IV-optimal designs of experiments. Some of the functions in this package require the ‘gurobi’ software and its accompanying R package. For their installation, please follow the instructions at <www.gurobi.com> and the file gurobi_inst.txt, respectively.
995 Design of Experiments (DoE) & Analysis of Experimental Data OptimaRegion Confidence Regions for Optima Computes confidence regions on the location of response surface optima.
996 Design of Experiments (DoE) & Analysis of Experimental Data OptInterim Optimal Two and Three Stage Designs for Single-Arm and Two-Arm Randomized Controlled Trials with a Long-Term Binary Endpoint Optimal two and three stage designs monitoring time-to-event endpoints at a specified timepoint.
997 Design of Experiments (DoE) & Analysis of Experimental Data optrcdmaeAT Optimal Row-Column Designs for Two-Colour cDNA Microarray Experiments Computes A-, MV-, D- and E-optimal or near-optimal row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all pairwise treatment contrasts. The algorithms used in this package are based on the array exchange and treatment exchange algorithms of Debusho, Gemechu and Haines (2016, unpublished), adjusted for the row-column design setup. The package also provides an optional method of using the graphical user interface (GUI) R package tcltk to ensure that it is user friendly.
998 Design of Experiments (DoE) & Analysis of Experimental Data osDesign Design and analysis of observational studies The osDesign package serves for planning an observational study. Currently, functionality is focused on two-phase and case-control designs. Functions in this package provide Monte Carlo-based evaluation of operating characteristics, such as power, for estimators of the components of a logistic regression model.
999 Design of Experiments (DoE) & Analysis of Experimental Data PBIBD Partially Balanced Incomplete Block Designs PBIB designs are an important type of incomplete block design with a wide area of application, for example in agricultural experiments, plant breeding and sample surveys. This package constructs various series of PBIB designs and assists in checking all the necessary conditions of PBIB designs and the association scheme on which these designs are based. It also assists in calculating the efficiencies of PBIB designs with any number of associate classes. The package also constructs Youden-m square designs, which are row-column designs for the two-way elimination of heterogeneity; the incomplete columns of these Youden-m square designs constitute PBIB designs. With the present functionality, the package will be of immense importance for researchers, as it will help them to construct PBIB designs, to check whether their PBIB designs and association schemes satisfy the various necessary conditions for existence, to calculate the efficiencies of PBIB designs based on any association scheme, and to construct Youden-m square designs for the two-way elimination of heterogeneity. R. C. Bose and K. R. Nair (1939) <http://www.jstor.org/stable/40383923>.
1000 Design of Experiments (DoE) & Analysis of Experimental Data PGM2 Nested Resolvable Designs and their Associated Uniform Designs Construction method for nested resolvable designs from a projective geometry defined on the Galois field of order 2. The obtained resolvable designs are used to build uniform designs. The presented results are based on <https://eudml.org/doc/219563> and A. Boudraa et al. (see references).
1001 Design of Experiments (DoE) & Analysis of Experimental Data ph2bayes Bayesian Single-Arm Phase II Designs An implementation of Bayesian single-arm phase II design methods for binary outcome based on posterior probability (Thall and Simon (1994) <doi:10.2307/2533377>) and predictive probability (Lee and Liu (2008) <doi:10.1177/1740774508089279>).
1002 Design of Experiments (DoE) & Analysis of Experimental Data ph2bye Phase II Clinical Trial Design Using Bayesian Methods Calculate the Bayesian posterior/predictive probability and determine the sample size and stopping boundaries for single-arm Phase II design.
1003 Design of Experiments (DoE) & Analysis of Experimental Data pid Process Improvement using Data A collection of scripts and data files for the statistics text: “Process Improvement using Data” <https://learnche.org/pid> and the online course “Experimentation for Improvement” found on Coursera. The package contains code for designed experiments, data sets and other convenience functions used in the book.
1004 Design of Experiments (DoE) & Analysis of Experimental Data pipe.design Dual-Agent Dose Escalation for Phase I Trials using the PIPE Design Implements the Product of Independent beta Probabilities dose Escalation (PIPE) design for dual-agent Phase I trials as described in Mander AP, Sweeting MJ (2015) <doi:10.1002/sim.6434>.
1005 Design of Experiments (DoE) & Analysis of Experimental Data planor (core) Generation of Regular Factorial Designs Automatic generation of regular factorial designs, including fractional designs, orthogonal block designs, row-column designs and split-plots. Kobilinsky, Monod and Bailey (2017) <doi:10.1016/j.csda.2016.09.003>.
1006 Design of Experiments (DoE) & Analysis of Experimental Data plgp Particle Learning of Gaussian Processes Sequential Monte Carlo inference for fully Bayesian Gaussian process (GP) regression and classification models by particle learning (PL). The sequential nature of inference and the active learning (AL) hooks provided facilitate thrifty sequential design (by entropy) and optimization (by improvement) for classification and regression models, respectively. This package essentially provides a generic PL interface, and functions (arguments to the interface) which implement the GP models and AL heuristics. Functions for a special, linked, regression/classification GP model and an integrated expected conditional improvement (IECI) statistic are provided for optimization in the presence of unknown constraints. Separable and isotropic Gaussian, and single-index correlation functions are supported. See the examples section of ?plgp and demo(package=“plgp”) for an index of demos.
1007 Design of Experiments (DoE) & Analysis of Experimental Data PopED Population (and Individual) Optimal Experimental Design Optimal experimental designs for both population and individual studies based on nonlinear mixed-effect models. Often this is based on a computation of the Fisher Information Matrix. This package was developed for pharmacometric problems, and examples and predefined models are available for these types of systems. The methods are described in Nyberg et al. (2012) <doi:10.1016/j.cmpb.2012.05.005>, and Foracchia et al. (2004) <doi:10.1016/S0169-2607(03)00073-7>.
1008 Design of Experiments (DoE) & Analysis of Experimental Data powerAnalysis Power Analysis in Experimental Design Basic functions for power analysis and effect size calculation.
1009 Design of Experiments (DoE) & Analysis of Experimental Data powerbydesign Power Estimates for ANOVA Designs Functions for bootstrapping the power of ANOVA designs based on estimated means and standard deviations of the conditions. Please refer to the documentation of the boot.power.anova() function for further details.
1010 Design of Experiments (DoE) & Analysis of Experimental Data powerGWASinteraction Power Calculations for GxE and GxG Interactions for GWAS Analytical power calculations for GxE and GxG interactions for case-control studies of candidate genes and genome-wide association studies (GWAS). This includes power calculation for four two-step screening and testing procedures. It can also calculate power for GxE and GxG without any screening.
1011 Design of Experiments (DoE) & Analysis of Experimental Data PwrGSD Power in a Group Sequential Design Tools for the evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries, for deriving power of a sequential design at a specified alternative, template for evaluating the performance of candidate plans at a set of time varying alternatives. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.
1012 Design of Experiments (DoE) & Analysis of Experimental Data qtlDesign Design of QTL experiments Tools for the design of QTL experiments.
1013 Design of Experiments (DoE) & Analysis of Experimental Data qualityTools Statistical Methods for Quality Science Contains methods associated with the Define, Measure, Analyze, Improve and Control (i.e. DMAIC) cycle of the Six Sigma quality management methodology. It covers distribution fitting, normal and non-normal process capability indices, techniques for Measurement Systems Analysis (especially gage capability indices and Gage Repeatability and Reproducibility studies), factorial and fractional factorial designs, as well as response surface methods including the use of desirability functions. Improvement via Six Sigma is a project-based strategy that covers 5 phases: Define - Pareto chart; Measure - probability and quantile-quantile plots, process capability indices for various distributions, and Gage R&R; Analyze - Pareto chart, multi-vari chart, dot plot; Improve - full and fractional factorial, response surface and mixture designs, the desirability approach for simultaneous optimization of more than one response variable, and normal, Pareto and Lenth plots of effects as well as interaction plots; Control - quality control charts, which can be found in the ‘qcc’ package. The focus is on teaching the statistical methodology used in the quality sciences.
1014 Design of Experiments (DoE) & Analysis of Experimental Data RcmdrPlugin.DoE R Commander Plugin for (industrial) Design of Experiments The package provides a platform-independent GUI for design of experiments. It is implemented as a plugin to the R-Commander, which is a more general graphical user interface for statistics in R based on tcl/tk. DoE functionality can be accessed through the menu Design that is added to the R-Commander menus.
1015 Design of Experiments (DoE) & Analysis of Experimental Data rodd Optimal Discriminating Designs A collection of functions for the numerical construction of optimal discriminating designs. Currently, T-optimal designs (which maximize the lower bound for the power of the F-test for regression model discrimination), KL-optimal designs (for lognormal errors) and their robust analogues can be calculated with the package.
1016 Design of Experiments (DoE) & Analysis of Experimental Data RPPairwiseDesign Resolvable partially pairwise balanced design and Space-filling design via association scheme Using some association schemes to obtain a new series of resolvable partially pairwise balanced designs (RPPBD) and space-filling designs.
1017 Design of Experiments (DoE) & Analysis of Experimental Data rsm (core) Response-Surface Analysis Provides functions to generate response-surface designs, fit first- and second-order response-surface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, C-F J and Hamada, M (2009) “Experiments: Planning, Analysis, and Parameter Design Optimization” ISBN 978-0-471-69946-0.
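A minimal sketch of the design-then-analyze workflow in rsm, with a simulated response on a central composite design; the response surface used here is illustrative:

    library(rsm)
    des <- ccd(2)   # central composite design in two coded factors
    set.seed(3)
    des$y <- with(des, 80 - 3 * x1^2 - 2 * x2^2 + x1 * x2 +
                       rnorm(nrow(des)))
    fit <- rsm(y ~ SO(x1, x2), data = des)   # second-order model
    summary(fit)   # includes the canonical analysis (stationary point)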
1018 Design of Experiments (DoE) & Analysis of Experimental Data rsurface Design of Rotatable Central Composite Experiments and Response Surface Analysis Produces, via design_ccd(), tables with the level of replication (number of replicates) and the experimental uncoded values of the quantitative factors to be used for rotatable central composite design (CCD) experimentation, together with a 2-D contour plot of the corresponding variance of the predicted response according to Mead et al. (2012) <doi:10.1017/CBO9781139020879>, and analyzes CCD data with response surface methodology via ccd_analysis(). A rotatable CCD provides values of the variance of the predicted response that are concentrically distributed around the average treatment combination used in the experimentation, which, with uniform precision (implied by the use of several replicates at the average treatment combination), greatly improves the search for an optimum response. These properties of a rotatable CCD represent undeniable advantages over the classical factorial design, as discussed by Panneton et al. (1999) <doi:10.13031/2013.13267> and Mead et al. (2012) <doi:10.1017/CBO9781139020879.018>, among others.
1019 Design of Experiments (DoE) & Analysis of Experimental Data SensoMineR Sensory Data Analysis Statistical methods to analyse sensory data. See S. Le and F. Husson (2008), “SensoMineR: a package for sensory data analysis”, <doi:10.1111/j.1745-459X.2007.00137.x>.
1020 Design of Experiments (DoE) & Analysis of Experimental Data seqDesign Simulation and Group Sequential Monitoring of Randomized Two-Stage Treatment Efficacy Trials with Time-to-Event Endpoints A modification of the preventive vaccine efficacy trial design of Gilbert, Grove et al. (2011, Statistical Communications in Infectious Diseases) is implemented, with application generally to individual-randomized clinical trials with multiple active treatment groups and a shared control group, and a study endpoint that is a time-to-event endpoint subject to right-censoring. The design accounts for the issues that the efficacy of the treatment/vaccine groups may take time to accrue while the multiple treatment administrations/vaccinations are given; there is interest in assessing the durability of treatment efficacy over time; and group sequential monitoring of each treatment group for potential harm, non-efficacy/efficacy futility, and high efficacy is warranted. The design divides the trial into two stages of time periods, where each treatment is first evaluated for efficacy in the first stage of follow-up, and, if and only if it shows significant treatment efficacy in stage one, it is evaluated for longer-term durability of efficacy in stage two. The package produces plots and tables describing operating characteristics of a specified design including an unconditional power for intention-to-treat and per-protocol/as-treated analyses; trial duration; probabilities of the different possible trial monitoring outcomes (e.g., stopping early for non-efficacy); unconditional power for comparing treatment efficacies; and distributions of numbers of endpoint events occurring after the treatments/vaccinations are given, useful as input parameters for the design of studies of the association of biomarkers with a clinical outcome (surrogate endpoint problem). The code can be used for a single active treatment versus control design and for a single-stage design.
1021 Design of Experiments (DoE) & Analysis of Experimental Data sFFLHD Sequential Full Factorial-Based Latin Hypercube Design Gives design points from a sequential full factorial-based Latin hypercube design, as described in Duan, Ankenman, Sanchez, and Sanchez (2015, Technometrics, <doi:10.1080/00401706.2015.1108233>).
1022 Design of Experiments (DoE) & Analysis of Experimental Data simrel Simulation of Multivariate Linear Model Data Simulating multivariate linear model data is useful in research and education, whether for method comparison or for creating data with specific properties. This package lets the user simulate linear model data with a wide range of properties using a few tuning parameters. It also contains functions to create plots for the simulation objects and a shiny app available as an RStudio gadget. It can be a handy tool for model comparison, testing and many other purposes.
1023 Design of Experiments (DoE) & Analysis of Experimental Data skpr (core) Design of Experiments Suite: Generate and Evaluate Optimal Designs Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of split/split-split/…/N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible.
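A minimal sketch of generating and evaluating a design with skpr; the candidate set, model and run count are illustrative:

    library(skpr)
    # D-optimal 8-run design for two 2-level factors plus interaction
    cand <- expand.grid(a = c(-1, 1), b = c(-1, 1))
    design <- gen_design(candidateset = cand, model = ~ a * b, trials = 8)
    # Parametric power evaluation at the default effect size
    eval_design(design, model = ~ a * b, alpha = 0.05)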
1024 Design of Experiments (DoE) & Analysis of Experimental Data SLHD Maximin-Distance (Sliced) Latin Hypercube Designs Generates optimal Latin hypercube designs (LHDs) for computer experiments with quantitative factors and optimal sliced Latin hypercube designs (SLHDs) for computer experiments with both quantitative and qualitative factors. Details of the algorithm can be found in Ba, S., Brenneman, W. A. and Myers, W. R. (2015), “Optimal Sliced Latin Hypercube Designs,” Technometrics. The most important function in this package is “maximinSLHD”.
1025 Design of Experiments (DoE) & Analysis of Experimental Data soptdmaeA Sequential Optimal Designs for Two-Colour cDNA Microarray Experiments Computes sequential A-, MV-, D- and E-optimal or near-optimal block and row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all possible elementary treatment contrasts. The package also provides an optional method of using the graphical user interface (GUI) R package ‘tcltk’ to ensure that it is user friendly.
1026 Design of Experiments (DoE) & Analysis of Experimental Data sp23design Design and Simulation of seamless Phase II-III Clinical Trials Provides methods for generating, exploring and executing seamless Phase II-III designs of Lai, Lavori and Shih using generalized likelihood ratio statistics. Includes pdf and source files that describe the entire R implementation with the relevant mathematical details.
1027 Design of Experiments (DoE) & Analysis of Experimental Data ssize.fdr Sample Size Calculations for Microarray Experiments This package contains a set of functions that calculate appropriate sample sizes for one-sample t-tests, two-sample t-tests, and F-tests for microarray experiments based on desired power, while controlling for false discovery rates. For all tests, the standard deviations (variances) among genes can be assumed fixed or random. This is also true for effect sizes among genes in one-sample and two-sample experiments. Functions also output a chart of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes.
1028 Design of Experiments (DoE) & Analysis of Experimental Data ssizeRNA Sample Size Calculation for RNA-Seq Experimental Design We propose a procedure for sample size calculation while controlling false discovery rate for RNA-seq experimental design. Our procedure depends on the Voom method proposed for RNA-seq data analysis by Law et al. (2014) <doi:10.1186/gb-2014-15-2-r29> and the sample size calculation method proposed for microarray experiments by Liu and Hwang (2007) <doi:10.1093/bioinformatics/btl664>. We develop a set of functions that calculate appropriate sample sizes for two-sample t-tests for RNA-seq experiments with fixed or varying sets of parameters. The outputs also contain a plot of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes. To install this package, please use ‘source(“http://bioconductor.org/biocLite.R”); biocLite(“ssizeRNA”)’.
1029 Design of Experiments (DoE) & Analysis of Experimental Data support.CEs Basic Functions for Supporting an Implementation of Choice Experiments Provides seven basic functions that support an implementation of choice experiments.
1030 Design of Experiments (DoE) & Analysis of Experimental Data TEQR Target Equivalence Range Design The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of this design is that dose is escalated based on lack of activity rather than on lack of toxicity, and is de-escalated only if an unacceptable level of toxicity is experienced.
1031 Design of Experiments (DoE) & Analysis of Experimental Data tgp Bayesian Treed Gaussian Process Models Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.
1032 Design of Experiments (DoE) & Analysis of Experimental Data ThreeArmedTrials Design and Analysis of Clinical Non-Inferiority or Superiority Trials with Active and Placebo Control Design and analyze three-arm non-inferiority or superiority trials which follow a gold-standard design, i.e. trials with an experimental treatment, an active, and a placebo control. Methods for the following distributions are implemented: Poisson (Mielke and Munk (2009) <arXiv:0912.4169>), negative binomial (Muetze et al. (2016) <doi:10.1002/sim.6738>), normal (Pigeot et al. (2003) <doi:10.1002/sim.1450>; Hasler et al. (2009) <doi:10.1002/sim.3052>), binary (Friede and Kieser (2007) <doi:10.1002/sim.2543>), nonparametric (Muetze et al. (2017) <doi:10.1002/sim.7176>), exponential (Mielke and Munk (2009) <arXiv:0912.4169>).
1033 Design of Experiments (DoE) & Analysis of Experimental Data toxtestD Experimental design for binary toxicity tests Calculates sample size and dose allocation for binary toxicity tests, using the Fish Embryo Toxicity Test as an example. An optimal test design is obtained by running (i) spoD (calculate the number of individuals to test under control conditions), (ii) setD (estimate the minimal sample size per treatment given the user's precision requirements) and (iii) doseD (construct an individual dose scheme).
1034 Design of Experiments (DoE) & Analysis of Experimental Data unrepx Analysis and Graphics for Unreplicated Experiments Provides half-normal plots, reference plots, and Pareto plots of effects from an unreplicated experiment, along with various pseudo-standard-error measures, simulated reference distributions, and other tools. Many of these methods are described in Daniel C. (1959) <doi:10.1080/00401706.1959.10489866> and/or Lenth R.V. (1989) <doi:10.1080/00401706.1989.10488595>, but some new approaches are added and integrated in one package.
1035 Design of Experiments (DoE) & Analysis of Experimental Data vdg Variance Dispersion Graphs and Fraction of Design Space Plots Facilities for constructing variance dispersion graphs, fraction-of-design-space plots and similar graphics for exploring the properties of experimental designs. The design region is explored via random sampling, which allows for more flexibility than traditional variance dispersion graphs. A formula interface is leveraged to provide access to complex model formulae. Graphics can be constructed simultaneously for multiple experimental designs and/or multiple model formulae. Instead of using pointwise optimization to find the minimum and maximum scaled prediction variance curves, which can be inaccurate and time consuming, this package uses quantile regression as an alternative.
1036 Design of Experiments (DoE) & Analysis of Experimental Data Vdgraph Variance dispersion graphs and Fraction of design space plots for response surface designs Uses a modification of the published FORTRAN code in “A Computer Program for Generating Variance Dispersion Graphs” by G. Vining, Journal of Quality Technology, Vol. 25 No. 1 January 1993, to produce variance dispersion graphs. Also produces fraction of design space plots, and contains data frames for several minimal run response surface designs.
1037 Design of Experiments (DoE) & Analysis of Experimental Data VdgRsm Plots of Scaled Prediction Variances for Response Surface Designs Functions for creating variance dispersion graphs, fraction of design space plots, and contour plots of scaled prediction variances for second-order response surface designs in spherical and cuboidal regions. Also, some standard response surface designs can be generated.
1038 Design of Experiments (DoE) & Analysis of Experimental Data VNM Finding Multiple-Objective Optimal Designs for the 4-Parameter Logistic Model Provides tools for finding multiple-objective optimal designs for estimating the shape of the dose-response curve, the ED50 (the dose producing an effect midway between the expected responses at the extreme doses) and the MED (the minimum effective dose level) for the 2-, 3- and 4-parameter logistic models, and for evaluating their efficiencies for the three objectives. The acronym VNM stands for V-algorithm using Newton Raphson method to search multiple-objective optimal design.
1039 Extreme Value Analysis copula Multivariate Dependence with Copulas Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
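Because the 'copula' package centres on a simulate-then-fit workflow, a minimal sketch may help; the correlation value and sample size below are purely illustrative and not part of the package description.

    library(copula)
    set.seed(1)
    # Bivariate normal copula with an illustrative correlation of 0.6
    nc <- normalCopula(0.6, dim = 2)
    u  <- rCopula(500, nc)    # simulate pseudo-observations on [0,1]^2
    # Recover the dependence parameter by maximum likelihood
    fit <- fitCopula(normalCopula(dim = 2), u, method = "ml")
    summary(fit)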
1040 Extreme Value Analysis evd (core) Functions for Extreme Value Distributions Extends simulation, distribution, quantile and density functions to univariate and multivariate parametric extreme value distributions, and provides fitting functions which calculate maximum likelihood estimates for univariate and bivariate maxima models, and for univariate and bivariate threshold models.
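As a quick sketch of the simulation and fitting functions just described, the following simulates GEV block maxima and refits them by maximum likelihood; the parameter values are illustrative only.

    library(evd)
    set.seed(1)
    # 200 block maxima from a GEV with location 0, scale 1, shape 0.2
    x <- rgev(200, loc = 0, scale = 1, shape = 0.2)
    # Maximum-likelihood fit of the univariate GEV model
    fit <- fgev(x)
    fit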
1041 Extreme Value Analysis evdbayes Bayesian Analysis in Extreme Value Theory Provides functions for the Bayesian analysis of extreme value models, using MCMC methods.
1042 Extreme Value Analysis evir (core) Extreme Values in R Functions for extreme value theory, which may be divided into the following groups: exploratory data analysis, block maxima, peaks over thresholds (univariate and bivariate), point processes, GEV/GPD distributions.
1043 Extreme Value Analysis evmix Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation The usual distribution functions, maximum likelihood inference and model diagnostics for univariate stationary extreme value mixture models are provided. Kernel density estimation including various boundary corrected kernel density estimation methods and a wide choice of kernels, with cross-validation likelihood based bandwidth estimator. Reasonable consistency with the base functions in the ‘evd’ package is provided, so that users can safely interchange most code.
1044 Extreme Value Analysis extremefit Estimation of Extreme Conditional Quantiles and Probabilities Extreme value theory, nonparametric kernel estimation, tail conditional probabilities, extreme conditional quantile, adaptive estimation, quantile regression, survival probabilities.
1045 Extreme Value Analysis extRemes Extreme Value Analysis Functions for performing extreme value analysis.
1046 Extreme Value Analysis extremeStat Extreme Value Statistics and Quantile Estimation Code to fit, plot and compare several (extreme value) distribution functions. Can also compute (truncated) distribution quantile estimates and draw a plot with return periods on a linear scale.
1047 Extreme Value Analysis fExtremes Rmetrics - Modelling Extreme Events in Finance Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data pre-processing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.
1048 Extreme Value Analysis ismev An Introduction to Statistical Modeling of Extreme Values Functions to support the computations carried out in ‘An Introduction to Statistical Modeling of Extreme Values’ by Stuart Coles. The functions may be divided into the following groups; maxima/minima, order statistics, peaks over thresholds and point processes.
1049 Extreme Value Analysis lmom L-Moments Functions related to L-moments: computation of L-moments and trimmed L-moments of distributions and data samples; parameter estimation; L-moment ratio diagram; plot vs. quantiles of an extreme-value distribution.
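A minimal sketch of the L-moment workflow this description outlines, assuming the package's samlmu() (sample L-moments) and pelgev() (GEV parameter estimation from L-moments) functions:

    library(lmom)
    set.seed(1)
    x <- rgamma(100, shape = 2)   # any sample will do
    lmoms <- samlmu(x)            # first four sample L-moments
    # Estimate GEV parameters by matching L-moments
    pelgev(lmoms)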
1050 Extreme Value Analysis lmomco L-Moments, Censored L-Moments, Trimmed L-Moments, L-Comoments, and Many Distributions Extensive functions for L-moments (LMs) and probability-weighted moments (PWMs), parameter estimation for distributions, LM computation for distributions, and L-moment ratio diagrams. Maximum likelihood and maximum product of spacings estimation are also available. LMs for right-tail and left-tail censoring by known or unknown threshold and by indicator variable are available. Asymmetric (asy) trimmed LMs (TL-moments, TLMs) are supported. LMs of residual (resid) and reversed (rev) resid life are implemented along with 13 quantile function operators for reliability and survival analyses. Exact analytical bootstrap estimates of order statistics, LMs, and variances-covariances of LMs are provided. The Harri-Coble Tau34-squared Normality Test is available. Distribution support with “L” (LMs), “TL” (TLMs) and added (+) support for right-tail censoring (RC) encompasses: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L,TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Resid. Quantile Function [L], Normal [L], 3-p log-Normal [L], Pearson Type III [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice/Rician [L], Slash [TL], 3-p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L]. Multivariate sample L-comoments (LCMs) are implemented to measure asymmetric associations.
1051 Extreme Value Analysis lmomRFA Regional Frequency Analysis using L-Moments Functions for regional frequency analysis using the methods of J. R. M. Hosking and J. R. Wallis (1997), “Regional frequency analysis: an approach based on L-moments”.
1052 Extreme Value Analysis mev Multivariate Extreme Value Distributions Exact simulation from max-stable processes and multivariate extreme value distributions for various parametric models. Threshold selection methods.
1053 Extreme Value Analysis POT Generalized Pareto Distribution and Peaks Over Threshold Some functions useful to perform a Peak Over Threshold analysis in univariate and bivariate cases, see Beirlant et al. (2004) <doi:10.1002/0470012382>. A user’s guide is available.
1054 Extreme Value Analysis QRM Provides R-Language Code to Examine Quantitative Risk Management Concepts Accompanying package to the book Quantitative Risk Management: Concepts, Techniques and Tools by Alexander J. McNeil, Rudiger Frey, and Paul Embrechts.
1055 Extreme Value Analysis ReIns Functions from “Reinsurance: Actuarial and Statistical Aspects” Functions from the book “Reinsurance: Actuarial and Statistical Aspects” (2017) by Hansjoerg Albrecher, Jan Beirlant and Jef Teugels <http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470772689.html>.
1056 Extreme Value Analysis Renext Renewal Method for Extreme Values Extrapolation Peaks Over Threshold (POT) or ‘methode du renouvellement’. The distribution for the exceedances can be chosen, and heterogeneous data (including historical data or block data) can be used in a Maximum-Likelihood framework.
1057 Extreme Value Analysis revdbayes Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://cran.r-project.org/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://cran.r-project.org/package=evdbayes>, which uses Markov Chain Monte Carlo (‘MCMC’) methods for posterior simulation. Also provided are functions for making inferences about the extremal index, using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. Also provided are d,p,q,r functions for the Generalised Extreme Value (‘GEV’) and Generalised Pareto (‘GP’) distributions that deal appropriately with cases where the shape parameter is very close to zero.
1058 Extreme Value Analysis RTDE Robust Tail Dependence Estimation Robust tail dependence estimation for bivariate models. This package is based on two papers by the authors: 'Robust and bias-corrected estimation of the coefficient of tail dependence' and 'Robust and bias-corrected estimation of probabilities of extreme failure sets'. This work was supported by a research grant (VKR023480) from VILLUM FONDEN and an international project for scientific cooperation (PICS-6416).
1059 Extreme Value Analysis SpatialExtremes Modelling Spatial Extremes Tools for the statistical modelling of spatial extremes using max-stable processes, copulas or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with extreme value theory) are available, such as the use of (spatial) copulas and Bayesian hierarchical models assuming the so-called conditional assumptions. The latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) <doi:10.1214/11-STS376>, Padoan et al. (2010) <doi:10.1198/jasa.2009.tm08577>, Dombry et al. (2013) <doi:10.1093/biomet/ass067>.
1060 Extreme Value Analysis texmex Statistical Modelling of Extreme Values Statistical extreme value modelling of threshold excesses, maxima and multivariate extremes. Univariate models for threshold excesses and maxima are the Generalised Pareto, and Generalised Extreme Value model respectively. These models may be fitted by using maximum (optionally penalised-)likelihood, or Bayesian estimation, and both classes of models may be fitted with covariates in any/all model parameters. Model diagnostics support the fitting process. Graphical output for visualising fitted models and return level estimates is provided. For serially dependent sequences, the intervals declustering algorithm of Ferro and Segers (2003) <doi:10.1111/1467-9868.00401> is provided, with diagnostic support to aid selection of threshold and declustering horizon. Multivariate modelling is performed via the conditional approach of Heffernan and Tawn (2004) <doi:10.1111/j.1467-9868.2004.02050.x>, with graphical tools for threshold selection and to diagnose estimation convergence.
1061 Extreme Value Analysis threshr Threshold Selection and Uncertainty for Extreme Value Analysis Provides functions for the selection of thresholds for use in extreme value models, based mainly on the methodology in Northrop, Attalides and Jonathan (2017) <doi:10.1111/rssc.12159>. It also performs predictive inferences about future extreme values, based either on a single threshold or on a weighted average of inferences from multiple thresholds, using the ‘revdbayes’ package <https://cran.r-project.org/package=revdbayes>. At the moment only the case where the data can be treated as independent identically distributed observations is considered.
1062 Extreme Value Analysis VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes fit constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
1063 Empirical Finance actuar Actuarial Functions and Heavy Tailed Distributions Functions and data sets for actuarial science: modeling of loss distributions; risk theory and ruin theory; simulation of compound models, discrete mixtures and compound hierarchical models; credibility theory. Support for many additional probability distributions to model insurance loss amounts and loss frequency: 19 continuous heavy tailed distributions; the Poisson-inverse Gaussian discrete distribution; zero-truncated and zero-modified extensions of the standard discrete distributions. Support for phase-type distributions commonly used to compute ruin probabilities.
1064 Empirical Finance AmericanCallOpt This package includes pricing functions for selected American call options with underlying assets that generate payouts This package includes a set of pricing functions for American call options. The following cases are covered: Pricing of an American call using the standard binomial approximation; Hedge parameters for an American call with a standard binomial tree; Binomial pricing of an American call with continuous payout from the underlying asset; Binomial pricing of an American call with an underlying stock that pays proportional dividends in discrete time; Pricing of an American call on futures using a binomial approximation; Pricing of a currency futures American call using a binomial approximation; Pricing of a perpetual American call. Note that this material is for educational purposes only. The codes are not optimized for computational efficiency, as they are meant to represent standard cases of analytical and numerical solution.
1065 Empirical Finance backtest Exploring Portfolio-Based Conjectures About Financial Instruments The backtest package provides facilities for exploring portfolio-based conjectures about financial instruments (stocks, bonds, swaps, options, et cetera).
1066 Empirical Finance bayesGARCH Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations Provides the bayesGARCH() function which performs the Bayesian estimation of the GARCH(1,1) model with Student’s t innovations as described in Ardia (2008) <doi:10.1007/978-3-540-78657-3>.
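Since the description names a single entry point, bayesGARCH(), a minimal sketch follows; the control settings are assumptions chosen only to keep the run short, and the placeholder series stands in for real returns.

    library(bayesGARCH)
    set.seed(1)
    y <- rnorm(500)   # placeholder return series
    # Two short MCMC chains; chain lengths are illustrative only
    fit <- bayesGARCH(y, control = list(n.chain = 2, l.chain = 500))
    str(fit, max.level = 1)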
1067 Empirical Finance BCC1997 Calculation of Option Prices Based on a Universal Solution Calculates the prices of European options based on the universal solution provided by Bakshi, Cao and Chen (1997) <doi:10.1111/j.1540-6261.1997.tb02749.x>. This solution considers stochastic volatility, stochastic interest and random jumps. Please cite their work if this package is used.
1068 Empirical Finance BenfordTests Statistical Tests for Evaluating Conformity to Benford’s Law Several specialized statistical tests and support functions for determining if numerical data could conform to Benford’s law.
1069 Empirical Finance betategarch Simulation, Estimation and Forecasting of Beta-Skew-t-EGARCH Models Simulation, estimation and forecasting of first-order Beta-Skew-t-EGARCH models with leverage (one-component, two-component, skewed versions).
1070 Empirical Finance bizdays Business Days Calculations and Utilities Business days calculations based on a list of holidays and nonworking weekdays. Quite useful for fixed income and derivatives pricing.
1071 Empirical Finance BLModel Black-Litterman Posterior Distribution Posterior distribution in the Black-Litterman model is computed from a prior distribution given in the form of a time series of asset returns and a continuous distribution of views provided by the user as an external function.
1072 Empirical Finance BurStFin Burns Statistics Financial A suite of functions for finance, including the estimation of variance matrices via a statistical factor model or Ledoit-Wolf shrinkage.
1073 Empirical Finance BurStMisc Burns Statistics Miscellaneous Script search, corner, genetic optimization, permutation tests, write expect test.
1074 Empirical Finance CADFtest A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, then an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).
1075 Empirical Finance car Companion to Applied Regression Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.
1076 Empirical Finance ChainLadder Statistical Methods and Models for Claims Reserving in General Insurance Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.
1077 Empirical Finance copula Multivariate Dependence with Copulas Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
1078 Empirical Finance CreditMetrics Functions for calculating the CreditMetrics risk model A set of functions for computing the CreditMetrics risk model.
1079 Empirical Finance credule Credit Default Swap Functions Provides functions to bootstrap credit curves from market quotes (Credit Default Swap (CDS) spreads) and to price Credit Default Swaps (CDS).
1080 Empirical Finance crp.CSFP CreditRisk+ Portfolio Model Modelling credit risks based on the concept of “CreditRisk+”, First Boston Financial Products, 1997 and “CreditRisk+ in the Banking Industry”, Gundlach & Lehrbass, Springer, 2003.
1081 Empirical Finance crseEventStudy A Robust and Powerful Test of Abnormal Stock Returns in Long-Horizon Event Studies Based on Dutta et al. (2018) <doi:10.1016/j.jempfin.2018.02.004>, this package provides their standardized test for abnormal returns in long-horizon event studies. The methods used improve the major weaknesses of size, power, and robustness of long-run statistical tests described in Kothari/Warner (2007) <doi:10.1016/B978-0-444-53265-7.50015-9>. Abnormal returns are weighted by their statistical precision (i.e., standard deviation), resulting in abnormal standardized returns. This procedure efficiently captures the heteroskedasticity problem. Clustering techniques following Cameron et al. (2011) <doi:10.1198/jbes.2010.07136> are adopted for computing cross-sectional correlation robust standard errors. The statistical tests in this package therefore account for potential biases arising from returns' cross-sectional correlation, autocorrelation, and volatility clustering without power loss.
1082 Empirical Finance cvar Compute Expected Shortfall and Value at Risk for Continuous Distributions Compute expected shortfall (ES) and Value at Risk (VaR) from a quantile function, distribution function, random number generator or probability density function. ES is also known as Conditional Value at Risk (CVaR). Virtually any continuous distribution can be specified. The functions are vectorized over the arguments. The computations are done directly from the definitions, see e.g. Acerbi and Tasche (2002) <doi:10.1111/1468-0300.00091>. Some support for GARCH models is provided, as well.
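Because 'cvar' works directly from a quantile function, a minimal sketch may help; the argument names below follow the package description but should be treated as assumptions rather than a definitive interface.

    library(cvar)
    # 5% VaR and expected shortfall for standard normal losses,
    # specified via the quantile function qnorm
    VaR(qnorm, p_loss = 0.05)
    ES(qnorm, p_loss = 0.05)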
1083 Empirical Finance data.table Extension of ‘data.frame’ Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
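A minimal sketch of the grouped aggregation syntax that makes 'data.table' popular for large financial panels; the toy data below is illustrative.

    library(data.table)
    dt <- data.table(ticker = rep(c("AAA", "BBB"), each = 3),
                     ret    = c(0.01, -0.02, 0.03, 0.00, 0.01, -0.01))
    # Mean return and observation count by ticker
    dt[, .(mean_ret = mean(ret), n = .N), by = ticker]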
1084 Empirical Finance derivmkts Functions and R Code to Accompany Derivatives Markets A set of pricing and expository functions that should be useful in teaching a course on financial derivatives.
1085 Empirical Finance dlm Bayesian and Likelihood Analysis of Dynamic Linear Models Provides routines for maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
1086 Empirical Finance Dowd Functions Ported from ‘MMR2’ Toolbox Offered in Kevin Dowd’s Book Measuring Market Risk ‘Kevin Dowd’s’ book Measuring Market Risk is widely read by students and practitioners alike in the area of risk measurement. As the author notes, ‘MATLAB’ may have been the most suitable language when he originally wrote the functions, but with the growing popularity of R that is no longer entirely the case. Because ‘Dowd’s’ code was not intended to be error-free and was mainly for reference, some functions in this package have inherited those errors. An attempt will be made in future releases to identify and correct them. ‘Dowd’s’ original code can be downloaded from www.kevindowd.org/measuring-market-risk/. It should be noted that ‘Dowd’ offers both ‘MMR2’ and ‘MMR1’ toolboxes; only ‘MMR2’ was ported to R. ‘MMR2’ is the more recent version of the ‘MMR1’ toolbox, and the two have mostly similar functions. The toolbox mainly contains different parametric and non-parametric methods for measuring market risk, as well as backtesting of risk measurement methods.
1087 Empirical Finance DriftBurstHypothesis Calculates the Test-Statistic for the Drift Burst Hypothesis Calculates the T-Statistic for the drift burst hypothesis from the working paper Christensen, Oomen and Reno (2018) <doi:10.2139/ssrn.2842535>. The authors’ MATLAB code is available upon request, see: <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2842535>.
1088 Empirical Finance dse Dynamic Systems Estimation (Time Series Package) Tools for multivariate, linear, time-invariant, time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.
1089 Empirical Finance DtD Distance to Default Provides fast methods to work with Merton’s distance to default model introduced in Merton (1974) <doi:10.1111/j.1540-6261.1974.tb03058.x>. The methods include simulation and estimation of the parameters.
1090 Empirical Finance dyn Time Series Regression Time series regression. The dyn class interfaces ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.
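A minimal sketch of the dyn$ prefixing idiom described above, using a zoo series and a lagged regressor:

    library(dyn)   # attaches zoo as a dependency
    set.seed(1)
    y <- zoo(cumsum(rnorm(100)))
    # Prefix lm with dyn$ so the formula may contain lags
    fit <- dyn$lm(y ~ lag(y, -1))
    summary(fit)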
1091 Empirical Finance dynlm Dynamic Linear Regression Dynamic linear models and time series regression.
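By contrast with 'dyn' above, 'dynlm' supplies its own formula helpers such as L(), d(), trend() and season(); a minimal sketch on a built-in time series:

    library(dynlm)
    # Regress monthly deaths on their first and twelfth lags plus a trend
    fit <- dynlm(UKDriverDeaths ~ L(UKDriverDeaths, c(1, 12)) +
                   trend(UKDriverDeaths))
    summary(fit)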
1092 Empirical Finance ESG ESG - A package for asset projection The package presents a “Scenarios” class containing general parameters, risk parameters and projection results. Risk parameters are gathered together into a ParamsScenarios sub-object. The general process for using this package is to set all needed parameters in a Scenarios object, use the customPathsGeneration method to perform the projection, and then use the xxx_PriceDistribution() methods to get asset prices.
1093 Empirical Finance estudy2 An Implementation of Parametric and Nonparametric Event Study An implementation of the most commonly used event study methodology, including both parametric and nonparametric tests. It covers various aspects of rate-of-return estimation (the core calculation is done in C++), as well as three market models classical for event studies: mean adjusted returns, market adjusted returns and single-index market models. There are 6 parametric and 6 nonparametric tests provided, which examine cross-sectional daily abnormal returns (see the documentation of the functions for more information). Parametric tests include tests proposed by Brown and Warner (1980) <doi:10.1016/0304-405X(80)90002-1>, Brown and Warner (1985) <doi:10.1016/0304-405X(85)90042-X>, Boehmer et al. (1991) <doi:10.1016/0304-405X(91)90032-F>, Patell (1976) <doi:10.2307/2490543>, and Lamb (1995) <doi:10.2307/253695>. Nonparametric tests covered in estudy2 are tests described in Corrado and Zivney (1992) <doi:10.2307/2331331>, McConnell and Muscarella (1985) <doi:10.1016/0304-405X(85)90006-6>, Boehmer et al. (1991) <doi:10.1016/0304-405X(91)90032-F>, Cowan (1992) <doi:10.1007/BF00939016>, Corrado (1989) <doi:10.1016/0304-405X(89)90064-0>, Campbell and Wasley (1993) <doi:10.1016/0304-405X(93)90025-7>, Savickas (2003) <doi:10.1111/1475-6803.00052>, Kolari and Pynnonen (2010) <doi:10.1093/rfs/hhq072>. Furthermore, tests for the cumulative abnormal returns proposed by Brown and Warner (1985) <doi:10.1016/0304-405X(85)90042-X> and Lamb (1995) <doi:10.2307/253695> are included.
1094 Empirical Finance factorstochvol Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models with interweaving <doi:10.1080/10618600.2017.1322091>. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix <doi:10.1016/j.jeconom.2018.11.007>.
1095 Empirical Finance fame Interface for FAME Time Series Database Read and write FAME databases.
1096 Empirical Finance fAssets (core) Rmetrics - Analysing and Modelling Financial Assets Provides a collection of functions to manage, to investigate and to analyze data sets of financial assets from different points of view.
1097 Empirical Finance FatTailsR Kiener Distributions and Fat Tails in Finance Kiener distributions K1, K2, K3, K4 and K7 to characterize distributions with left and right, symmetric or asymmetric fat tails in market finance, neuroscience and other disciplines. Two algorithms to estimate distribution parameters, quantiles, value-at-risk and expected shortfall with high accuracy. Includes power hyperbolas and power hyperbolic functions.
1098 Empirical Finance fBasics (core) Rmetrics - Markets and Basic Statistics Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. In addition, there are several utility functions for data handling and management.
1099 Empirical Finance fBonds (core) Rmetrics - Pricing and Evaluating Bonds It implements the Nelson-Siegel and the Nelson-Siegel-Svensson term structures.
1100 Empirical Finance fCopulae (core) Rmetrics - Bivariate Dependence Structures with Copulae Provides a collection of functions to manage, to investigate and to analyze bivariate financial returns by Copulae. Included are the families of Archimedean, Elliptical, Extreme Value, and Empirical Copulae.
1101 Empirical Finance fExoticOptions (core) Rmetrics - Pricing and Evaluating Exotic Option Provides a collection of functions to evaluate barrier options, Asian options, binary options, currency translated options, lookback options, multiple asset options and multiple exercise options.
1102 Empirical Finance fExtremes (core) Rmetrics - Modelling Extreme Events in Finance Provides functions for analysing and modelling extreme events in financial time series. The topics include: (i) data pre-processing, (ii) explorative data analysis, (iii) peak over threshold modelling, (iv) block maxima modelling, (v) estimation of VaR and CVaR, and (vi) the computation of the extreme index.
1103 Empirical Finance fgac Generalized Archimedean Copula Bi-variate data fitting is done by two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm was implemented considering seven families of copulas (Generalized Archimedean Copulas); the best fit can be obtained by examining all copula options (totally positive of order 2 and stochastically increasing models).
1104 Empirical Finance fGarch (core) Rmetrics - Autoregressive Conditional Heteroskedastic Modelling Provides a collection of functions to analyze and model heteroskedastic behavior in financial time series models.
1105 Empirical Finance fImport (core) Rmetrics - Importing Economic and Financial Data Provides a collection of utility functions to download and manage data sets from the Internet or from other sources.
1106 Empirical Finance FinancialMath Financial Mathematics for Actuaries Contains financial math functions and introductory derivative functions included in the Society of Actuaries and Casualty Actuarial Society ‘Financial Mathematics’ exam, and some topics in the ‘Models for Financial Economics’ exam.
1107 Empirical Finance FinAsym Classifies implicit trading activity from market quotes and computes the probability of informed trading This package accomplishes two tasks: a) it classifies implicit trading activity from quotes in OTC markets using the algorithm of Lee and Ready (1991); b) based on information for trade initiation, the package computes the probability of informed trading of Easley and O’Hara (1987).
1108 Empirical Finance finreportr Financial Data from U.S. Securities and Exchange Commission Download and display company financial data from the U.S. Securities and Exchange Commission’s EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner. See <https://www.sec.gov/edgar/searchedgar/companysearch.html> for more information.
1109 Empirical Finance fmdates Financial Market Date Calculations Implements common date calculations relevant for specifying the economic nature of financial market contracts that are typically defined by International Swap Dealer Association (ISDA, <http://www2.isda.org>) legal documentation. This includes methods to check whether dates are business days in certain locales, functions to adjust and shift dates and time length (or day counter) calculations.
1110 Empirical Finance fMultivar (core) Rmetrics - Analysing and Modeling Multivariate Financial Return Distributions Provides a collection of functions to manage, to investigate and to analyze bivariate and multivariate data sets of financial returns.
1111 Empirical Finance fNonlinear (core) Rmetrics - Nonlinear and Chaotic Time Series Modelling Provides a collection of functions for testing various aspects of univariate time series including independence and neglected nonlinearities. Further provides functions to investigate the chaotic behavior of time series processes and to simulate different types of chaotic time series maps.
1112 Empirical Finance fOptions (core) Rmetrics - Pricing and Evaluating Basic Options Provides a collection of functions to value basic options. This includes the generalized Black-Scholes option, options on futures and options on commodity futures.
1113 Empirical Finance forecast Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
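A minimal sketch of the automatic ARIMA workflow the description mentions, using a built-in series:

    library(forecast)
    # Automatic model selection, then a 12-step-ahead forecast
    fit <- auto.arima(AirPassengers)
    fc  <- forecast(fit, h = 12)
    plot(fc)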
1114 Empirical Finance fPortfolio (core) Rmetrics - Portfolio Selection and Optimization Provides a collection of functions to optimize portfolios and to analyze them from different points of view.
1115 Empirical Finance fracdiff Fractionally differenced ARIMA aka ARFIMA(p,d,q) models Maximum likelihood estimation of the parameters of a fractionally differenced ARIMA(p,d,q) model (Haslett and Raftery, Appl.Statistics, 1989).
1116 Empirical Finance fractal A Fractal Time Series Modeling and Analysis Package Stochastic fractal and deterministic chaotic time series analysis.
1117 Empirical Finance FRAPO Financial Risk Modelling and Portfolio Optimisation with R Accompanying package of the book ‘Financial Risk Modelling and Portfolio Optimisation with R’, second edition. The data sets used in the book are contained in this package.
1118 Empirical Finance fRegression (core) Rmetrics - Regression Based Decision and Prediction A collection of functions for linear and non-linear regression modelling. It implements a wrapper for several regression models available in the base and contributed packages of R.
1119 Empirical Finance frmqa The Generalized Hyperbolic Distribution, Related Distributions and Their Applications in Finance A collection of R and C++ functions to work with the generalized hyperbolic distribution, related distributions and their applications in financial risk management and quantitative analysis.
1120 Empirical Finance fTrading (core) Rmetrics - Trading and Rebalancing Financial Instruments A collection of functions for trading and rebalancing financial instruments. It implements various technical indicators to analyse time series such as moving averages or stochastic oscillators.
1121 Empirical Finance GCPM Generalized Credit Portfolio Model Analyze the default risk of credit portfolios. Commonly known models, like CreditRisk+ or the CreditMetrics model, are implemented in their very basic settings. The portfolio loss distribution can be achieved either by simulation or analytically in case of the classic CreditRisk+ model. Models are implemented to capture only losses caused by defaults, i.e. migration risk is not included. The package structure is kept flexible, especially with respect to distributional assumptions, in order to quantify the sensitivity of risk figures to several assumptions. Therefore the package can be used to determine the credit risk of a given portfolio as well as to quantify model sensitivities.
1122 Empirical Finance GetHFData Download and Aggregate High Frequency Trading Data from Bovespa Downloads and aggregates high frequency trading data for Brazilian instruments directly from Bovespa ftp site <ftp://ftp.bmf.com.br/MarketData/>.
1123 Empirical Finance gets General-to-Specific (GETS) Modelling and Indicator Saturation Methods Automated General-to-Specific (GETS) modelling of the mean and variance of a regression, and indicator saturation methods for detecting and testing for structural breaks in the mean.
1124 Empirical Finance GetTDData Get Data for Brazilian Bonds (Tesouro Direto) Downloads and aggregates data for Brazilian government issued bonds directly from the website of Tesouro Direto <http://www.tesouro.fazenda.gov.br/tesouro-direto-balanco-e-estatisticas>.
1125 Empirical Finance GEVStableGarch ARMA-GARCH/APARCH Models with GEV and Stable Distributions Package for simulation and estimation of ARMA-GARCH/APARCH models with GEV and stable distributions.
1126 Empirical Finance ghyp A Package on Generalized Hyperbolic Distribution and Its Special Cases Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). Especially, it contains fitting procedures, an AIC-based model selection routine, and functions for the computation of density, quantile, probability, random variates, expected shortfall and some portfolio optimization and plotting routines as well as the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution.
1127 Empirical Finance gmm Generalized Method of Moments and Generalized Empirical Likelihood It is a complete suite to estimate models based on moment conditions. It includes the two step Generalized method of moments (Hansen 1982; <doi:10.2307/1912775>), the iterated GMM and continuous updated estimator (Hansen, Eaton and Yaron 1996; <doi:10.2307/1392442>) and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).
1128 Empirical Finance gogarch Generalized Orthogonal GARCH (GO-GARCH) models Implementation of the GO-GARCH model class.
1129 Empirical Finance GUIDE GUI for DErivatives in R A nice GUI for financial DErivatives in R.
1130 Empirical Finance highfrequency Tools for Highfrequency Data Analysis Provides functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity.
1131 Empirical Finance IBrokers R API to Interactive Brokers Trader Workstation Provides native R access to Interactive Brokers Trader Workstation API.
1132 Empirical Finance InfoTrad Calculates the Probability of Informed Trading (PIN) Estimates the probability of informed trading (PIN) initially introduced by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x>. The contribution of the package is that it uses likelihood factorizations of Easley et al. (2010) <doi:10.1017/S0022109010000074> (EHO factorization) and Lin and Ke (2011) <doi:10.1016/j.finmar.2011.03.001> (LK factorization). Moreover, the package uses different estimation algorithms: specifically, the grid-search algorithm proposed by Yan and Zhang (2012) <doi:10.1016/j.jbankfin.2011.08.003>, and the hierarchical agglomerative clustering approach proposed by Gan et al. (2015) <doi:10.1080/14697688.2015.1023336> and later extended by Ersan and Alici (2016) <doi:10.1016/j.intfin.2016.04.001>.
1133 Empirical Finance lgarch Simulation and Estimation of Log-GARCH Models Simulation and estimation of univariate and multivariate log-GARCH models. The main functions of the package are: lgarchSim(), mlgarchSim(), lgarch() and mlgarch(). The first two functions simulate from a univariate and a multivariate log-GARCH model, respectively, whereas the latter two estimate a univariate and multivariate log-GARCH model, respectively.
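A minimal sketch pairing the simulation and estimation functions named in the description; the series length is illustrative.

    library(lgarch)
    set.seed(1)
    # Simulate 500 observations from a univariate log-GARCH(1,1)
    y <- lgarchSim(500)
    # Estimate a univariate log-GARCH model on the simulated series
    fit <- lgarch(y)
    coef(fit)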
1134 Empirical Finance lifecontingencies Financial and Actuarial Mathematics for Life Contingencies Classes and methods that allow the user to manage life tables and actuarial tables (including multiple-decrement tables). Moreover, functions to easily perform demographic, financial and actuarial calculations on life-contingent insurance contracts are included.
1135 Empirical Finance lmtest Testing Linear Regression Models A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.
1136 Empirical Finance longmemo Statistics for Long-Memory Processes (Book Jan Beran), and Related Functionality Datasets and Functionality from ‘Jan Beran’ (1994). Statistics for Long-Memory Processes; Chapman & Hall. Estimation of Hurst (and more) parameters for fractional Gaussian noise, ‘fARIMA’ and ‘FEXP’ models.
1137 Empirical Finance LSMonteCarlo American options pricing with Least Squares Monte Carlo method The package compiles functions for calculating prices of American put options with Least Squares Monte Carlo method. The option types are plain vanilla American put, Asian American put, and Quanto American put. The pricing algorithms include variance reduction techniques such as Antithetic Variates and Control Variates. Additional functions are given to derive “price surfaces” at different volatilities and strikes, create 3-D plots, quickly generate Geometric Brownian motion, and calculate prices of European options with Black & Scholes analytical solution.
1138 Empirical Finance markovchain Easy Handling Discrete Time Markov Chains Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition, functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of their structural properties) analysis are provided.
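A minimal sketch of creating and querying a chain with the S4 class the description mentions; the states and probabilities are toy values.

    library(markovchain)
    # Two-state chain specified row-wise
    mc <- new("markovchain",
              states = c("bull", "bear"),
              transitionMatrix = matrix(c(0.9, 0.1,
                                          0.2, 0.8),
                                        nrow = 2, byrow = TRUE))
    steadyStates(mc)       # long-run distribution
    rmarkovchain(10, mc)   # draw a short random path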
1139 Empirical Finance MarkowitzR Statistical Significance of the Markowitz Portfolio A collection of tools for analyzing significance of Markowitz portfolios.
1140 Empirical Finance matchingMarkets Analysis of Stable Matchings Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
1141 Empirical Finance MSGARCH Markov-Switching GARCH Models Fit (by Maximum Likelihood or MCMC/Bayesian), simulate, and forecast various Markov-Switching GARCH models as described in Ardia et al. (2017) <https://ssrn.com/abstract=2845809>.
1142 Empirical Finance mvtnorm Multivariate Normal and t Distributions Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
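A minimal sketch of the probability computation at the core of 'mvtnorm'; the correlation value is illustrative.

    library(mvtnorm)
    # P(X1 <= 0, X2 <= 0) for a bivariate normal with correlation 0.5
    S <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
    pmvnorm(upper = c(0, 0), mean = c(0, 0), sigma = S)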
1143 Empirical Finance NetworkRiskMeasures Risk Measures for (Financial) Networks Implements some risk measures for (financial) networks, such as DebtRank, Impact Susceptibility, Impact Diffusion and Impact Fluidity.
1144 Empirical Finance nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
1145 Empirical Finance NMOF Numerical Methods and Optimization in Finance Functions, examples and data from the book “Numerical Methods and Optimization in Finance” by M. Gilli, D. Maringer and E. Schumann (2011), ISBN 978-0123756626. The package provides implementations of several optimisation heuristics, such as Differential Evolution, Genetic Algorithms and Threshold Accepting. There are also functions for the valuation of financial instruments, such as bonds and options, and functions that help with stochastic simulations.
1146 Empirical Finance obAnalytics Limit Order Book Analytics Data processing, visualisation and analysis of Limit Order Book event data.
1147 Empirical Finance OptHedging Estimation of value and hedging strategy of call and put options Estimation of value and hedging strategy of call and put options, based on optimal hedging and Monte Carlo method, from Chapter 3 of ‘Statistical Methods for Financial Engineering’, by Bruno Remillard, CRC Press, (2013).
1148 Empirical Finance OptionPricing Option Pricing with Efficient Simulation Algorithms Efficient Monte Carlo Algorithms for the price and the sensitivities of Asian and European Options under Geometric Brownian Motion.
1149 Empirical Finance pa Performance Attribution for Equity Portfolios A package that provides tools for conducting performance attribution for equity portfolios. The package uses two methods: the Brinson method and a regression-based analysis.
1150 Empirical Finance parma Portfolio Allocation and Risk Management Applications Provision of a set of models and methods for use in the allocation and management of capital in financial portfolios.
1151 Empirical Finance pbo Probability of Backtest Overfitting Following the method of Bailey et al., computes for a collection of candidate models the probability of backtest overfitting, the performance degradation and probability of loss, and the stochastic dominance.
1152 Empirical Finance PeerPerformance Luck-Corrected Peer Performance Analysis in R Provides functions to perform the peer performance analysis of funds’ returns as described in Ardia and Boudt (2018) <doi:10.1016/j.jbankfin.2017.10.014>.
1153 Empirical Finance PerformanceAnalytics (core) Econometric Tools for Performance and Risk Analysis Collection of econometric functions for performance and risk analysis. This package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.
1154 Empirical Finance pinbasic Fast and Stable Estimation of the Probability of Informed Trading (PIN) Utilities for fast and stable estimation of the probability of informed trading (PIN) in the model introduced by Easley et al. (2002) <doi:10.1111/1540-6261.00493> are implemented. Since the basic model developed by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x> is nested in the former due to equating the intensity of uninformed buys and sells, functions can also be applied to this simpler model structure, if needed. State-of-the-art factorization of the model likelihood function as well as most recent algorithms for generating initial values for optimization routines are implemented. In total, two likelihood factorizations and three methodologies for starting values are included. Furthermore, functions for simulating datasets of daily aggregated buys and sells, calculating confidence intervals for the probability of informed trading and posterior probabilities of trading days’ conditions are available.
1155 Empirical Finance portfolio Analysing equity portfolios Classes for analysing and implementing equity portfolios.
1156 Empirical Finance PortfolioEffectHFT High Frequency Portfolio Analytics by PortfolioEffect R interface to PortfolioEffect cloud service for backtesting high frequency trading (HFT) strategies, intraday portfolio analysis and optimization. Includes auto-calibrating model pipeline for market microstructure noise, risk factors, price jumps/outliers, tail risk (high-order moments) and price fractality (long memory). Constructed portfolios could use client-side market data or access HF intraday price history for all major US Equities. See <https://www.portfolioeffect.com/> for more information on the PortfolioEffect high frequency portfolio analytics platform.
1157 Empirical Finance PortfolioOptim Small/Large Sample Portfolio Optimization Two functions for financial portfolio optimization by linear programming are provided. One function implements the Benders decomposition algorithm and can be used for very large data sets. The other, applicable for moderate sample sizes, finds the optimal portfolio with the smallest distance to a given benchmark portfolio.
1158 Empirical Finance portfolioSim Framework for simulating equity portfolio strategies Classes that serve as a framework for designing equity portfolio simulations.
1159 Empirical Finance PortRisk Portfolio Risk Analysis Risk Attribution of a portfolio with Volatility Risk Analysis.
1160 Empirical Finance quantmod Quantitative Financial Modelling Framework Specify, build, trade, and analyse quantitative financial trading strategies.
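A minimal sketch of a typical 'quantmod' session; getSymbols() downloads data, so it needs an internet connection, and the ticker is illustrative.

    library(quantmod)
    # Download daily prices from Yahoo Finance (requires internet)
    aapl <- getSymbols("AAPL", src = "yahoo", auto.assign = FALSE)
    rets <- dailyReturn(aapl)   # simple daily returns
    chartSeries(aapl, subset = "last 6 months")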
1161 Empirical Finance QuantTools Enhanced Quantitative Trading Modelling Download and organize historical market data from multiple sources like Yahoo (<https://finance.yahoo.com>), Google (<https://www.google.com/finance>), Finam (<https://www.finam.ru/profile/moex-akcii/sberbank/export/>), MOEX (<https://www.moex.com/en/derivatives/contracts.aspx>) and IQFeed (<https://www.iqfeed.net/symbolguide/index.cfm?symbolguide=lookup>). Code your trading algorithms in modern C++11 with a powerful event-driven tick processing API, including trading costs and exchange communication latency, and transform detailed data seamlessly into R. In just a few lines of code you will be able to visualize every step of your trading model, from tick data to multi-dimensional heat maps.
1162 Empirical Finance ragtop Pricing Equity Derivatives with Extensions of Black-Scholes Algorithms to price American and European equity options, convertible bonds and a variety of other financial derivatives. It uses an extension of the usual Black-Scholes model in which jump to default may occur at a probability specified by a power-law link between stock price and hazard rate as found in the paper by Takahashi, Kobayashi, and Nakagawa (2001) <doi:10.3905/jfi.2001.319302>. We use ideas and techniques from Andersen and Buffum (2002) <doi:10.2139/ssrn.355308> and Linetsky (2006) <doi:10.1111/j.1467-9965.2006.00271.x>.
1163 Empirical Finance Rbitcoin R & bitcoin integration Utilities related to Bitcoin. Unified markets API interface (bitstamp, kraken, btce, bitmarket). Both public and private API calls. Integration of data structures for all markets. Support SSL. Read Rbitcoin documentation (command: ?btc) for more information.
1164 Empirical Finance Rblpapi R Interface to ‘Bloomberg’ An R Interface to ‘Bloomberg’ is provided via the ‘Blp API’.
1165 Empirical Finance Rcmdr R Commander A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
1166 Empirical Finance RcppQuantuccia R Bindings to the ‘Quantuccia’ Header-Only Essentials of ‘QuantLib’ ‘QuantLib’ bindings are provided for R using ‘Rcpp’ and the header-only ‘Quantuccia’ variant (put together by Peter Caspers) offering an essential subset of ‘QuantLib’. See the included file ‘AUTHORS’ for a full list of contributors to both ‘QuantLib’ and ‘Quantuccia’.
1167 Empirical Finance reinsureR Reinsurance Treaties Application Application of reinsurance treaties to claims portfolios. The package creates a class Claims whose objective is to store claims and premiums, on which different treaties can be applied. A statistical analysis can then be applied to measure the impact of reinsurance, producing a table or graphical output. This package can be used for estimating the impact of reinsurance on several portfolios or for pricing treaties through statistical analysis. Documentation for the implemented methods can be found in “Reinsurance: Actuarial and Statistical Aspects” by Hansjoerg Albrecher, Jan Beirlant, Jozef L. Teugels (2017, ISBN: 978-0-470-77268-3) and “REINSURANCE: A Basic Guide to Facultative and Treaty Reinsurance” by Munich Re (2010) <https://www.munichre.com/site/mram/get/documents_E96160999/mram/assetpool.mr_america/PDFs/3_Publications/reinsurance_basic_guide.pdf>.
1168 Empirical Finance restimizeapi Functions for Working with the ‘www.estimize.com’ Web Services Provides the user with functions to develop their trading strategy, uncover actionable trading ideas, and monitor consensus shifts with crowdsourced earnings and economic estimate data directly from <www.estimize.com>. Further information regarding the web services this package invokes can be found at <www.estimize.com/api>.
1169 Empirical Finance Risk Computes 26 Financial Risk Measures for Any Continuous Distribution Computes 26 financial risk measures for any continuous distribution. The 26 financial risk measures include value at risk, expected shortfall due to Artzner et al. (1999) <doi:10.1007/s10957-011-9968-2>, tail conditional median due to Kou et al. (2013) <doi:10.1287/moor.1120.0577>, expectiles due to Newey and Powell (1987) <doi:10.2307/1911031>, beyond value at risk due to Longin (2001) <doi:10.3905/jod.2001.319161>, expected proportional shortfall due to Belzunce et al. (2012) <doi:10.1016/j.insmatheco.2012.05.003>, elementary risk measure due to Ahmadi-Javid (2012) <doi:10.1007/s10957-011-9968-2>, omega due to Shadwick and Keating (2002), sortino ratio due to Rollinger and Hoffman (2013), kappa due to Kaplan and Knowles (2004), Wang (1998)’s <doi:10.1080/10920277.1998.10595708> risk measures, Stone (1973)’s <doi:10.2307/2978638> risk measures, Luce (1980)’s <doi:10.1007/BF00135033> risk measures, Sarin (1987)’s <doi:10.1007/BF00126387> risk measures, Bronshtein and Kurelenkova (2009)’s risk measures.
1170 Empirical Finance riskParityPortfolio Design of Risk Parity Portfolios Fast design of risk parity portfolios for financial investment. The goal of the risk parity portfolio formulation is to equalize or distribute the risk contributions of the different assets, which is missing if we simply consider the overall volatility of the portfolio as in the mean-variance Markowitz portfolio. In addition to the vanilla formulation, where the risk contributions are perfectly equalized subject to no shortselling and budget constraints, many other formulations are considered that allow for box constraints and shortselling, as well as the inclusion of additional objectives like the expected return and overall variance. See vignette for a detailed documentation and comparison, with several illustrative examples. The package is based on the papers: Y. Feng, and D. P. Palomar (2015). SCRIP: Successive Convex Optimization Methods for Risk Parity Portfolio Design. IEEE Trans. on Signal Processing, vol. 63, no. 19, pp. 5285-5300. <doi:10.1109/TSP.2015.2452219>. F. Spinu (2013), An Algorithm for Computing Risk Parity Weights. <doi:10.2139/ssrn.2297383>. T. Griveau-Billion, J. Richard, and T. Roncalli (2013). A fast algorithm for computing High-dimensional risk parity portfolios. <arXiv:1311.4057>.
1171 Empirical Finance RiskPortfolios Computation of Risk-Based Portfolios Collection of functions designed to compute risk-based portfolios as described in Ardia et al. (2017) <doi:10.1007/s10479-017-2474-7> and Ardia et al. (2017) <doi:10.21105/joss.00171>.
1172 Empirical Finance riskSimul Risk Quantification for Stock Portfolios under the T-Copula Model Implements efficient simulation procedures to estimate tail loss probabilities and conditional excess for a stock portfolio. The log-returns are assumed to follow a t-copula model with generalized hyperbolic or t marginals.
1173 Empirical Finance RM2006 RiskMetrics 2006 Methodology Estimation of the conditional covariance matrix using the RiskMetrics 2006 methodology of Zumbach (2007) <doi:10.2139/ssrn.1420185>.
1174 Empirical Finance rmgarch Multivariate GARCH Models Feasible multivariate GARCH models including DCC, GO-GARCH and Copula-GARCH.
1175 Empirical Finance RND Risk Neutral Density Extraction Package Extract the implied risk neutral density from options using various methods.
1176 Empirical Finance rpatrec Recognising Visual Charting Patterns in Time Series Data Generating visual charting patterns and noise, smoothing to find a signal in noisy time series and enabling users to apply their findings to real life data.
1177 Empirical Finance RQuantLib R Interface to the ‘QuantLib’ Library The ‘RQuantLib’ package makes parts of ‘QuantLib’ accessible from R. The ‘QuantLib’ project aims to provide a comprehensive software framework for quantitative finance. The goal is to provide a standard open source library for quantitative analysis, modeling, trading, and risk management of financial assets.
1178 Empirical Finance rugarch (core) Univariate GARCH Models ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.
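A minimal sketch of typical ‘rugarch’ usage (an editorial illustration, not part of the Task View text; simulated returns stand in for real data):

    library(rugarch)
    set.seed(1)
    ret <- rnorm(500, sd = 0.01)  # placeholder returns; use real return data in practice
    # Standard GARCH(1,1) with a constant mean
    spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                       mean.model = list(armaOrder = c(0, 0)))
    fit <- ugarchfit(spec = spec, data = ret)
    ugarchforecast(fit, n.ahead = 10)  # 10-step volatility forecast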
1179 Empirical Finance rwt Rice Wavelet Toolbox wrapper Provides a set of functions for performing digital signal processing.
1180 Empirical Finance sandwich Robust Covariance Matrix Estimators Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
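A minimal sketch of the usual ‘sandwich’ workflow (editorial illustration; assumes the companion ‘lmtest’ package for coeftest()):

    library(sandwich)
    library(lmtest)  # coeftest() accepts a user-supplied covariance matrix
    fit <- lm(dist ~ speed, data = cars)
    # Heteroskedasticity-consistent (HC1) standard errors
    coeftest(fit, vcov = vcovHC(fit, type = "HC1"))
    # HAC (Newey-West type) covariance for time series regressions
    coeftest(fit, vcov = vcovHAC(fit))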
1181 Empirical Finance sde Simulation and Inference for Stochastic Differential Equations Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 978-0-387-75838-1, Springer, NY.
1182 Empirical Finance SharpeR Statistical Significance of the Sharpe Ratio A collection of tools for analyzing significance of assets, funds, and trading strategies, based on the Sharpe ratio and overfit of the same. Provides density, distribution, quantile and random generation of the Sharpe ratio distribution based on normal returns, as well as the optimal Sharpe ratio over multiple assets. Computes confidence intervals on the Sharpe and provides a test of equality of Sharpe ratios based on the Delta method.
1183 Empirical Finance sharpeRratio Moment-Free Estimation of Sharpe Ratios An efficient moment-free estimator of the Sharpe ratio, or signal-to-noise ratio, for heavy-tailed data (see <https://arxiv.org/abs/1505.01333>).
1184 Empirical Finance Sim.DiffProc Simulation of Diffusion Processes It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Ito and Stratonovich forms, with statistical analysis of SDEs via parallel Monte Carlo and moment-equation methods. It has enabled researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulating the first-passage-time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
1185 Empirical Finance SmithWilsonYieldCurve Smith-Wilson Yield Curve Construction Constructs a yield curve by the Smith-Wilson method from a table of LIBOR and SWAP rates.
1186 Empirical Finance stochvol Efficient Bayesian Inference for Stochastic Volatility (SV) Models Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods. Methodological details are given in Kastner and Fruhwirth-Schnatter (2014) <doi:10.1016/j.csda.2013.01.002>; the most common use cases are described in Kastner (2016) <doi:10.18637/jss.v069.i05>. Also incorporates SV with leverage.
1187 Empirical Finance strucchange Testing, Monitoring, and Dating Structural Changes Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
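A minimal sketch of break dating with ‘strucchange’ (editorial illustration using the base R Nile dataset):

    library(strucchange)
    bp <- breakpoints(Nile ~ 1)  # regression on a constant; breaks in the mean
    summary(bp)                  # RSS/BIC for each number of breaks
    confint(bp)                  # confidence intervals for the break dates
    # OLS-based CUSUM fluctuation test of parameter stability
    sctest(efp(Nile ~ 1, type = "OLS-CUSUM"))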
1188 Empirical Finance TAQMNGR Manage Tick-by-Tick Transaction Data Manager of tick-by-tick transaction data that performs ‘cleaning’, ‘aggregation’ and ‘import’ in an efficient and fast way. The package engine, written in C++, exploits the ‘zlib’ and ‘gzstream’ libraries to handle gzipped data without need to uncompress them. ‘Cleaning’ and ‘aggregation’ are performed according to Brownlees and Gallo (2006) <doi:10.1016/j.csda.2006.09.030>. Currently, TAQMNGR processes raw data from WRDS (Wharton Research Data Service, <https://wrds-web.wharton.upenn.edu/wrds/>).
1189 Empirical Finance tawny Clean Covariance Matrices Using Random Matrix Theory and Shrinkage Estimators for Portfolio Optimization Portfolio optimization typically requires an estimate of a covariance matrix of asset returns. There are many approaches for constructing such a covariance matrix, some using the sample covariance matrix as a starting point. This package provides implementations for two such methods: random matrix theory and shrinkage estimation. Each method attempts to clean or remove noise related to the sampling process from the sample covariance matrix.
1190 Empirical Finance TFX R API to TrueFX(tm) Connects R to TrueFX(tm) for free streaming real-time and historical tick-by-tick market data for dealable interbank foreign exchange rates with millisecond detail.
1191 Empirical Finance tidyquant Tidy Quantitative Financial Analysis Bringing financial analysis to the ‘tidyverse’. The ‘tidyquant’ package provides a convenient wrapper to various ‘xts’, ‘zoo’, ‘quantmod’, ‘TTR’ and ‘PerformanceAnalytics’ package functions and returns the objects in the tidy ‘tibble’ format. The main advantage is being able to use quantitative functions with the ‘tidyverse’ functions including ‘purrr’, ‘dplyr’, ‘tidyr’, ‘ggplot2’, ‘lubridate’, etc. See the ‘tidyquant’ website for more information, documentation and examples.
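A minimal sketch of ‘tidyquant’ usage (editorial illustration; "AAPL" is just an example ticker and tq_get() requires internet access):

    library(tidyquant)
    prices <- tq_get("AAPL", get = "stock.prices", from = "2020-01-01")
    # Monthly returns from adjusted prices via the wrapped quantmod function
    monthly <- tq_transmute(prices, select = adjusted,
                            mutate_fun = periodReturn, period = "monthly")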
1192 Empirical Finance timeDate (core) Rmetrics - Chronological and Calendar Objects The ‘timeDate’ class fulfils the conventions of the ISO 8601 standard as well as of the ANSI C and POSIX standards. Beyond these standards it provides the “Financial Center” concept, which allows one to handle data records collected in different time zones and to mix them so that the time stamps are always correct with respect to your personal financial center, or alternatively to the GMT reference time. It can thus also handle time stamps from historical data records from the same time zone, even if the financial centers changed daylight saving times at different calendar dates.
1193 Empirical Finance timeSeries (core) Rmetrics - Financial Time Series Objects Provides a class and various tools for financial time series. This includes basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.
1194 Empirical Finance timsac Time Series Analysis and Control Package Functions for statistical analysis, prediction and control of time series based mainly on Akaike and Nakagawa (1988) <ISBN 978-90-277-2786-2>.
1195 Empirical Finance tis Time Indexes and Time Indexed Series Functions and S3 classes for time indexes and time indexed series, which are compatible with FAME frequencies.
1196 Empirical Finance TSdbi Time Series Database Interface Provides a common interface to time series databases. The objective is to define a standard interface so users can retrieve time series data from various sources with a simple, common set of commands, and so programs can be written to be portable with respect to the data source. The SQL implementations also provide a database table design, so users needing to set up a time series database have a reasonably complete way to do this easily. The interface provides for a variety of options with respect to the representation of time series in R. The interface, and the SQL implementations, also handle vintages of time series data (sometimes called editions or real-time data). There is also a (not yet well tested) mechanism to handle multilingual data documentation. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.
1197 Empirical Finance tsDyn Nonlinear Time Series Models with Regime Switching Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).
1198 Empirical Finance tseries (core) Time Series Analysis and Computational Finance Time series analysis and computational finance.
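A minimal sketch of common ‘tseries’ calls (editorial illustration using the base R EuStockMarkets dataset):

    library(tseries)
    r <- diff(log(EuStockMarkets[, "DAX"]))  # daily log-returns
    adf.test(r)                              # augmented Dickey-Fuller unit root test
    summary(garch(r, order = c(1, 1)))       # GARCH(1,1) fit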
1199 Empirical Finance tseriesChaos Analysis of Nonlinear Time Series Routines for the analysis of nonlinear time series. This work is largely inspired by the TISEAN project, by Rainer Hegger, Holger Kantz and Thomas Schreiber: <http://www.mpipks-dresden.mpg.de/~tisean/>.
1200 Empirical Finance tsfa Time Series Factor Analysis Extraction of Factors from Multivariate Time Series. See ?00tsfa-Intro for more details.
1201 Empirical Finance TTR Technical Trading Rules Functions and data to construct technical trading rules with R.
1202 Empirical Finance tvm Time Value of Money Functions Functions for managing cashflows and interest rate curves.
1203 Empirical Finance urca (core) Unit Root and Cointegration Tests for Time Series Data Unit root and cointegration tests encountered in applied econometric analysis are implemented.
1204 Empirical Finance vars VAR Modelling Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.
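A minimal sketch of VAR estimation with ‘vars’ (editorial illustration using the Canada dataset shipped with the package):

    library(vars)
    data(Canada)                                    # quarterly Canadian macro series
    VARselect(Canada, lag.max = 4, type = "const")  # lag order by information criteria
    fit <- VAR(Canada, p = 2, type = "const")
    plot(irf(fit, n.ahead = 10))                    # impulse response functions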
1205 Empirical Finance VarSwapPrice Pricing a variance swap on an equity index Computes a portfolio of European options that replicates the cost of capturing the realised variance of an equity index.
1206 Empirical Finance vrtest Variance Ratio tests and other tests for Martingale Difference Hypothesis A collection of statistical tests for the martingale difference hypothesis.
1207 Empirical Finance wavelets Functions for Computing Wavelet Filters, Wavelet Transforms and Multiresolution Analyses Contains functions for computing and plotting discrete wavelet transforms (DWT) and maximal overlap discrete wavelet transforms (MODWT), as well as their inverses. Additionally, it contains functionality for computing and plotting wavelet transform filters that are used in the above decompositions as well as multiresolution analyses.
1208 Empirical Finance waveslim Basic Wavelet Routines for One-, Two- And Three-Dimensional Signal Processing Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.
1209 Empirical Finance wavethresh Wavelets Statistics and Transforms Performs 1, 2 and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, density estimation.
1210 Empirical Finance XBRL Extraction of Business Financial Information from ‘XBRL’ Documents Functions to extract business financial information from an Extensible Business Reporting Language (‘XBRL’) instance file and the associated collection of files that defines its ‘Discoverable’ Taxonomy Set (‘DTS’).
1211 Empirical Finance xts (core) eXtensible Time Series Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
1212 Empirical Finance ycinterextra Yield curve or zero-coupon prices interpolation and extrapolation Yield curve or zero-coupon prices interpolation and extrapolation using the Nelson-Siegel, Svensson, Smith-Wilson models, and Hermite cubic splines.
1213 Empirical Finance YieldCurve Modelling and estimation of the yield curve Modelling the yield curve with some parametric models. The models implemented are: Nelson-Siegel, Diebold-Li and Svensson. The package also includes the data of the term structure of interest rate of Federal Reserve Bank and European Central Bank.
1214 Empirical Finance Zelig Everyone’s Statistical Software A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.
1215 Empirical Finance zoo (core) S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
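A minimal sketch of an irregular ‘zoo’ series (editorial illustration with made-up values):

    library(zoo)
    # Irregularly indexed daily series with one missing value
    z <- zoo(c(1.1, 1.3, NA, 1.6), as.Date("2021-01-01") + c(0, 2, 3, 7))
    na.approx(z)                     # interpolate the missing value along the index
    rollmeanr(na.approx(z), k = 2)   # right-aligned rolling mean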
1216 Functional Data Analysis classiFunc Classification of Functional Data Efficient implementation of k-nearest neighbor estimation and kernel estimation for functional data classification.
1217 Functional Data Analysis covsep Tests for Determining if the Covariance Structure of 2-Dimensional Data is Separable Functions for testing if the covariance structure of 2-dimensional data (e.g. samples of surfaces X_i = X_i(s,t)) is separable, i.e. if covariance(X) = C_1 x C_2. A complete description of the implemented tests can be found in the paper Aston, John A. D.; Pigoli, Davide; Tavakoli, Shahin. Tests for separability in nonparametric covariance operators of random surfaces. Ann. Statist. 45 (2017), no. 4, 1431-1461. <doi:10.1214/16-AOS1495> <https://projecteuclid.org/euclid.aos/1498636862> <arXiv:1505.02023>.
1218 Functional Data Analysis dbstats Distance-Based Statistics Prediction methods where explanatory information is coded as a matrix of distances between individuals. Distances can either be directly input as a distances matrix, a squared distances matrix, an inner-products matrix or computed from observed predictors.
1219 Functional Data Analysis ddalpha Depth-Based Classification and Calculation of Data Depth Contains procedures for depth-based supervised learning, which are entirely non-parametric, in particular the DDalpha-procedure (Lange, Mosler and Mozharovskyi, 2014 <doi:10.1007/s00362-012-0488-4>). The training data sample is transformed by a statistical depth function to a compact low-dimensional space, where the final classification is done. It also offers an extension to functional data and routines for calculating certain notions of statistical depth functions. 50 multivariate and 5 functional classification problems are included.
1220 Functional Data Analysis denseFLMM Functional Linear Mixed Models for Densely Sampled Data Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis.
1221 Functional Data Analysis fda (core) Functional Data Analysis These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer. They were ported from earlier versions in Matlab and S-PLUS. An introduction appears in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009) Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files working many examples including all but one of the 76 figures in this latter book. Matlab versions of the code and sample analyses are no longer distributed through CRAN, as they were when the book was published. For those, ftp from <http://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/>. There you will find a set of .zip files containing the functions and sample analyses, as well as two .txt files giving instructions for installation and some additional information. The changes from Version 2.4.1 are fixes of bugs in density.fd and removal of functions create.polynomial.basis, polynompen, and polynomial. These were deleted because the monomial basis does the same thing and because there were errors in the code.
1222 Functional Data Analysis fda.usc (core) Functional Data Analysis and Utilities for Statistical Computing Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.
1223 Functional Data Analysis fdadensity Functional Data Analysis for Density Functions by Transformation to a Hilbert Space An implementation of the methodology described in Petersen and Mueller (2016) <doi:10.1214/15-AOS1363> for the functional data analysis of samples of density functions. Densities are first transformed to their corresponding log quantile densities, followed by ordinary Functional Principal Components Analysis (FPCA). Transformation modes of variation yield improved interpretation of the variability in the data as compared to FPCA on the densities themselves. The standard fraction of variance explained (FVE) criterion commonly used for functional data is adapted to the transformation setting, also allowing for an alternative quantification of variability for density data through the Wasserstein metric of optimal transport.
1224 Functional Data Analysis fdakma Functional Data Analysis: K-Mean Alignment Performs simultaneous clustering and alignment of a multidimensional or unidimensional functional dataset by means of k-mean alignment.
1225 Functional Data Analysis fdapace (core) Functional Data Analysis and Empirical Dynamics Provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm or numerical integration. PACE is useful for the analysis of data that have been generated by a sample of underlying (but usually not fully observed) random trajectories. It does not rely on pre-smoothing of trajectories, which is problematic if functional data are sparsely sampled. PACE provides options for functional regression and correlation, for Longitudinal Data Analysis, the analysis of stochastic processes from samples of realized trajectories, and for the analysis of underlying dynamics. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
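A minimal sketch of FPCA on sparse functional data with ‘fdapace’ (editorial illustration with simulated sine curves; assumes the current CRAN API where FPCA() takes lists of observations and time points):

    library(fdapace)
    set.seed(1)
    Lt <- lapply(1:50, function(i) sort(runif(4)))  # sparse observation times per curve
    Ly <- lapply(Lt, function(t) sin(2 * pi * t) + rnorm(length(t), sd = 0.1))
    res <- FPCA(Ly, Lt)  # PACE: FPCA by conditional expectation
    plot(res)            # design plot, mean function, eigenfunctions, FVE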
1226 Functional Data Analysis fdaPDE Functional Data Analysis and Partial Differential Equations; Statistical Analysis of Functional and Spatial Data, Based on Regression with Partial Differential Regularizations An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization.
1227 Functional Data Analysis fdasrvf (core) Elastic Functional Data Analysis Performs alignment, PCA, and modeling of multidimensional and unidimensional functions using the square-root velocity framework (Srivastava et al., 2011 <arXiv:1103.3817> and Tucker et al., 2014 <doi:10.1016/j.csda.2012.12.001>). This framework allows for elastic analysis of functional data through phase and amplitude separation.
1228 Functional Data Analysis fdatest Interval Testing Procedure for Functional Data Implementation of the Interval Testing Procedure for functional data in different frameworks (i.e., one or two-population frameworks, functional linear models) by means of different basis expansions (i.e., B-spline, Fourier, and phase-amplitude Fourier). The current version of the package requires functional data evaluated on a uniform grid; it automatically projects each function on a chosen functional basis; it performs the entire family of multivariate tests; and, finally, it provides the matrix of the p-values of the previous tests and the vector of the corrected p-values. The functional basis, the coupled or uncoupled scenario, and the kind of test can be chosen by the user. The package provides also a plotting function creating a graphical output of the procedure: the p-value heat-map, the plot of the corrected p-values, and the plot of the functional data.
1229 Functional Data Analysis FDboost (core) Boosting Functional Regression Models Regression models for functional data, i.e., scalar-on-function, function-on-scalar and function-on-function regression models, are fitted by a component-wise gradient boosting algorithm.
1230 Functional Data Analysis fdcov Analysis of Covariance Operators Provides a variety of tools for the analysis of covariance operators including k-sample tests for equality and classification and clustering methods found in the works of Cabassi et al (2017) <doi:10.1214/17-EJS1347>, Kashlak et al (2017) <arXiv:1604.06310>, Pigoli et al (2014) <doi:10.1093/biomet/asu008>, and Panaretos et al (2010) <doi:10.1198/jasa.2010.tm09239>.
1231 Functional Data Analysis fds (core) Functional Data Sets Functional data sets.
1232 Functional Data Analysis flars Functional LARS Variable selection algorithm for functional linear regression with scalar response variable and mixed scalar/functional predictors.
1233 Functional Data Analysis fpca Restricted MLE for Functional Principal Components Analysis A geometric approach to MLE for functional principal components.
1234 Functional Data Analysis freqdom Frequency Domain Based Analysis: Dynamic PCA Implementation of dynamic principal component analysis (DPCA), simulation of VAR and VMA processes and frequency domain tools. These frequency domain methods for dimensionality reduction of multivariate time series were introduced by David Brillinger in his book Time Series (1974). We follow implementation guidelines as described in Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Component <doi:10.1111/rssb.12076>.
1235 Functional Data Analysis freqdom.fda Functional Time Series: Dynamic Functional Principal Components Implementations of functional dynamic principal components analysis, with related graphical tools and frequency domain methods. These methods directly use the multivariate dynamic principal components implementation, following the guidelines from Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Component <doi:10.1111/rssb.12076>.
1236 Functional Data Analysis ftsa (core) Functional Time Series Analysis Functions for visualizing, modeling, forecasting and hypothesis testing of functional time series.
1237 Functional Data Analysis ftsspec Spectral Density Estimation and Comparison for Functional Time Series Functions for estimating spectral density operator of functional time series (FTS) and comparing the spectral density operator of two functional time series, in a way that allows detection of differences of the spectral density operator in frequencies and along the curve length.
1238 Functional Data Analysis funData An S4 Class for Functional Data S4 classes for univariate and multivariate functional data with utility functions.
1239 Functional Data Analysis funFEM Clustering in the Discriminative Functional Subspace The funFEM algorithm (Bouveyron et al., 2014) allows one to cluster functional data by modeling the curves within a common and discriminative functional subspace.
1240 Functional Data Analysis funHDDC Univariate and Multivariate Model-Based Clustering in Group-Specific Functional Subspaces The funHDDC algorithm allows one to cluster functional univariate (Bouveyron and Jacques, 2011, <doi:10.1007/s11634-011-0095-6>) or multivariate data (Schmutz et al., 2018) by modeling each group within a specific functional subspace.
1241 Functional Data Analysis funLBM Model-Based Co-Clustering of Functional Data The funLBM algorithm allows one to simultaneously cluster the rows and the columns of a data matrix where each entry of the matrix is a function or a time series.
1242 Functional Data Analysis geofd Spatial Prediction for Function Value Data Kriging based methods are used for predicting functional data (curves) with spatial dependence.
1243 Functional Data Analysis GPFDA Apply Gaussian Process in Functional data analysis Use functional regression as the mean structure and Gaussian Process as the covariance structure.
1244 Functional Data Analysis growfunctions Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data Estimates a collection of time-indexed functions under either of Gaussian process (GP) or intrinsic Gaussian Markov random field (iGMRF) prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same covariance or precision parameters. The GP and iGMRF formulations both support any number of additive covariance or precision terms, respectively, expressing either or both of multiple trend and seasonality.
1245 Functional Data Analysis pcdpca Dynamic Principal Components for Periodically Correlated Functional Time Series The method extends multivariate and functional dynamic principal components to periodically correlated multivariate time series, allowing you to compute true dynamic principal components in the presence of periodicity. We follow the implementation guidelines described in Kidzinski, Kokoszka and Jouzdani (2017), Principal component analysis of periodically correlated functional time series <arXiv:1612.00040>.
1246 Functional Data Analysis rainbow Bagplots, Boxplots and Rainbow Plots for Functional Data Visualizing functional data and identifying functional outliers.
1247 Functional Data Analysis refund (core) Regression with Functional Data Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.
1248 Functional Data Analysis refund.shiny Interactive Plotting for Functional Data Analyses Interactive plotting for functional data analyses.
1249 Functional Data Analysis RFgroove Importance Measure and Selection for Groups of Variables with Random Forests Variable selection tools for groups of variables and functional data based on a new grouped variable importance with random forests.
1250 Functional Data Analysis roahd Robust Analysis of High Dimensional Data A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.
1251 Functional Data Analysis SCBmeanfd Simultaneous Confidence Bands for the Mean of Functional Data Statistical methods for estimating and inferring the mean of functional data. The methods include simultaneous confidence bands, local polynomial fitting, bandwidth selection by plug-in and cross-validation, goodness-of-fit tests for parametric models, equality tests for two-sample problems, and plotting functions.
1252 Functional Data Analysis sparseFLMM Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data Estimation of functional linear mixed models for irregularly or sparsely sampled data based on functional principal component analysis.
1253 Functional Data Analysis splinetree Longitudinal Regression Trees and Forests Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.
1254 Functional Data Analysis switchnpreg Switching nonparametric regression models for a single curve and functional data Functions for estimating the parameters from the latent state process and the functions corresponding to the J states as proposed by De Souza and Heckman (2013).
1255 Functional Data Analysis warpMix Mixed Effects Modeling with Warping for Functional Data Using B-Spline Mixed effects modeling with warping for functional data using B-splines. Warping coefficients are treated as random effects, and the warping functions are general functions whose parameters represent the projection of part of the warping function onto a B-spline basis. Warped data are modelled by a linear mixed-effects functional model; the noise is Gaussian and independent of the warping functions.
1256 Statistical Genetics adegenet Exploratory Analysis of Genetic and Genomic Data Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure (‘genind’ class), alleles counts by populations (‘genpop’), and genome-wide SNP data (‘genlight’). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.
1257 Statistical Genetics ape Analyses of Phylogenetics and Evolution Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
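A minimal sketch of tree input and NJ estimation with ‘ape’ (editorial illustration; woodmouse is an example DNA alignment shipped with the package):

    library(ape)
    tr <- read.tree(text = "((a:1,b:1):1,(c:1,d:1):1);")  # Newick input
    plot(tr)
    data(woodmouse)
    nj_tree <- nj(dist.dna(woodmouse))  # neighbour-joining from pairwise distances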
1258 Statistical Genetics Biodem Biodemography Functions The Biodem package provides a number of functions for Biodemographic analysis.
1259 Statistical Genetics bqtl Bayesian QTL Mapping Toolkit QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools.
1260 Statistical Genetics dlmap Detection Localization Mapping for QTL QTL mapping in a mixed model framework with separate detection and localization stages. The first stage detects the number of QTL on each chromosome based on the genetic variation due to grouped markers on the chromosome; the second stage uses this information to determine the most likely QTL positions. The mixed model can accommodate general fixed and random effects, including spatial effects in field trials and pedigree effects. Applicable to backcrosses, doubled haploids, recombinant inbred lines, F2 intercrosses, and association mapping populations.
1261 Statistical Genetics gap (core) Genetic Analysis Package It is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates.
1262 Statistical Genetics genetics (core) Population Genetics Classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, …
1263 Statistical Genetics hapassoc Inference of Trait Associations with SNP Haplotypes and Other Attributes using the EM Algorithm The following R functions are used for inference of trait associations with haplotypes and other covariates in generalized linear models. The functions are developed primarily for data collected in cohort or cross-sectional studies. They can accommodate uncertain haplotype phase and handle missing genotypes at some SNPs.
1264 Statistical Genetics haplo.stats (core) Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous Routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The main functions are: haplo.em(), haplo.glm(), haplo.score(), and haplo.power(); all of which have detailed examples in the vignette.
1265 Statistical Genetics HardyWeinberg Statistical Tests and Graphics for Hardy-Weinberg Equilibrium Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) <doi:10.1126/science.28.706.49> for bi and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots.
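A minimal sketch of the classical tests in ‘HardyWeinberg’ (editorial illustration; the MN blood group counts are the standard example from the package documentation):

    library(HardyWeinberg)
    x <- c(MM = 298, MN = 489, NN = 213)  # genotype counts for a bi-allelic marker
    HWChisq(x)                            # chi-square test (with continuity correction)
    HWExact(x)                            # exact test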
1266 Statistical Genetics hierfstat Estimation and Tests of Hierarchical F-Statistics Allows the estimation of hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm of Yang (Evolution, 1998, 52(4):950-956; <doi:10.2307/2411227>). Functions are also given to test, via randomisations, the significance of each F and variance component, using the likelihood-ratio statistic G.
1267 Statistical Genetics hwde Models and Tests for Departure from Hardy-Weinberg Equilibrium and Independence Between Loci Fits models for genotypic disequilibria, as described in Huttley and Wilson (2000), Weir (1996) and Weir and Wilson (1986). Contrast terms are available that account for first order interactions between loci. Also implements, for a single locus in a single population, a conditional exact test for Hardy-Weinberg equilibrium.
1268 Statistical Genetics ibdreg Regression Methods for IBD Linkage With Covariates A method to test genetic linkage with covariates by regression methods, with IBD sharing for relative pairs as the response. Accounts for correlations of IBD statistics and covariates for relative pairs within the same pedigree.
1269 Statistical Genetics LDheatmap Graphical Display of Pairwise Linkage Disequilibria Between SNPs Produces a graphical display, as a heat map, of measures of pairwise linkage disequilibria between SNPs. Users may optionally include the physical locations or genetic map distances of each SNP on the plot.
1270 Statistical Genetics ouch Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.
1271 Statistical Genetics pbatR Pedigree/Family-Based Genetic Association Tests Analysis and Power This R package provides power calculations via internal simulation methods. The package also provides a frontend to the now abandoned PBAT program (developed by Christoph Lange), and reads in the corresponding output and displays results and figures when appropriate. The license of this R package itself is GPL. However, to have the program interact with the PBAT program for some functionality of the R package, users must additionally obtain the PBAT program from Christoph Lange, and accept his license. Both the data analysis and power calculations have command line and graphical interfaces using tcltk.
1272 Statistical Genetics phangorn Phylogenetic Reconstruction and Analysis Contains methods for estimation of phylogenetic trees and networks using maximum likelihood, maximum parsimony, distance methods and Hadamard conjugation. Allows one to compare trees, perform model selection, and offers visualizations for trees and split networks.
1273 Statistical Genetics qtl Tools for Analyzing QTL Experiments Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits.
1274 Statistical Genetics rmetasim An Individual-Based Population Genetic Simulation Environment An interface between R and the metasim simulation engine. The simulation environment is documented in: “Strand, A. (2002) <doi:10.1046/j.1471-8286.2002.00208.x> Metasim 1.0: an individual-based environment for simulating population genetics of complex population dynamics. Mol. Ecol. Notes.” Please see the vignettes CreatingLandscapes and Simulating to get some ideas on how to use the package. See the rmetasim vignette for an overview and for important changes to the code in the most recent version.
1275 Statistical Genetics seqinr Biological Sequences Retrieval and Analysis Exploratory data analysis and data visualization for biological sequence (DNA and protein) data. Seqinr includes utilities for sequence data management under the ACNUC system described in Gouy, M. et al. (1984) Nucleic Acids Res. 12:121-127 <doi:10.1093/nar/12.1Part1.121>.
1276 Statistical Genetics snp.plotter snp.plotter Creates plots of p-values using single SNP and/or haplotype data. Main features of the package include options to display a linkage disequilibrium (LD) plot and the ability to plot multiple datasets simultaneously. Plots can be created using global and/or individual haplotype p-values along with single SNP p-values. Images are created as either PDF or EPS files.
1277 Statistical Genetics SNPmaxsel Maximally selected statistics for SNP data This package implements asymptotic methods related to maximally selected statistics, with applications to SNP data.
1278 Statistical Genetics stepwise Stepwise detection of recombination breakpoints A stepwise approach to identifying recombination breakpoints in a sequence alignment.
1279 Statistical Genetics tdthap TDT Tests for Extended Haplotypes Functions and examples are provided for Transmission/disequilibrium tests for extended marker haplotypes, as in Clayton, D. and Jones, H. (1999) “Transmission/disequilibrium tests for extended marker haplotypes”. Amer. J. Hum. Genet., 65:1161-1169, <doi:10.1086/302566>.
1280 Statistical Genetics untb Ecological Drift under the UNTB Hubbell’s Unified Neutral Theory of Biodiversity.
1281 Statistical Genetics wgaim Whole Genome Average Interval Mapping for QTL Detection using Mixed Models Integrates sophisticated mixed modelling methods with a whole genome approach to detecting significant QTL in linkage maps.
1282 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ade4 Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
1283 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization animation A Gallery of Animations in Statistics and Utilities to Create Animations Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, non-parametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, ‘GIF’, HTML pages, ‘PDF’ and videos. ‘PDF’ animations can be inserted into ‘Sweave’ / ‘knitr’ easily.
1284 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ape Analyses of Phylogenetics and Evolution Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
1285 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization aplpack Another Plot Package: ‘Bagplots’, ‘Iconplots’, ‘Summaryplots’, Slider Functions and Others Some functions for drawing some special plots: The function ‘bagplot’ plots a bagplot, ‘faces’ plots Chernoff faces, ‘iconplot’ plots a representation of a frequency table or a data matrix, ‘plothulls’ plots hulls of a bivariate data set, ‘plotsummary’ plots a graphical summary of a data set, ‘puticon’ adds icons to a plot, ‘skyline.hist’ combines several histograms of a one dimensional data set in one plot, ‘slider’ functions support some interactive graphics, ‘spin3R’ aids inspection of a 3-dim point cloud, ‘stem.leaf’ plots a stem and leaf plot, ‘stem.leaf.backback’ plots back-to-back versions of stem and leaf plot.
1286 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ash David Scott’s ASH Routines David Scott’s ASH routines ported from S-PLUS to R.
1287 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization biclust BiCluster Algorithms The main function biclust() provides several algorithms to find biclusters in two-dimensional data: Cheng and Church (2000, ISBN:1-57735-115-0), spectral (2003) <doi:10.1101/gr.648603>, plaid model (2005) <doi:10.1016/j.csda.2004.02.003>, xmotifs (2003) <doi:10.1142/9789812776303_0008> and bimax (2006) <doi:10.1093/bioinformatics/btl060>. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.
1288 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization Cairo R Graphics Device using Cairo Graphics Library for Creating High-Quality Bitmap (PNG, JPEG, TIFF), Vector (PDF, SVG, PostScript) and Display (X11 and Win32) Output R graphics device using cairographics library that can be used to create high-quality vector (PDF, PostScript and SVG) and bitmap output (PNG,JPEG,TIFF), and high-quality rendering in displays (X11 and Win32). Since it uses the same back-end for all output, copying across formats is WYSIWYG. Files are created without the dependence on X11 or other external programs. This device supports alpha channel (semi-transparent drawing) and resulting images can contain transparent and semi-transparent regions. It is ideal for use in server environments (file output) and as a replacement for other devices that don’t have Cairo’s capabilities such as alpha support or anti-aliasing. Backends are modular such that any subset of backends is supported.
1289 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization cairoDevice Embeddable Cairo Graphics Device Driver This device uses Cairo and GTK to draw to the screen, file (png, svg, pdf, and ps) or memory (arbitrary GdkDrawable or Cairo context). The screen device may be embedded into RGtk2 interfaces and supports all interactive features of other graphics devices, including getGraphicsEvent().
1290 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization cba Clustering for Business Analytics Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.
1291 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization colorspace A Toolbox for Manipulating and Assessing Colors and Palettes Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided along with corresponding ggplot2 color scales. Color palette choice is aided by an interactive app (with either a Tcl/Tk or a shiny GUI) and shiny apps with an HCL color picker and a color vision deficiency emulator. Plotting functions for displaying and assessing palettes include color swatches, visualizations of the HCL space, and trajectories in HCL and/or RGB spectrum. Color manipulation functions include: desaturation, lightening/darkening, mixing, and simulation of color vision deficiencies (deutanomaly, protanomaly, tritanomaly).
1292 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization diagram Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams Visualises simple graphs (networks) based on a transition matrix, with utilities to plot flow diagrams, webs, electrical networks, etc. Supports the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book “Solving Differential Equations in R” by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).
1293 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization dichromat Color Schemes for Dichromats Collapse red-green or green-blue distinctions to simulate the effects of different types of color-blindness.
1294 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization gclus Clustering Graphics Orders panels in scatterplot matrices and parallel coordinate displays by some merit index. Package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.
1295 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization ggplot2 (core) Create Elegant Data Visualisations Using the Grammar of Graphics A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
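A minimal sketch of the grammar-of-graphics workflow in ‘ggplot2’ (editorial illustration using the base R mtcars dataset):

    library(ggplot2)
    ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE) +
      labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")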
1296 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization gplots Various R Programming Tools for Plotting Data Various R programming tools for plotting data, including: - calculating and plotting locally smoothed summary function as (‘bandplot’, ‘wapply’), - enhanced versions of standard plots (‘barplot2’, ‘boxplot2’, ‘heatmap.2’, ‘smartlegend’), - manipulating colors (‘col2hex’, ‘colorpanel’, ‘redgreen’, ‘greenred’, ‘bluered’, ‘redblue’, ‘rich.colors’), - calculating and plotting two-dimensional data summaries (‘ci2d’, ‘hist2d’), - enhanced regression diagnostic plots (‘lmplot2’, ‘residplot’), - formula-enabled interface to ‘stats::lowess’ function (‘lowess’), - displaying textual data in plots (‘textplot’, ‘sinkplot’), - plotting a matrix where each cell contains a dot whose size reflects the relative magnitude of the elements (‘balloonplot’), - plotting “Venn” diagrams (‘venn’), - displaying Open-Office style plots (‘ooplot’), - plotting multiple data on same region, with separate axes (‘overplot’), - plotting means and confidence intervals (‘plotCI’, ‘plotmeans’), - spacing points in an x-y plot so they don’t overlap (‘space’).
1297 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization gridBase Integration of base and grid graphics Integration of base and grid graphics
1298 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization hexbin Hexagonal Binning Routines Binning and plotting functions for hexagonal bins.
1299 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization IDPmisc ‘Utilities of Institute of Data Analyses and Process Design (www.zhaw.ch/idp)’ Different high-level graphics functions for displaying large datasets, displaying circular data in a very flexible way, finding local maxima, brewing color ramps, drawing nice arrows, zooming 2D-plots, creating figures with differently colored margin and plot region. In addition, the package contains auxiliary functions for data manipulation like omitting observations with irregular values or selecting data by logical vectors, which include NAs. Other functions are especially useful in spectroscopy and analyses of environmental data: robust baseline fitting, finding peaks in spectra, converting humidity measures.
1300 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization igraph Network Analysis and Visualization Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
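A minimal sketch of graph construction and centrality with ‘igraph’ (editorial illustration with a random graph):

    library(igraph)
    set.seed(1)
    g <- sample_gnp(20, p = 0.15)  # Erdos-Renyi random graph on 20 vertices
    degree(g)                      # degree centrality
    betweenness(g)                 # betweenness centrality
    plot(g, vertex.size = 6, vertex.label = NA)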
1301 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization iplots iPlots - interactive graphics for R Interactive plots for R.
1302 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization JavaGD Java Graphics Device Graphics device routing all graphics commands to a Java program. The actual functionality of the JavaGD depends on the Java-side implementation. Simple AWT and Swing implementations are included.
1303 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization klaR Classification and Visualization Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to ‘svmlight’ and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.
1304 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization lattice (core) Trellis Graphics for R A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
1305 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization latticeExtra Extra Graphical Utilities Based on Lattice Building on the infrastructure provided by the lattice package, this package provides several new high-level functions and methods, as well as additional utilities such as panel and axis annotation functions.
1306 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization misc3d Miscellaneous 3D Plots A collection of miscellaneous 3d plots, including isosurfaces.
1307 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization onion Octonions and Quaternions Quaternions and Octonions are four- and eight- dimensional extensions of the complex numbers. They are normed division algebras over the real numbers and find applications in spatial rotations (quaternions) and string theory and relativity (octonions). The quaternions are noncommutative and the octonions nonassociative. See RKS Hankin 2006, Rnews Volume 6/2: 49-51, and the package vignette, for more details.
1308 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization plotrix (core) Various Plotting Functions Lots of plots, various labeling, axis and color scaling functions.
1309 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RColorBrewer (core) ColorBrewer Palettes Provides color schemes for maps (and other graphics) designed by Cynthia Brewer as described at http://colorbrewer2.org
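A minimal sketch of palette selection with ‘RColorBrewer’ (editorial illustration):

    library(RColorBrewer)
    display.brewer.all()           # preview every ColorBrewer palette
    pal <- brewer.pal(8, "Set2")   # eight colors from a qualitative palette
    barplot(rep(1, 8), col = pal, border = NA)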
1310 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization rggobi Interface Between R and ‘GGobi’ A command-line interface to ‘GGobi’, an interactive and dynamic graphics package. ‘Rggobi’ complements the graphical user interface of ‘GGobi’ providing a way to fluidly transition between analysis and exploration, as well as automating common tasks.
1311 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization rgl (core) 3D Visualization Using OpenGL Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.
1312 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RGraphics Data and Functions from the Book R Graphics, Second Edition Data and Functions from the book R Graphics, Second Edition. There is a function to produce each figure in the book, plus several functions, classes, and methods defined in Chapter 8.
1313 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RGtk2 R Bindings for Gtk 2.8.0 and Above Facilities in the R language for programming graphical interfaces using Gtk, the GIMP Toolkit.
1314 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RSvgDevice An R SVG graphics device A graphics device for R that uses the W3C XML standard for Scalable Vector Graphics.
1315 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization RSVGTipsDevice An R SVG Graphics Device with Dynamic Tips and Hyperlinks A graphics device for R that uses the W3C XML standard for Scalable Vector Graphics. This version supports tooltips with 1 to 3 lines, hyperlinks, and line styles.
1316 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization scagnostics Compute scagnostics - scatterplot diagnostics Calculates graph theoretic scagnostics. Scagnostics describe various measures of interest for pairs of variables, based on their appearance on a scatterplot. They are a useful tool for discovering interesting or unusual scatterplots from a scatterplot matrix, without having to look at every individual plot.
1317 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization scatterplot3d 3D Scatter Plot Plots a three dimensional (3D) point cloud.
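A minimal sketch using the built-in trees data:

    library(scatterplot3d)
    with(trees, scatterplot3d(Girth, Height, Volume, pch = 16))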
1318 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization seriation Infrastructure for Ordering Objects Using Seriation Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).
1319 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization tkrplot TK Rplot Simple mechanism for placing R graphics in a Tk widget.
1320 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization vcd (core) Visualizing Categorical Data Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).
1321 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization vioplot Violin Plot A violin plot is a combination of a box plot and a kernel density plot. This package allows extensive customisation of violin plots.
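A minimal sketch comparing two simulated samples (distributions chosen arbitrarily):

    library(vioplot)
    vioplot(rnorm(200), rexp(200), names = c("normal", "exponential"))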
1322 Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization xgobi Interface to the XGobi and XGvis programs for graphical data analysis Interface to the XGobi and XGvis programs for graphical data analysis.
1323 High-Performance and Parallel Computing with R aprof Amdahl’s Profiler, Directed Optimization Made Easy Assists the evaluation of whether and where to focus code optimization, using Amdahl’s law and visual aids based on line profiling. Amdahl’s profiler organizes profiling output files (including memory profiling) in a visually appealing way. It is meant to help balance development and execution time by identifying the most promising sections of code to optimize and by projecting potential gains. The package is an addition to R’s standard profiling tools and is not a wrapper for them.
1324 High-Performance and Parallel Computing with R batch Batching Routines in Parallel and Passing Command-Line Arguments to R Functions to allow you to easily pass command-line arguments into R, and functions to aid in submitting your R code in parallel on a cluster and joining the results afterward (e.g. multiple parameter values for simulations running in parallel, splitting up a permutation test in parallel, etc.). See ‘parseCommandArgs(…)’ for the main example of how to use this package.
1325 High-Performance and Parallel Computing with R BatchExperiments Statistical Experiments on Batch Computing Clusters Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.
1326 High-Performance and Parallel Computing with R BatchJobs Batch Computing with R Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.
1327 High-Performance and Parallel Computing with R batchtools Tools for Computation on Batch Systems As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high performance computing systems managed by schedulers ‘IBM Spectrum LSF’ (<https://www.ibm.com/us-en/marketplace/hpc-workload-management>), ‘OpenLava’ (<http://www.openlava.org/>), ‘Univa Grid Engine’/‘Oracle Grid Engine’ (<http://www.univa.com/>), ‘Slurm’ (<http://slurm.schedmd.com/>), ‘TORQUE/PBS’ (<http://www.adaptivecomputing.com/products/open-source/torque/>), or ‘Docker Swarm’ (<https://docs.docker.com/swarm/>). A multicore and socket mode allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.
1328 High-Performance and Parallel Computing with R bcp Bayesian Analysis of Change Point Problems Provides an implementation of the Barry and Hartigan (1993) product partition model for the normal errors change point problem using Markov Chain Monte Carlo. It also extends the methodology to regression models on a connected graph (Wang and Emerson, 2015); this allows estimation of change point models with multivariate responses. Parallel MCMC, previously available in bcp v.3.0.0, is currently not implemented.
1329 High-Performance and Parallel Computing with R BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements from the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Letac et al. (2018) <arXiv:1706.04416>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2017) <doi:10.1111/rssc.12171>. To speed up the computations, the BDMCMC sampling algorithms are implemented in parallel using OpenMP in C++.
1330 High-Performance and Parallel Computing with R biglars Scalable Least-Angle Regression and Lasso Least-angle regression, lasso and stepwise regression for numeric datasets in which the number of observations is greater than the number of predictors. The functions can be used with the ff library to accommodate datasets that are too large to be held in memory.
1331 High-Performance and Parallel Computing with R biglm Bounded Memory Linear and Generalized Linear Models Regression for data too large to fit in memory.
1332 High-Performance and Parallel Computing with R bigmemory Manage Massive Matrices with Shared Memory and Memory-Mapped Files Create, store, access, and manipulate massive matrices. Matrices are allocated to shared memory and may use memory-mapped files. Packages ‘biganalytics’, ‘bigtabulate’, ‘synchronicity’, and ‘bigalgebra’ provide advanced functionality.
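A minimal sketch of allocating a matrix outside the R heap (dimensions chosen arbitrarily):

    library(bigmemory)
    # a 1e6 x 3 double matrix backed by shared memory, initialized to 0
    x <- big.matrix(nrow = 1e6, ncol = 3, type = "double", init = 0)
    x[1, ] <- c(1, 2, 3)
    x[1, ]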
1333 High-Performance and Parallel Computing with R bigstatsr Statistical Tools for Filebacked Big Matrices Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides, for instance, matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.
1334 High-Performance and Parallel Computing with R bnlearn Bayesian Network Structure Learning, Parameter Learning and Inference Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC and RSMAX2) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries and cross-validation. Development snapshots with the latest bugfixes are available from <http://www.bnlearn.com>.
1335 High-Performance and Parallel Computing with R caret Classification and Regression Training Miscellaneous functions for training and plotting classification and regression models.
1336 High-Performance and Parallel Computing with R clustermq Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM, PBS/Torque) Evaluate arbitrary function calls using workers on HPC schedulers in a single line of code. All processing is done on the network without accessing the file system. Remote schedulers are supported via SSH.
1337 High-Performance and Parallel Computing with R data.table Extension of ‘data.frame’ Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax for faster development.
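A minimal sketch of the grouped-aggregation syntax, using the built-in mtcars data:

    library(data.table)
    dt <- as.data.table(mtcars)
    # mean mpg and group size per cylinder count
    dt[, .(mean_mpg = mean(mpg), n = .N), by = cyl]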
1338 High-Performance and Parallel Computing with R dclone Data Cloning and MCMC Tools for Maximum Likelihood Methods Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.
1339 High-Performance and Parallel Computing with R doFuture A Universal Foreach Parallel Adapter using the Future API of the ‘future’ Package Provides a ‘%dopar%’ adapter such that any type of futures can be used as backends for the ‘foreach’ framework.
1340 High-Performance and Parallel Computing with R doMC Foreach Parallel Adaptor for ‘parallel’ Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.
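A minimal sketch of registering the backend and running a foreach loop in parallel (doMC uses forked processes, so this runs on Unix-alikes only; the core count is chosen arbitrarily):

    library(doMC)           # also loads foreach
    registerDoMC(cores = 2)
    res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)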
1341 High-Performance and Parallel Computing with R doMPI Foreach Parallel Adaptor for the Rmpi Package Provides a parallel backend for the %dopar% function using the Rmpi package.
1342 High-Performance and Parallel Computing with R doRedis Foreach parallel adapter for the rredis package A Redis parallel backend for the %dopar% function
1343 High-Performance and Parallel Computing with R doRNG Generic Reproducible Parallel Backend for ‘foreach’ Loops Provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L’Ecuyer’s combined multiple-recursive generator [L’Ecuyer (1999), <doi:10.1287/opre.47.1.159>]. It makes it easy to convert standard %dopar% loops into fully reproducible loops, independently of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.
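A minimal sketch of making a %dopar% loop reproducible (wrapped here around the sequential backend purely for illustration; the seed is chosen arbitrarily):

    library(doRNG)          # also loads foreach
    registerDoSEQ()         # any registered foreach backend would do
    registerDoRNG(42)       # independent, reproducible RNG streams
    res <- foreach(i = 1:3, .combine = c) %dopar% runif(1)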
1344 High-Performance and Parallel Computing with R doSNOW Foreach Parallel Adaptor for the ‘snow’ Package Provides a parallel backend for the %dopar% function using the snow package of Tierney, Rossini, Li, and Sevcikova.
1345 High-Performance and Parallel Computing with R dqrng Fast Pseudo Random Number Generators Several fast random number generators are provided as C++ header-only libraries: The PCG family by O’Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as Xoroshiro128+ and Xoshiro256+ by Blackman and Vigna (2018 <arXiv:1805.01407>). In addition, fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+ and Xoshiro256+ as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011 <doi:10.1145/2063384.2063405>) as provided by the package ‘sitmo’.
1346 High-Performance and Parallel Computing with R drake A Pipeline Toolkit for Reproducible Computation at Scale A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch: there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://ropensci.github.io/drake/> and the online manual <https://ropenscilabs.github.io/drake-manual/>.
1347 High-Performance and Parallel Computing with R ff Memory-Efficient Storage of Large Data on Disk and Fast Access Functions The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory - the effective virtual memory consumption per ff object. ff supports R’s standard atomic data types ‘double’, ‘logical’, ‘raw’ and ‘integer’ and non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example ‘quad’ allows efficient storage of genomic data as an ‘A’,‘T’,‘G’,‘C’ factor. The unsigned types support ‘circular’ arithmetic. There is also support for close-to-atomic types ‘factor’, ‘ordered’, ‘POSIXct’, ‘Date’ and custom close-to-atomic types. ff has native C support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays), and also provides an ffdf class not unlike data.frames, with import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows working with ‘permanent’ files as well as creating/removing ‘temporary’ ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation. Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, ‘logicals’ and non-standard data types get stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects, and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package ‘bit’: chunked looping, fast bit operations and coercions between different objects that can store subscript information (‘bit’, ‘bitwhich’, ff ‘boolean’, ri range index, hi hybrid index). This allows working interactively with selections of large datasets and quickly modifying selection criteria. Further high-performance enhancements can be made available upon request.
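A minimal sketch of a disk-backed vector (the length is chosen arbitrarily):

    library(ff)
    # a double vector stored in a flat file on disk, not in RAM
    x <- ff(vmode = "double", length = 1e6)
    x[1:3] <- c(1.5, 2.5, 3.5)
    x[1:3]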
1348 High-Performance and Parallel Computing with R ffbase Basic Statistical Functions for Package ‘ff’ Extends the out of memory vectors of ‘ff’ with statistical functions and other utilities to ease their usage.
1349 High-Performance and Parallel Computing with R flowr Streamlining Design and Deployment of Complex Workflows This framework allows you to design and implement complex pipelines, and deploy them on your institution’s computing cluster. This has been built keeping in mind the needs of bioinformatics workflows. However, it is easily extendable to any field where a series of steps (shell commands) are to be executed in a (work)flow.
1350 High-Performance and Parallel Computing with R foreach Provides Foreach Looping Construct for R Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn’t require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
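A minimal sketch of the idiom; note that the loop is used for its return value:

    library(foreach)
    # squares of 1..3, combined into a vector via .combine
    foreach(i = 1:3, .combine = c) %do% i^2   # 1 4 9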
1351 High-Performance and Parallel Computing with R future Unified Parallel and Distributed Processing in R for Everyone The purpose of this package is to provide a lightweight and unified Future API for sequential and parallel processing of R expressions via futures. The simplest way to evaluate an expression in parallel is to use ‘x %<-% { expression }’ with ‘plan(multiprocess)’. This package implements sequential, multicore, multisession, and cluster futures. With these, R expressions can be evaluated on the local machine, in parallel on a set of local machines, or distributed on a mix of local and remote machines. Extensions to this package implement additional backends for processing futures via compute cluster schedulers etc. Because of its unified API, there is no need to modify any code in order to switch from sequential processing on the local machine to, say, distributed processing on a remote compute cluster. Another strength of this package is that global variables and functions are automatically identified and exported as needed, making it straightforward to tweak existing code to make use of futures.
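A minimal sketch of the future assignment operator quoted above, here using the multisession strategy (one of the implemented future types):

    library(future)
    plan(multisession)            # background R sessions as workers
    x %<-% { sum(rnorm(1e6)) }    # evaluated asynchronously
    x                             # blocks until the future resolves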
1352 High-Performance and Parallel Computing with R future.BatchJobs A Future API for Parallel and Distributed Processing using BatchJobs Implementation of the Future API on top of the ‘BatchJobs’ package. This allows you to process futures, as defined by the ‘future’ package, in parallel out of the box, not only on your local machine or ad-hoc cluster of machines, but also via high-performance compute (‘HPC’) job schedulers such as ‘LSF’, ‘OpenLava’, ‘Slurm’, ‘SGE’, and ‘TORQUE’ / ‘PBS’, e.g. ‘y <- future.apply::future_lapply(files, FUN = process)’. NOTE: The ‘BatchJobs’ package is deprecated in favor of the ‘batchtools’ package. Because of this, it is recommended to use the ‘future.batchtools’ package instead of this package.
1353 High-Performance and Parallel Computing with R GAMBoost Generalized linear and additive models by likelihood based boosting This package provides routines for fitting generalized linear and generalized additive models by likelihood based boosting, using penalized B-splines.
1354 High-Performance and Parallel Computing with R gcbd ‘GPU’/CPU Benchmarking in Debian-Based Systems Benchmarks the performance of a few standard linear algebra operations (such as a matrix product and QR, SVD and LU decompositions) across a number of different ‘BLAS’ libraries as well as a ‘GPU’ implementation. To do so, it takes advantage of the ability to ‘plug and play’ different ‘BLAS’ implementations easily on a Debian and/or Ubuntu system. The current version supports: ‘Reference BLAS’ (‘refblas’), which is un-accelerated, as a baseline; Atlas, which is tuned but typically configured single-threaded; Atlas39, which is tuned and configured for multi-threaded mode; ‘Goto Blas’, which is accelerated and multi-threaded; and ‘Intel MKL’, a commercial accelerated and multithreaded version. For ‘GPU’ computing, the CRAN package ‘gputools’ is used. For ‘Goto Blas’, the ‘gotoblas2-helper’ script from the ISM in Tokyo can be used. For ‘Intel MKL’ we use the Revolution R packages from Ubuntu 9.10.
1355 High-Performance and Parallel Computing with R gpuR GPU Functions for R Objects Provides GPU enabled functions for R objects in a simple and approachable manner. New gpu* and vcl* classes have been provided to wrap typical R objects (e.g. vector, matrix), in both host and device spaces, to mirror typical R syntax without the need to know OpenCL.
1356 High-Performance and Parallel Computing with R GUIProfiler Graphical User Interface for Rprof() Show graphically the results of profiling R functions by tracking their execution time.
1357 High-Performance and Parallel Computing with R h2o R Interface for ‘H2O’ R interface for ‘H2O’, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).
1358 High-Performance and Parallel Computing with R HadoopStreaming Utilities for using R scripts in Hadoop streaming Provides a framework for writing map/reduce scripts for use in Hadoop Streaming. Also facilitates operating on data in a streaming fashion, without Hadoop.
1359 High-Performance and Parallel Computing with R HistogramTools Utility Functions for R Histograms Provides a number of utility functions useful for manipulating large histograms. This includes methods to trim, subset, merge buckets, merge histograms, convert to CDF, and calculate information loss due to binning. It also provides a protocol buffer representation of the default R histogram class to allow histograms over large data sets to be computed and manipulated in a MapReduce environment.
1360 High-Performance and Parallel Computing with R inline Functions to Inline C, C++, Fortran Function Calls from R Functionality to dynamically define R functions and S4 methods with ‘inlined’ C, C++ or Fortran code supporting the .C and .Call calling conventions.
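A minimal sketch of inlining a C function via the default .Call convention (the function and argument names are illustrative; a C toolchain is required):

    library(inline)
    add <- cfunction(signature(a = "numeric", b = "numeric"),
                     "return ScalarReal(asReal(a) + asReal(b));",
                     language = "C")
    add(1, 2)   # 3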
1361 High-Performance and Parallel Computing with R keras R Interface to ‘Keras’ Interface to ‘Keras’ <https://keras.io>, a high-level neural networks ‘API’. ‘Keras’ was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both ‘CPU’ and ‘GPU’ devices.
1362 High-Performance and Parallel Computing with R LaF Fast Access to Large ASCII Files Methods for fast access to large ASCII files. Currently the following file formats are supported: comma separated format (CSV) and fixed width format. It is assumed that the files are too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed as one would an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.
1363 High-Performance and Parallel Computing with R latentnet Latent Position and Cluster Models for Statistical Networks Fit and simulate latent position and cluster models for statistical networks.
1364 High-Performance and Parallel Computing with R lga Tools for linear grouping analysis (LGA) Tools for linear grouping analysis. Three user-level functions: gap, rlga and lga.
1365 High-Performance and Parallel Computing with R Matching Multivariate and Propensity Score Matching with Balance Optimization Provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance has been obtained are also provided.
1366 High-Performance and Parallel Computing with R MonetDB.R Connect MonetDB to R Allows pulling data from MonetDB into R. Includes a DBI implementation and a dplyr backend.
1367 High-Performance and Parallel Computing with R mvnfast Fast Multivariate Normal and Student’s t Methods Provides computationally efficient tools related to the multivariate normal and Student’s t distributions. The main functionalities are: simulating multivariate random vectors, evaluating multivariate normal or Student’s t densities and Mahalanobis distances. These tools are very efficient thanks to the use of C++ code and of the OpenMP API.
1368 High-Performance and Parallel Computing with R nws R functions for NetWorkSpaces and Sleigh Provides coordination and parallel execution facilities, as well as limited cross-language data exchange, using the netWorkSpaces server developed by REvolution Computing
1369 High-Performance and Parallel Computing with R OpenCL Interface Allowing R to Use OpenCL Provides an interface to OpenCL, allowing R to leverage the computing power of GPUs and other HPC accelerator devices.
1370 High-Performance and Parallel Computing with R orloca Operations Research LOCational Analysis Models Objects and methods to handle and solve the min-sum location problem, also known as the Fermat-Weber problem. The min-sum location problem searches for a point such that the weighted sum of the distances to the demand points is minimized. See “The Fermat-Weber location problem revisited” by Brimberg, Mathematical Programming, 1, pg. 71-76, 1995. <doi:10.1007/BF01592245>. General global optimization algorithms are used to solve the problem, along with the ad hoc Weiszfeld method; see “Sur le point pour lequel la Somme des distances de n points donnes est minimum”, by Weiszfeld, Tohoku Mathematical Journal, First Series, 43, pg. 355-386, 1937, or “On the point for which the sum of the distances to n given points is minimum”, by E. Weiszfeld and F. Plastria, Annals of Operations Research, 167, pg. 7-41, 2009. <doi:10.1007/s10479-008-0352-z>.
1371 High-Performance and Parallel Computing with R parSim Parallel Simulation Studies Perform flexible simulation studies using one or multiple computer cores. The package is set up to be usable on high-performance clusters in addition to being run locally, see examples on <https://github.com/SachaEpskamp/parSim>.
1372 High-Performance and Parallel Computing with R partDSA Partitioning Using Deletion, Substitution, and Addition Moves A novel tool for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.
1373 High-Performance and Parallel Computing with R pbapply Adding Progress Bar to ’*apply’ Functions A lightweight package that adds a progress bar to vectorized R functions (’*apply’). The implementation can easily be added to functions where showing the progress is useful (e.g. bootstrap). The type and style of the progress bar (with percentages or remaining time) can be set through options. Supports several parallel processing backends.
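A minimal sketch (the Sys.sleep() is only there to make the progress bar visible):

    library(pbapply)
    res <- pblapply(1:50, function(i) { Sys.sleep(0.02); i^2 })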
1374 High-Performance and Parallel Computing with R pbdBASE Programming with Big Data Base Wrappers for Distributed Matrices An interface to and extensions for the ‘PBLAS’ and ‘ScaLAPACK’ numerical libraries. This enables R to utilize distributed linear algebra for codes written in the ‘SPMD’ fashion. This interface is deliberately low-level and mimics the style of the native libraries it wraps. For a much higher level way of managing distributed matrices, see the ‘pbdDMAT’ package.
1375 High-Performance and Parallel Computing with R pbdDEMO Programming with Big Data Demonstrations and Examples Using ‘pbdR’ Packages A set of demos of ‘pbdR’ packages, together with a useful, unifying vignette.
1376 High-Performance and Parallel Computing with R pbdDMAT ‘pbdR’ Distributed Matrix Methods A set of classes for managing distributed matrices, and a collection of methods for computing linear algebra and statistics. Computation is handled mostly by routines from the ‘pbdBASE’ package, which itself relies on the ‘ScaLAPACK’ and ‘PBLAS’ numerical libraries for distributed computing.
1377 High-Performance and Parallel Computing with R pbdMPI Programming with Big Data Interface to MPI An efficient interface to MPI by utilizing S4 classes and methods with a focus on Single Program/Multiple Data (‘SPMD’) parallel programming style, which is intended for batch parallel execution.
1378 High-Performance and Parallel Computing with R pbdNCDF4 Programming with Big Data Interface to Parallel Unidata NetCDF4 Format Data Files This package adds collective parallel read and write capability to the R package ncdf4 version 1.8. Typical use is as a parallel NetCDF4 file reader in SPMD style programming. Each R process reads and writes its own data in a synchronized collective mode, resulting in faster parallel performance. Performance improvement is conditional on a parallel file system.
1379 High-Performance and Parallel Computing with R pbdPROF Programming with Big Data ― MPI Profiling Tools MPI profiling tools.
1380 High-Performance and Parallel Computing with R pbdSLAP Programming with Big Data Scalable Linear Algebra Packages Utilizes scalable linear algebra packages, mainly ‘BLACS’, ‘PBLAS’, and ‘ScaLAPACK’, in double precision via ‘pbdMPI’, based on ‘ScaLAPACK’ version 2.0.2.
1381 High-Performance and Parallel Computing with R peperr Parallelised Estimation of Prediction Error Designed for prediction error estimation through resampling techniques, possibly accelerated by parallel execution on a compute cluster. Newly developed model fitting routines can be easily incorporated.
1382 High-Performance and Parallel Computing with R permGPU Using GPUs in Statistical Genomics Can be used to carry out permutation resampling inference in the context of RNA microarray studies.
1383 High-Performance and Parallel Computing with R pls Partial Least Squares and Principal Component Regression Multivariate regression methods: Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
1384 High-Performance and Parallel Computing with R pmclust Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs ‘pbdMPI’ to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. The implementation defaults to the single program multiple data programming model. The code can be executed through ‘pbdMPI’ and MPI implementations such as ‘OpenMPI’ and ‘MPICH’. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.
1385 High-Performance and Parallel Computing with R profr An Alternative Display for Profiling Information An alternative data structure and visual rendering for the profiling information generated by Rprof.
1386 High-Performance and Parallel Computing with R proftools Profile Output Processing Tools for R Tools for examining Rprof profile output.
1387 High-Performance and Parallel Computing with R pvclust Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling An implementation of multiscale bootstrap resampling for assessing the uncertainty in hierarchical cluster analysis. It provides AU (approximately unbiased) p-values as well as BP (bootstrap probability) values for each cluster in a dendrogram.
1388 High-Performance and Parallel Computing with R qsub Running Commands Remotely on ‘Gridengine’ Clusters Run lapply() calls in parallel by submitting them to ‘gridengine’ clusters using the ‘qsub’ command.
1389 High-Performance and Parallel Computing with R randomForestSRC Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) Fast OpenMP parallel computing of Breiman’s random forests for survival, competing risks, regression and classification based on Ishwaran and Kogalur’s popular random survival forests (RSF) package. Handles missing data and now includes multivariate, unsupervised forests, quantile regression and solutions for class imbalanced data. New fast interface using subsampling and confidence regions for variable importance.
1390 High-Performance and Parallel Computing with R Rborist Extensible, Parallelizable Implementation of the Random Forest Algorithm Scalable implementation of classification and regression forests, as described by Breiman (2001), <doi:10.1023/A:1010933404324>.
1391 High-Performance and Parallel Computing with R Rcpp Seamless R and C++ Integration The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of third-party libraries. Documentation about ‘Rcpp’ is provided by several vignettes included in this package, via the ‘Rcpp Gallery’ site at <http://gallery.rcpp.org>, the paper by Eddelbuettel and Francois (2011, <doi:10.18637/jss.v040.i08>), the book by Eddelbuettel (2013, <doi:10.1007/978-1-4614-6868-4>) and the paper by Eddelbuettel and Balamuta (2018, <doi:10.1080/00031305.2017.1375990>); see ‘citation(“Rcpp”)’ for details.
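A minimal sketch of compiling a C++ function from an R session (the function name is illustrative; a C++ toolchain is required):

    library(Rcpp)
    cppFunction("
      double meanC(NumericVector x) {
        double total = 0;
        for (int i = 0; i < x.size(); ++i) total += x[i];
        return total / x.size();
      }")
    meanC(c(1, 2, 3))   # 2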
1392 High-Performance and Parallel Computing with R RcppParallel Parallel Programming Tools for ‘Rcpp’ High level functions for parallel programming with ‘Rcpp’. For example, the ‘parallelFor()’ function can be used to convert the work of a standard serial “for” loop into a parallel one and the ‘parallelReduce()’ function can be used for accumulating aggregate or other values.
1393 High-Performance and Parallel Computing with R Rdsm Threads Environment for R Provides a threads-type programming environment for R. The package gives the R programmer a clearer, more concise shared-memory world view, and in some cases gives superior performance as well. In addition, it enables parallel processing on very large, out-of-core matrices.
1394 High-Performance and Parallel Computing with R reticulate Interface to ‘Python’ Interface to ‘Python’ modules, classes, and functions. When calling into ‘Python’, R data types are automatically converted to their equivalent ‘Python’ types. When values are returned from ‘Python’ to R they are converted back to R types. Compatible with all versions of ‘Python’ >= 2.7.
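A minimal sketch, assuming a Python installation is available (the math module is part of the Python standard library):

    library(reticulate)
    m <- import("math")   # Python module exposed as an R object
    m$sqrt(2)             # 1.414214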
1395 High-Performance and Parallel Computing with R rgenoud R Version of GENetic Optimization Using Derivatives A genetic algorithm plus derivative optimizer.
1396 High-Performance and Parallel Computing with R Rhpc Permits *apply() Style Dispatch for ‘HPC’ Provides *apply()-style functions using ‘MPI’ for a better ‘HPC’ environment in R. The package supports long vectors and can deal with moderately big data.
1397 High-Performance and Parallel Computing with R RhpcBLASctl Control the Number of Threads on ‘BLAS’ Controls the number of threads used by ‘BLAS’ (e.g. ‘GotoBLAS’, ‘OpenBLAS’, ‘ACML’, ‘BLIS’ and ‘MKL’). It can also control the number of threads used by ‘OpenMP’, and can report the number of logical and physical cores where feasible.
1398 High-Performance and Parallel Computing with R RInside C++ Classes to Embed R in C++ Applications A C++ class providing the R interpreter is offered by this package, making it easier to have “R inside” your C++ application. As R itself is embedded into your application, a shared library build of R is required. This works on Linux, OS X and even on Windows provided you use the same tools used to build R itself. Numerous examples are provided in the eight subdirectories of the examples/ directory of the installed package: standard, ‘mpi’ (for parallel computing), ‘qt’ (showing how to embed ‘RInside’ inside a Qt GUI application), ‘wt’ (showing how to build a “web-application” using the Wt toolkit), ‘armadillo’ (for ‘RInside’ use with ‘RcppArmadillo’) and ‘eigen’ (for ‘RInside’ use with ‘RcppEigen’). The examples use ‘GNUmakefile(s)’ with GNU extensions, so a GNU make is required (and will use the ‘GNUmakefile’ automatically). ‘Doxygen’-generated documentation of the C++ classes is available at the ‘RInside’ website as well.
1399 High-Performance and Parallel Computing with R rJava Low-Level R to Java Interface Low-level interface to Java VM very much like .C/.Call and friends. Allows creation of objects, calling methods and accessing fields.
1400 High-Performance and Parallel Computing with R rlecuyer R Interface to RNG with Multiple Streams Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L’Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.
1401 High-Performance and Parallel Computing with R Rmpi (core) Interface (Wrapper) to MPI (Message-Passing Interface) An interface (wrapper) to MPI. It also provides interactive R manager and worker environment.
1402 High-Performance and Parallel Computing with R RProtoBuf R Interface to the ‘Protocol Buffers’ ‘API’ (Version 2 or 3) Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal ‘RPC’ protocols and file formats. Additional documentation is available in two included vignettes, one of which corresponds to our ‘JSS’ paper (2016, <doi:10.18637/jss.v071.i02>). Either version 2 or 3 of the ‘Protocol Buffers’ ‘API’ is supported.
1403 High-Performance and Parallel Computing with R rredis “Redis” Key/Value Database Client R client interface to the “Redis” key-value database.
1404 High-Performance and Parallel Computing with R rslurm Submit R Calculations to a Slurm Cluster Functions that simplify submitting R scripts to a Slurm workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.
1405 High-Performance and Parallel Computing with R rstream Streams of Random Numbers Unified object oriented interface for multiple independent streams of random numbers from different sources.
1406 High-Performance and Parallel Computing with R RxODE Facilities for Simulating from ODE-Based Models Facilities for running simulations from ordinary differential equation (ODE) models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers; for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the “R Administration and Installation” manual. The code is mostly released under the GPL; the VODE and LSODA solvers are in the public domain. Details are available in inst/COPYRIGHTS.
1407 High-Performance and Parallel Computing with R Sim.DiffProc Simulation of Diffusion Processes Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Ito and Stratonovich forms. Offers statistical analysis of SDEs with parallel Monte Carlo and moment equations methods. It has enabled many researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulating the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
1408 High-Performance and Parallel Computing with R sitmo Parallel Pseudo Random Number Generator (PPRNG) ‘sitmo’ Header Files Provided within are two high quality and fast PPRNGs that may be used in an ‘OpenMP’ parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the ‘sitmo’ (C++98 & C++11), ‘threefry’ and ‘vandercorput’ (C++11-only) engines on CRAN by enabling others to link to the header files inside of ‘sitmo’ instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the ‘sitmo’ package and three accompanying vignettes that provide additional information.
1409 High-Performance and Parallel Computing with R snow (core) Simple Network of Workstations Support for simple parallel computing in R.
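A minimal sketch of a local socket cluster (the worker count is chosen arbitrarily):

    library(snow)
    cl <- makeCluster(2, type = "SOCK")      # two local workers
    clusterApply(cl, 1:4, function(x) x^2)   # list(1, 4, 9, 16)
    stopCluster(cl)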
1410 High-Performance and Parallel Computing with R snowfall Easier cluster computing (based on snow) Usability wrapper around snow for easier development of parallel R programs. This package offers e.g. extended error checks, and additional functions. All functions work in sequential mode, too, if no cluster is present or wished. The package is also designed as a connector to the cluster management tool sfCluster, but can also be used without it.
1411 High-Performance and Parallel Computing with R snowFT Fault Tolerant Simple Network of Workstations Extension of the snow package supporting fault tolerant and reproducible applications, as well as supporting easy-to-use parallel programming - only one function is needed. Dynamic cluster size is also available.
1412 High-Performance and Parallel Computing with R speedglm Fitting Linear and Generalized Linear Models to Large Data Sets Fitting linear models and generalized linear models to large data sets by updating algorithms.
1413 High-Performance and Parallel Computing with R sqldf Manipulate R Data Frames Using SQL The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. ‘RSQLite’, ‘RH2’, ‘RMySQL’ and ‘RPostgreSQL’ backends are supported.
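A minimal sketch querying a data frame as if it were a database table, using the built-in mtcars data:

    library(sqldf)
    sqldf("select cyl, avg(mpg) as mean_mpg from mtcars group by cyl")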
1414 High-Performance and Parallel Computing with R ssgraph Bayesian Graphical Estimation using Spike-and-Slab Priors Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data. To speed up the computations, the computationally intensive tasks of the package are implemented in C++ in parallel using OpenMP.
1415 High-Performance and Parallel Computing with R STAR Spike Train Analysis with R Functions to analyze neuronal spike trains from a single neuron or from several neurons recorded simultaneously.
1416 High-Performance and Parallel Computing with R tensorflow R Interface to ‘TensorFlow’ Interface to ‘TensorFlow’ <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
1417 High-Performance and Parallel Computing with R tfestimators Interface to ‘TensorFlow’ Estimators Interface to ‘TensorFlow’ Estimators <https://www.tensorflow.org/programmers_guide/estimators>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
1418 High-Performance and Parallel Computing with R tm Text Mining Package A framework for text mining applications within R.
1419 High-Performance and Parallel Computing with R varSelRF Variable Selection using Random Forests Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly-correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).
1420 High-Performance and Parallel Computing with R xgboost Extreme Gradient Boosting Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.
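A minimal sketch of fitting a small regression model on the built-in mtcars data (note the objective string is "reg:linear" in older package versions):

    library(xgboost)
    x <- as.matrix(mtcars[, -1]); y <- mtcars$mpg
    bst <- xgboost(data = x, label = y, nrounds = 10,
                   objective = "reg:squarederror", verbose = 0)
    head(predict(bst, x))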
1421 Hydrological Data and Modeling airGR Suite of GR Hydrological Models for Precipitation-Runoff Modelling Hydrological modelling tools developed at Irstea-Antony (HYCAR Research Unit, France). The package includes several conceptual rainfall-runoff models (GR4H, GR4J, GR5J, GR6J, GR2M, GR1A), a snow accumulation and melt model (CemaNeige) and the associated functions for their calibration and evaluation. Use help(airGR) for package description and references.
1422 Hydrological Data and Modeling airGRteaching Teaching Hydrological Modelling with the GR Rainfall-Runoff Models (‘Shiny’ Interface Included) Add-on package to the ‘airGR’ package that simplifies its use and is aimed at being used for teaching hydrology. The package provides: 1) three functions that allow one to complete a hydrological modelling exercise very simply; 2) plotting functions that help students explore observed data and interpret the results of calibration and simulation of the GR (‘Genie rural’) models; 3) a ‘Shiny’ graphical interface that displays the impact of model parameters on hydrographs and the models’ internal variables.
1423 Hydrological Data and Modeling berryFunctions Function Collection Related to Plotting and Hydrology Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.
1424 Hydrological Data and Modeling bigleaf Physical and Physiological Ecosystem Properties from Eddy Covariance Data Calculation of physical (e.g. aerodynamic conductance, surface temperature), and physiological (e.g. canopy conductance, water-use efficiency) ecosystem properties from eddy covariance data and accompanying meteorological measurements. Calculations assume the land surface to behave like a ‘big-leaf’ and return bulk ecosystem/canopy variables.
1425 Hydrological Data and Modeling biotic Calculation of Freshwater Biotic Indices Calculates a range of UK freshwater invertebrate biotic indices including BMWP, Whalley, WHPT, Habitat-specific BMWP, AWIC, LIFE and PSI.
1426 Hydrological Data and Modeling bomrang Australian Government Bureau of Meteorology (‘BOM’) Data Client Provides functions to interface with Australian Government Bureau of Meteorology (‘BOM’) data, fetching data and returning a tidy data frame of precis forecasts, historical and current weather data from stations, agriculture bulletin data, ‘BOM’ 0900 or 1500 weather bulletins and downloading and importing radar and satellite imagery files. Data (c) Australian Government Bureau of Meteorology Creative Commons (CC) Attribution 3.0 licence or Public Access Licence (PAL) as appropriate. See <http://www.bom.gov.au/other/copyright.shtml> for further details.
1427 Hydrological Data and Modeling boussinesq Analytic Solutions for (ground-water) Boussinesq Equation This package is a collection of R functions implemented from published and available analytic solutions for the One-Dimensional Boussinesq Equation (ground-water). In particular, the function “beq.lin” is the analytic solution of the linearized form of Boussinesq Equation between two different head-based boundary (Dirichlet) conditions; “beq.song” is the non-linear power-series analytic solution of the motion of a wetting front over a dry bedrock (Song et al., 2007, see complete reference on function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcomed.
1428 Hydrological Data and Modeling CityWaterBalance Track Flows of Water Through an Urban System Retrieves data and estimates unmeasured flows of water through the urban network. Any city may be modeled with preassembled data, but data for US cities can be gathered via web services using this package and dependencies ‘geoknife’ and ‘dataRetrieval’.
1429 Hydrological Data and Modeling clifro Easily Download and Visualise Climate Data from CliFlo CliFlo is a web portal to the New Zealand National Climate Database and provides public access (via subscription) to around 6,500 various climate stations (see <https://cliflo.niwa.co.nz/> for more information). Collating and manipulating data from CliFlo (hence clifro) and importing into R for further analysis, exploration and visualisation is now straightforward and coherent. The user is required to have an internet connection, and a current CliFlo subscription (free) if data from stations, other than the public Reefton electronic weather station, is sought.
1430 Hydrological Data and Modeling climatol Climate Tools (Series Homogenization and Derived Products) Functions for the quality control, homogenization and missing data infilling of climatological series and to obtain climatological summaries and grids from the results. Also functions to draw wind-roses and Walter & Lieth climate diagrams.
1431 Hydrological Data and Modeling climdex.pcic PCIC Implementation of Climdex Routines PCIC’s implementation of Climdex routines for computation of extreme climate indices.
1432 Hydrological Data and Modeling countyweather Compiles Meteorological Data for U.S. Counties Interacts with NOAA data sources (including the NCDC API at <http://www.ncdc.noaa.gov/cdo-web/webservices/v2> and ISD data) using functions from the ‘rnoaa’ package to obtain and compile weather time series for U.S. counties. This work was supported in part by grants from the National Institute of Environmental Health Sciences (R00ES022631) and the Colorado State University Water Center.
1433 Hydrological Data and Modeling dataRetrieval Retrieval Functions for USGS and EPA Hydrologic and Water Quality Data Collection of functions to help retrieve U.S. Geological Survey (USGS) and U.S. Environmental Protection Agency (EPA) water quality and hydrology data from web services. USGS web services are discovered from National Water Information System (NWIS) <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Both EPA and USGS water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.
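A minimal sketch of pulling daily values from NWIS (requires an internet connection; the site and parameter code follow the package's own documentation examples):

    library(dataRetrieval)
    # daily mean discharge (parameter 00060) for the Choptank River, MD
    q <- readNWISdv(siteNumbers = "01491000", parameterCd = "00060",
                    startDate = "2015-01-01", endDate = "2015-12-31")
    head(q)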
1434 Hydrological Data and Modeling dbhydroR ‘DBHYDRO’ Hydrologic and Water Quality Data Client for programmatic access to the South Florida Water Management District’s ‘DBHYDRO’ database at <https://www.sfwmd.gov/science-data/dbhydro>, with functions for accessing hydrologic and water quality data.
1435 Hydrological Data and Modeling driftR Drift Correcting Water Quality Data A tidy implementation of equations that correct for instrumental drift in continuous water quality monitoring data. There are many sources of water quality data including private (ex: YSI instruments) and open source (ex: USGS and NDBC), each of which is susceptible to errors/inaccuracies due to drift. This package allows the user to correct their data using one or two standard reference values in a uniform, reproducible way. The equations implemented are from Hasenmueller (2011) <doi:10.7936/K7N014KS>.
1436 Hydrological Data and Modeling dynatopmodel Implementation of the Dynamic TOPMODEL Hydrological Model A native R implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes some preprocessing and utility routines, as well as routines for displaying outputs.
1437 Hydrological Data and Modeling Ecohydmod Ecohydrological Modelling Simulates the soil water balance (soil moisture, evapotranspiration, leakage and runoff), rainfall series by using the marked Poisson process and the vegetation growth through the normalized difference vegetation index (NDVI). Please see Souza et al. (2016) <doi:10.1002/hyp.10953>.
1438 Hydrological Data and Modeling EcoHydRology (core) A Community Modeling Foundation for Eco-Hydrology Provides a flexible foundation on which scientists, engineers, and policy makers can base teaching exercises, as well as for more applied use in modelling complex eco-hydrological interactions.
1439 Hydrological Data and Modeling ecoval Procedures for Ecological Assessment of Surface Waters Functions for evaluating and visualizing ecological assessment procedures for surface waters containing physical, chemical and biological assessments in the form of value functions.
1440 Hydrological Data and Modeling EGRET Exploration and Graphics for RivEr Trends Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS). The modeling method is introduced and discussed in Hirsch et al. (2010) <doi:10.1111/j.1752-1688.2010.00482.x>, and expanded in Hirsch and De Cicco (2015) <doi:10.3133/tm4A10>.
1441 Hydrological Data and Modeling EGRETci Exploration and Graphics for RivEr Trends Confidence Intervals Collection of functions to evaluate uncertainty of results from water quality analysis using the Weighted Regressions on Time Discharge and Season (WRTDS) method. This package is an add-on to the EGRET package that performs the WRTDS analysis. The WRTDS modeling method was initially introduced and discussed in Hirsch et al. (2010) <doi:10.1111/j.1752-1688.2010.00482.x>, and expanded in Hirsch and De Cicco (2015) <doi:10.3133/tm4A10>. The paper describing the uncertainty and confidence interval calculations is Hirsch et al. (2015) <doi:10.1016/j.envsoft.2015.07.017>.
1442 Hydrological Data and Modeling Evapotranspiration Modelling Actual, Potential and Reference Crop Evapotranspiration Uses data and constants to calculate potential evapotranspiration (PET) and actual evapotranspiration (AET) from 21 different formulations including Penman, Penman-Monteith FAO 56, Priestley-Taylor and Morton formulations.
1443 Hydrological Data and Modeling FAdist Distributions that are Sometimes Used in Hydrology Probability distributions that are sometimes useful in hydrology.
1444 Hydrological Data and Modeling FlowScreen Daily Streamflow Trend and Change Point Screening Screens daily streamflow time series for temporal trends and change-points. This package has been primarily developed for assessing the quality of daily streamflow time series. It also contains tools for plotting and calculating many different streamflow metrics. The package can be used to produce summary screening plots showing change-points and significant temporal trends for high flow, low flow, and/or baseflow statistics, or it can be used to perform more detailed hydrological time series analyses. The package was designed for screening daily streamflow time series from Water Survey Canada and the United States Geological Survey but will also work with streamflow time series from many other agencies.
1445 Hydrological Data and Modeling geoknife Web-Processing of Large Gridded Datasets Processes gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers.
1446 Hydrological Data and Modeling geotopbricks An R Plug-in for the Distributed Hydrological Model GEOtop It analyzes raster maps and other information as input/output files from the Hydrological Distributed Model GEOtop. It contains functions and methods to import maps and other keywords from the geotop.inpts file. Some examples with simulation cases of GEOtop 2.x/3.x are presented in the package. Information about the GEOtop Distributed Hydrological Model source code is available on www.geotop.org. Technical details about the model are available in Endrizzi et al., 2014 (<http://www.geosci-model-dev.net/7/2831/2014/gmd-7-2831-2014.html>).
1447 Hydrological Data and Modeling getMet Get Meteorological Data for Hydrologic Models Hydrologic models often require users to collect and format input meteorological data. This package contains functions for sourcing, formatting, and editing meteorological data for hydrologic models.
1448 Hydrological Data and Modeling GSODR Global Surface Summary of the Day (‘GSOD’) Weather Data Client Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day (‘GSOD’) weather data from the USA National Centers for Environmental Information (‘NCEI’) for use in R. Units are converted from United States Customary System (‘USCS’) units to International System of Units (‘SI’). Stations may be individually checked for the number of missing days defined by the user, where stations with too many missing observations are omitted. Only stations with valid reported latitude and longitude values are permitted in the final data. Additional useful elements, saturation vapour pressure (‘es’), actual vapour pressure (‘ea’) and relative humidity, are calculated from the original data and included in the final data set. The resulting data include station identification information, state, country, latitude, longitude, elevation, weather observations and associated flags. Additional data are included with this R package: a list of elevation values for stations between -60 and 60 degrees latitude derived from the Shuttle Radar Topography Mission (‘SRTM’). For information on the ‘GSOD’ data from ‘NCEI’, please see the ‘GSOD’ ‘readme.txt’ file available from <http://www1.ncdc.noaa.gov/pub/data/gsod/readme.txt>.
1449 Hydrological Data and Modeling GWSDAT GroundWater Spatiotemporal Data Analysis Tool (GWSDAT) Shiny application for the analysis of groundwater monitoring data, designed to work with simple time-series data for solute concentration and ground water elevation, but can also plot non-aqueous phase liquid (NAPL) thickness if required. Also supports importing a site basemap in GIS shapefile format.
1450 Hydrological Data and Modeling hddtools Hydrological Data Discovery Tools Facilitates discovery and handling of hydrological data, access to catalogues and databases.
1451 Hydrological Data and Modeling humidity Calculate Water Vapor Measures from Temperature and Dew Point Vapor pressure, relative humidity, absolute humidity, specific humidity, and mixing ratio are commonly used water vapor measures in meteorology. This R package provides functions for calculating saturation vapor pressure (hPa), partial water vapor pressure (Pa), relative humidity (%), absolute humidity (kg/m^3), specific humidity (kg/kg), and mixing ratio (kg/kg) from temperature (K) and dew point (K). Conversion functions between humidity measures are also provided.
1452 Hydrological Data and Modeling hydroApps Tools and models for hydrological applications Package providing tools for hydrological applications and models developed for regional analysis in Northwestern Italy.
1453 Hydrological Data and Modeling hydrogeo Groundwater Data Presentation and Interpretation Contains one function for drawing Piper diagrams (also called Piper-Hill diagrams) of water analyses for major ions.
1454 Hydrological Data and Modeling hydroGOF (core) Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented to be used during the calibration, validation, and application of hydrological models. Missing values in observed and/or simulated values can be removed before computations. Comments / questions / collaboration of any kind are very welcome.
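As a minimal sketch of the hydroGOF API (the discharge values below are invented toy data; NSE() and gof() are among the package's documented measures):

```r
library(hydroGOF)

# Toy observed and simulated discharge series (invented values)
obs <- c(1.2, 3.4, 2.1, 5.6, 4.3)
sim <- c(1.0, 3.9, 2.5, 5.0, 4.1)

NSE(sim, obs)  # Nash-Sutcliffe efficiency
gof(sim, obs)  # table of numerical goodness-of-fit measures
```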
1455 Hydrological Data and Modeling hydrolinks Hydrologic Network Linking Data and Tools Tools to link geographic data with hydrologic network, including lakes, streams and rivers. Includes automated download of U.S. National Hydrography Network and other hydrolayers.
1456 Hydrological Data and Modeling HydroMe R codes for estimating water retention and infiltration model parameters using experimental data This package is version 2 of the HydroMe v.1 package. It estimates the parameters of infiltration and water retention models by curve fitting. The models considered are those commonly used in soil science. It adds new models for the water retention characteristic curve and fixes errors present in HydroMe v.1.
1457 Hydrological Data and Modeling hydroPSO Particle Swarm Optimisation, with Focus on Environmental Models State-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement for the ‘optim’ R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model’s own input and output files, without requiring access to the model’s source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine to different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with “long” execution time. Bug reports/comments/questions are very welcome (in English, Spanish or Italian). See Zambrano-Bigiarini and Rojas (2013) <doi:10.1016/j.envsoft.2013.01.004> for more details.
1458 Hydrological Data and Modeling hydroscoper Interface to the Greek National Data Bank for Hydrometeorological Information R interface to the Greek National Data Bank for Hydrological and Meteorological Information <http://www.hydroscope.gr/>. It covers Hydroscope’s data sources and provides functions to transliterate, translate and download them into tidy dataframes (tibbles).
1459 Hydrological Data and Modeling hydrostats Hydrologic Indices for Daily Time Series Data Calculates a suite of hydrologic indices for daily time series data that are widely used in hydrology and stream ecology.
1460 Hydrological Data and Modeling hydroTSM (core) Time Series Management, Analysis and Interpolation for Hydrological Modelling S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package is on providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcome, and in particular datasets that can be included in this package for academic purposes.
1461 Hydrological Data and Modeling hyfo Hydrology and Climate Forecasting Focuses on data processing and visualization in hydrology and climate forecasting. Main functions include data extraction, data downscaling, data resampling, gap filling of precipitation, bias correction of forecasting data, flexible time series plots, and spatial map generation. It is a good pre-processing and post-processing tool for hydrological and hydraulic modellers.
1462 Hydrological Data and Modeling IDF Estimation and Plotting of IDF Curves Intensity-duration-frequency (IDF) curves are a widely used analysis tool in hydrology to assess extreme values of precipitation [e.g. Mailhot et al., 2007, <doi:10.1016/j.jhydrol.2007.09.019>]. The package ‘IDF’ provides a function to read precipitation data from German weather service (DWD) ‘webwerdis’ <http://www.dwd.de/EN/ourservices/webwerdis/webwerdis.html> files and Berlin station data from ‘Stadtmessnetz’ <http://www.geo.fu-berlin.de/en/met/service/stadtmessnetz/index.html> files; additionally, IDF parameters can be estimated from a given data.frame containing a precipitation time series. The data are aggregated to given duration levels, and yearly intensity maxima are calculated either for the whole year or for given months. From these intensity maxima, IDF parameters are estimated on the basis of a duration-dependent generalised extreme value distribution [Koutsoyiannis et al., 1998, <doi:10.1016/S0022-1694(98)00097-3>]. IDF curves based on these estimated parameters can be plotted.
1463 Hydrological Data and Modeling kitagawa Spectral Response of Water Wells to Harmonic Strain and Pressure Signals Provides tools to calculate the theoretical hydrodynamic response of an aquifer undergoing harmonic straining or pressurization, or analyze measured responses. There are two classes of models here: (1) for sealed wells, based on the model of Kitagawa et al (2011, <doi:10.1029/2010JB007794>), and (2) for open wells, based on the models of Cooper et al (1965, <doi:10.1029/JZ070i016p03915>), Hsieh et al (1987, <doi:10.1029/WR023i010p01824>), Rojstaczer (1988, <doi:10.1029/JB093iB11p13619>), and Liu et al (1989, <doi:10.1029/JB094iB07p09453>). These models treat strain (or aquifer head) as an input to the physical system, and fluid-pressure (or water height) as the output. The applicable frequency band of these models is characteristic of seismic waves, atmospheric pressure fluctuations, and solid earth tides.
1464 Hydrological Data and Modeling kiwisR A Wrapper for Querying KISTERS ‘WISKI’ Databases via the ‘KiWIS’ API A wrapper for querying ‘WISKI’ databases via the ‘KiWIS’ ‘REST’ API. ‘WISKI’ is an ‘SQL’ relational database used for the collection and storage of water data developed by KISTERS and ‘KiWIS’ is a ‘REST’ service that provides access to ‘WISKI’ databases via HTTP requests (<https://water.kisters.de/en/technology-trends/kisters-and-open-data/>). Contains a list of default databases (called ‘hubs’) and also allows users to provide their own ‘KiWIS’ URL. Supports the entire query process, from metadata to specific time series values. All data is returned as tidy tibbles.
1465 Hydrological Data and Modeling kwb.hantush Calculation of Groundwater Mounding Beneath an Infiltration Basin Calculates groundwater mounding beneath an infiltration basin based on the Hantush (1967) equation (http://doi.org/10.1029/WR003i001p00227). The correct implementation is shown with a verification example based on a USGS report (page 25, http://pubs.usgs.gov/sir/2010/5102/support/sir2010-5102.pdf).
1466 Hydrological Data and Modeling lakemorpho Lake Morphometry Metrics Lake morphometry metrics are used by limnologists to understand, among other things, the ecological processes in a lake. Traditionally, these metrics are calculated by hand, with planimeters, and increasingly with commercial GIS products. All of these methods work; however, they are either outdated, difficult to reproduce, or require expensive licenses to use. The ‘lakemorpho’ package provides the tools to calculate a typical suite of these metrics from an input elevation model and lake polygon. The metrics currently supported are: fetch, major axis, minor axis, major/minor axis ratio, maximum length, maximum width, mean width, maximum depth, mean depth, shoreline development, shoreline length, surface area, and volume.
1467 Hydrological Data and Modeling lfstat Calculation of Low Flow Statistics for Daily Stream Flow Data The “Manual on Low-flow Estimation and Prediction”, published by the World Meteorological Organisation (WMO), gives a comprehensive summary of how to analyse stream flow data focusing on low flows. This package provides functions to compute the described statistics and produces plots similar to the ones in the manual.
1468 Hydrological Data and Modeling LPM Linear Parametric Models Applied to Hydrological Series Applies univariate long-memory models and multivariate short-memory models to hydrological datasets, and estimates intensity-duration-frequency curves from rainfall series.
1469 Hydrological Data and Modeling lulcc Land Use Change Modelling in R Classes and methods for spatially explicit land use change modelling in R.
1470 Hydrological Data and Modeling MBC Multivariate Bias Correction of Climate Model Outputs Calibrate and apply multivariate bias correction algorithms for climate model simulations of multiple climate variables. Three methods described by Cannon (2016) <doi:10.1175/JCLI-D-15-0679.1> and Cannon (2018) <doi:10.1007/s00382-017-3580-6> are implemented: (i) MBC Pearson correlation (MBCp), (ii) MBC rank correlation (MBCr), and (iii) MBC N-dimensional PDF transform (MBCn).
1471 Hydrological Data and Modeling meteo Spatio-Temporal Analysis and Mapping of Meteorological Observations Spatio-temporal geostatistical mapping of meteorological data. Global spatio-temporal models calculated using publicly available data are stored in the package.
1472 Hydrological Data and Modeling meteoland Landscape Meteorology Tools Functions to estimate weather variables at any position of a landscape [De Caceres et al. (2018) <doi:10.1016/j.envsoft.2018.08.003>].
1473 Hydrological Data and Modeling MODISTools Interface to the ‘MODIS Land Products Subsets’ Web Services Programmatic interface to the ‘MODIS Land Products Subsets’ web services (<https://modis.ornl.gov/data/modis_webservice.html>). Allows for easy downloads of ‘MODIS’ time series directly to your R workspace or your computer.
1474 Hydrological Data and Modeling MODIStsp A Tool for Automating Download and Preprocessing of MODIS Land Products Data Allows automating the creation of time series of rasters derived from MODIS Satellite Land Products data. It performs several typical preprocessing steps such as download, mosaicking, reprojection and resize of data acquired on a specified time period. All processing parameters can be set using a user-friendly GUI. Users can select which layers of the original MODIS HDF files they want to process, which additional Quality Indicators should be extracted from aggregated MODIS Quality Assurance layers and, in the case of Surface Reflectance products, which Spectral Indexes should be computed from the original reflectance bands. For each output layer, outputs are saved as single-band raster files corresponding to each available acquisition date. Virtual files allowing access to the entire time series as a single file are also created. Command-line execution exploiting a previously saved processing options file is also possible, allowing time series related to a MODIS product to be updated automatically whenever a new image is available.
1475 Hydrological Data and Modeling musica Multiscale Climate Model Assessment Provides functions allowing for (1) easy aggregation of multivariate time series into custom time scales, (2) comparison of statistical summaries between different data sets at multiple time scales (e.g. observed and bias-corrected data), (3) comparison of relations between variables and/or different data sets at multiple time scales (e.g. correlation of precipitation and temperature in control and scenario simulation) and (4) transformation of time series at custom time scales.
1476 Hydrological Data and Modeling nhdR Tools for working with the National Hydrography Dataset Tools for working with the National Hydrography Dataset, with functions for querying, downloading, and networking both the NHD <https://www.usgs.gov/core-science-systems/ngp/national-hydrography> and NHDPlus <http://www.horizon-systems.com/nhdplus> datasets.
1477 Hydrological Data and Modeling nsRFA Non-Supervised Regional Frequency Analysis A collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available.
1478 Hydrological Data and Modeling qmap Statistical Transformations for Post-Processing Climate Model Output Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.
1479 Hydrological Data and Modeling rdwd Select and Download Climate Data from ‘DWD’ (German Weather Service) Handle climate data from the ‘DWD’ (‘Deutscher Wetterdienst’, see <https://www.dwd.de/EN/climate_environment/cdc/cdc.html> for more information). Choose files with ‘selectDWD()’, download and process data sets with ‘dataDWD()’ and ‘readDWD()’.
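A minimal sketch of the selectDWD()/dataDWD()/readDWD() workflow the rdwd entry names; the station name and argument values follow the pattern in the package documentation and are illustrative:

```r
library(rdwd)

# Pick a dataset by station name, resolution, variable and period
link <- selectDWD("Potsdam", res = "daily", var = "kl", per = "recent")

file <- dataDWD(link, read = FALSE)  # download the raw file
clim <- readDWD(file)                # parse it into a data.frame
```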
1480 Hydrological Data and Modeling reservoir Tools for Analysis, Design, and Operation of Water Supply Storages Measure single-storage water supply system performance using resilience, reliability, and vulnerability metrics; assess storage-yield-reliability relationships; determine no-fail storage with sequent peak analysis; optimize release decisions for water supply, hydropower, and multi-objective reservoirs using deterministic and stochastic dynamic programming; generate inflow replicates using parametric and non-parametric models; evaluate inflow persistence using the Hurst coefficient.
1481 Hydrological Data and Modeling RHMS Hydrologic Modelling System for R Users Hydrologic modelling system is an object-oriented tool which enables R users to simulate and analyze hydrologic events. The package provides functions and methods for construction, simulation, visualization, and calibration of hydrologic systems.
1482 Hydrological Data and Modeling RMAWGEN Multi-Site Auto-Regressive Weather GENerator S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector AutoRegressive models (VARs). The weather generator model is then saved as an object and is calibrated by daily instrumental “Gaussianized” time series through the ‘vars’ package tools. Once this model is obtained, it can be used for weather generation and adapted to work with several climatic monthly time series.
1483 Hydrological Data and Modeling RNCEP Obtain, Organize, and Visualize NCEP Weather Data Contains functions to retrieve, organize, and visualize weather data from the NCEP/NCAR Reanalysis (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and NCEP/DOE Reanalysis II (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.html) datasets. Data are queried via the Internet and may be obtained for a specified spatial and temporal extent or interpolated to a point in space and time. We also provide functions to visualize these weather data on a map. There are also functions to simulate flight trajectories according to specified behavior using either NCEP wind data or data specified by the user.
1484 Hydrological Data and Modeling rnoaa ‘NOAA’ Weather Data from R Client for many ‘NOAA’ data sources including the ‘NCDC’ climate ‘API’ at <https://www.ncdc.noaa.gov/cdo-web/webservices/v2>, with functions for each of the ‘API’ ‘endpoints’: data, data categories, data sets, data types, locations, location categories, and stations. In addition, we have an interface for ‘NOAA’ sea ice data, the ‘NOAA’ severe weather inventory, ‘NOAA’ Historical Observing ‘Metadata’ Repository (‘HOMR’) data, ‘NOAA’ storm data via ‘IBTrACS’, tornado data via the ‘NOAA’ storm prediction center, and more.
1485 Hydrological Data and Modeling rnrfa UK National River Flow Archive Data from R Utility functions to retrieve data from the UK National River Flow Archive (<http://nrfa.ceh.ac.uk/>). The package contains R wrappers to the UK NRFA temporary data API. There are functions to retrieve stations falling within a bounding box, to generate a map, and to extract time series and general information.
1486 Hydrological Data and Modeling rpdo Pacific Decadal Oscillation Index Data Monthly Pacific Decadal Oscillation (PDO) index values from January 1900 to present.
1487 Hydrological Data and Modeling RSAlgaeR Builds Empirical Remote Sensing Models of Water Quality Variables and Analyzes Long-Term Trends Assists in processing reflectance data, developing empirical models using stepwise regression and a generalized linear modeling approach, cross-validation, and analysis of trends in water quality conditions (specifically chl-a) and climate conditions using the Theil-Sen estimator.
1488 Hydrological Data and Modeling rsoi Import Various Northern and Southern Hemisphere Climate Indices Downloads Southern Oscillation Index, Oceanic Nino Index, North Pacific Gyre Oscillation data, North Atlantic Oscillation and Arctic Oscillation. Data sources are described in the README file.
1489 Hydrological Data and Modeling rtop Interpolation of Data with Variable Spatial Support Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.
1490 Hydrological Data and Modeling rwunderground R Interface to Weather Underground API Tools for getting historical weather information and forecasts from wunderground.com. Historical weather and forecast data includes, but is not limited to, temperature, humidity, windchill, wind speed, dew point, heat index. Additionally, the weather underground weather API also includes information on sunrise/sunset, tidal conditions, satellite/webcam imagery, weather alerts, hurricane alerts and historical high/low temperatures.
1491 Hydrological Data and Modeling sbtools USGS ScienceBase Tools Tools for interacting with U.S. Geological Survey ScienceBase <https://www.sciencebase.gov> interfaces. ScienceBase is a data cataloging and collaborative data management platform. Functions are included for querying ScienceBase and for creating and fetching datasets.
1492 Hydrological Data and Modeling SCI Standardized Climate Indices Such as SPI, SRI or SPEI Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).
1493 Hydrological Data and Modeling smapr Acquisition and Processing of NASA Soil Moisture Active-Passive (SMAP) Data Facilitates programmatic access to NASA Soil Moisture Active Passive (SMAP) data with R. It includes functions to search for, acquire, and extract SMAP data.
1494 Hydrological Data and Modeling soilwater Implementation of Parametric Formulas for Soil Water Retention or Conductivity Curve Implements parametric formulas for the soil water retention or conductivity curve. At the moment, only the Van Genuchten model (for the soil water retention curve) and the Mualem model (for hydraulic conductivity) are implemented. See reference (<http://en.wikipedia.org/wiki/Water_retention_curve>).
1495 Hydrological Data and Modeling SPEI Calculation of the Standardised Precipitation-Evapotranspiration Index A set of functions for computing potential evapotranspiration and several widely used drought indices including the Standardized Precipitation-Evapotranspiration Index (SPEI).
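A short sketch of computing the SPEI, following the pattern of the package's own examples (the 'wichita' demo data and the thornthwaite()/spei() calls are taken from those examples; the latitude value is illustrative):

```r
library(SPEI)

data(wichita)  # monthly climate series shipped with the package

# Potential evapotranspiration (Thornthwaite), then the climatic water balance
pet <- thornthwaite(wichita$TMED, lat = 37.65)  # latitude is illustrative
bal <- wichita$PRCP - pet

spei12 <- spei(bal, scale = 12)  # 12-month SPEI
plot(spei12)
```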
1496 Hydrological Data and Modeling SWATmodel A multi-OS implementation of the TAMU SWAT model The Soil and Water Assessment Tool is a river basin or watershed scale model developed by Dr. Jeff Arnold for the USDA-ARS.
1497 Hydrological Data and Modeling swmmr R Interface for US EPA’s SWMM Functions to connect the widely used Storm Water Management Model (SWMM) of the United States Environmental Protection Agency (US EPA) <https://www.epa.gov/water-research/storm-water-management-model-swmm> to R with currently two main goals: (1) run a SWMM simulation from R and (2) provide fast access to simulation results, i.e. SWMM’s binary ‘.out’ files. High performance is achieved with help of Rcpp. Additionally, reading SWMM’s ‘.inp’ and ‘.rpt’ files is supported, to glance at model structures and to get direct access to simulation summaries.
1498 Hydrological Data and Modeling tidyhydat Extract and Tidy Canadian ‘Hydrometric’ Data Provides functions to access historical and real-time national ‘hydrometric’ data from Water Survey of Canada data sources (<http://dd.weather.gc.ca/hydrometric/csv/> and <http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.
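A minimal sketch of pulling historical flows with tidyhydat, assuming the national HYDAT database has been fetched once with download_hydat(); the station number is illustrative:

```r
library(tidyhydat)

# One-time setup (large download):
# download_hydat()

# Tidy daily discharge for one station (station number illustrative)
flows <- hy_daily_flows(station_number = "08MF005")
head(flows)
```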
1499 Hydrological Data and Modeling topmodel Implementation of the Hydrological Model TOPMODEL in R Set of hydrological functions including an R implementation of the hydrological model TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. From version 0.7.0 onwards, the package has been in maintenance mode.
1500 Hydrological Data and Modeling TUWmodel Lumped Hydrological Model for Education Purposes The model, developed at the Vienna University of Technology, is a lumped conceptual rainfall-runoff model, following the structure of the HBV model. The model runs on a daily or shorter time step and consists of a snow routine, a soil moisture routine and a flow routing routine. See Parajka, J., R. Merz, G. Bloeschl (2007) <doi:10.1002/hyp.6253> Uncertainty and multiple objective calibration in regional water balance modelling: case study in 320 Austrian catchments, Hydrological Processes, 21, 435-446.
1501 Hydrological Data and Modeling washdata Urban Water and Sanitation Survey Dataset Urban water and sanitation survey dataset collected by Water and Sanitation for the Urban Poor (WSUP) with technical support from Valid International. These citywide surveys have been collecting data allowing water and sanitation service levels across the entire city to be characterised, while also allowing more detailed data to be collected in areas of the city of particular interest. These surveys are intended to generate useful information for others working in the water and sanitation sector. Current release version includes datasets collected from a survey conducted in Dhaka, Bangladesh in March 2017. This survey in Dhaka is one of a series of surveys to be conducted by WSUP in various cities in which they operate including Accra, Ghana; Nakuru, Kenya; Antananarivo, Madagascar; Maputo, Mozambique; and, Lusaka, Zambia. This package will be updated once the surveys in other cities are completed and datasets have been made available.
1502 Hydrological Data and Modeling wasim Visualisation and analysis of output files of the hydrological model WASIM Helpful tools for data processing and visualisation of results of the hydrological model WASIM-ETH.
1503 Hydrological Data and Modeling water Actual Evapotranspiration with Energy Balance Models Tools and functions to calculate actual Evapotranspiration using surface energy balance models.
1504 Hydrological Data and Modeling waterData Retrieval, Analysis, and Anomaly Calculation of Daily Hydrologic Time Series Data Imports U.S. Geological Survey (USGS) daily hydrologic data from USGS web services (see <https://waterservices.usgs.gov/> for more information), plots the data, addresses some common data problems, and calculates and plots anomalies.
1505 Hydrological Data and Modeling WaterML Fetch and Analyze Data from ‘WaterML’ and ‘WaterOneFlow’ Web Services Lets you connect to any of the Consortium of Universities for the Advancement of Hydrologic Sciences, Inc. (‘CUAHSI’) Water Data Center ‘WaterOneFlow’ web services and read any ‘WaterML’ hydrological time series data file. To see the list of available web services, see <http://hiscentral.cuahsi.org>. All versions of ‘WaterML’ (1.0, 1.1 and 2.0) and both types of the web service protocol (‘SOAP’ and ‘REST’) are supported. The package has seven data download functions: GetServices(): show all public web services from the HIS Central Catalog. HISCentral_GetSites() and HISCentral_GetSeriesCatalog(): search for sites or time series from the HIS Central catalog based on geographic bounding box, server, or keyword. GetVariables(): show a data.frame with all variables on the server. GetSites(): show a data.frame with all sites on the server. GetSiteInfo(): show what variables, methods and quality control levels are available at the specific site. GetValues(): given a site code, variable code, start time and end time, fetch a data.frame of all the observation time series data values. The GetValues() function can also parse ‘WaterML’ data from a custom URL or from a local file. The package also has five data upload functions: AddSites(), AddVariables(), AddMethods(), AddSources(), and AddValues(). These functions can be used for uploading data to a ‘HydroServer Lite’ Observations Data Model (‘ODM’) database via the ‘JSON’ data upload web service interface.
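A hedged sketch of the discovery-to-data workflow using the functions listed above; the column names used to pull a server URL and site code out of the returned data.frames are assumptions and may differ from the actual return values:

```r
library(WaterML)

# Discover public WaterOneFlow services from the HIS Central catalog
services <- GetServices()
server <- services$url[1]  # 'url' column name is an assumption

sites <- GetSites(server)                            # all sites on that server
info  <- GetSiteInfo(server, sites$FullSiteCode[1])  # 'FullSiteCode' assumed
```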
1506 Hydrological Data and Modeling Watersheds Spatial Watershed Aggregation and Spatial Drainage Network Analysis Methods for watershed aggregation and spatial drainage network analysis.
1507 Hydrological Data and Modeling weathercan Download Weather Data from the Environment and Climate Change Canada Website Provides means for downloading historical weather data from the Environment and Climate Change Canada website (<http://climate.weather.gc.ca/historical_data/search_historic_data_e.html>). Data can be downloaded from multiple stations and over large date ranges and automatically processed into a single dataset. Tools are also provided to identify stations either by name or proximity to a location.
1508 Hydrological Data and Modeling worldmet Import Surface Meteorological Data from NOAA Integrated Surface Database (ISD) Functions to import data from more than 30,000 surface meteorological sites around the world managed by the National Oceanic and Atmospheric Administration (NOAA) Integrated Surface Database (ISD, see <https://www.ncdc.noaa.gov/isd>).
1509 Hydrological Data and Modeling wql Exploring Water Quality Monitoring Data Functions to assist in the processing and exploration of data from environmental monitoring programs. The package name stands for “water quality” and reflects the original focus on time series data for physical and chemical properties of water, as well as the biota. Intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should be useful for analysis of similar-frequency time series regardless of the subject matter.
1510 Hydrological Data and Modeling WRSS Water Resources System Simulator Water resources system simulator is a tool for simulation and analysis of large-scale water resources systems. ‘WRSS’ provides functions and methods for construction, simulation and analysis of primary storage and hydropower water resources features (e.g. reservoirs and aquifers) based on Standard Operating Policy (SOP).
1511 Hydrological Data and Modeling WRTDStidal Weighted Regression for Water Quality Evaluation in Tidal Waters An adaptation for estuaries (tidal waters) of weighted regression on time, discharge, and season to evaluate trends in water quality time series.
1512 Machine Learning & Statistical Learning ahaz Regularization for semiparametric additive hazards regression Computationally efficient procedures for regularized estimation with the semiparametric additive hazards regression model.
1513 Machine Learning & Statistical Learning arules Mining Association Rules and Frequent Itemsets Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat.
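A minimal sketch of Apriori mining with arules, using the Groceries transaction data shipped with the package (the support and confidence thresholds are illustrative):

```r
library(arules)

data(Groceries)  # example transaction data included in arules
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = "lift"), 3))  # strongest rules by lift
```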
1514 Machine Learning & Statistical Learning BART Bayesian Additive Regression Trees Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.
1515 Machine Learning & Statistical Learning bartMachine Bayesian Additive Regression Trees An advanced implementation of Bayesian Additive Regression Trees with expanded features for data analysis and visualization.
1516 Machine Learning & Statistical Learning BayesTree Bayesian Additive Regression Trees An implementation of BART: Bayesian Additive Regression Trees, by Chipman, George and McCulloch (2010).
1517 Machine Learning & Statistical Learning BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements from the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Letac et al. (2018) <arXiv:1706.04416>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2017) <doi:10.1111/rssc.12171>. To speed up the computations, the BDMCMC sampling algorithms are implemented in parallel using OpenMP in C++.
1518 Machine Learning & Statistical Learning biglasso Extending Lasso Model Fitting to Big Data Extends lasso and elastic-net model fitting to ultrahigh-dimensional, multi-gigabyte data sets that cannot be loaded into memory. It is much more memory- and computation-efficient compared to existing lasso-fitting packages like ‘glmnet’ and ‘ncvreg’, thus allowing for very powerful big data analysis even with an ordinary laptop.
1519 Machine Learning & Statistical Learning bmrm Bundle Methods for Regularized Risk Minimization Package Bundle methods for minimization of convex and non-convex risk under L1 or L2 regularization. Implements the algorithm proposed by Teo et al. (JMLR 2010) as well as the extension proposed by Do and Artieres (JMLR 2012). The package comes with a lot of loss functions for machine learning, which makes it powerful for big data analysis. Applications include: structured prediction, linear SVM, multi-class SVM, f-beta optimization, ROC optimization, ordinal regression, quantile regression, epsilon insensitive regression, least mean square, logistic regression, least absolute deviation regression (see package examples), etc., all with L1 and L2 regularization.
1520 Machine Learning & Statistical Learning Boruta Wrapper Algorithm for All Relevant Feature Selection An all relevant feature selection wrapper algorithm. It finds relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies (shadows).
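A minimal sketch of all-relevant feature selection with Boruta on a built-in dataset:

```r
library(Boruta)

set.seed(1)
# Compare each predictor's importance against permuted "shadow" copies
fs <- Boruta(Species ~ ., data = iris)
print(fs)  # each attribute is Confirmed, Tentative, or Rejected
```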
1521 Machine Learning & Statistical Learning bst Gradient Boosting Functional gradient descent algorithm for a variety of convex and non-convex loss functions, for both classical and robust regression and classification problems. See Wang (2011) <doi:10.2202/1557-4679.1304>, Wang (2012) <doi:10.3414/ME11-02-0020>, Wang (2018) <doi:10.1080/10618600.2018.1424635>, Wang (2018) <doi:10.1214/18-EJS1404>.
1522 Machine Learning & Statistical Learning C50 C5.0 Decision Trees and Rule-Based Models C5.0 decision trees and rule-based models for pattern recognition that extend the work of Quinlan (1993, ISBN:1-55860-238-0).
1523 Machine Learning & Statistical Learning caret Classification and Regression Training Misc functions for training and plotting classification and regression models.
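A minimal sketch of caret's unified train() interface with 5-fold cross-validation; 'rpart' stands in for any of the many supported methods:

```r
library(caret)

set.seed(1)
fit <- train(Species ~ ., data = iris,
             method    = "rpart",
             trControl = trainControl(method = "cv", number = 5))
fit  # resampled accuracy across the tuning grid
```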
1524 Machine Learning & Statistical Learning CORElearn Classification, Regression and Feature Evaluation A suite of machine learning algorithms written in C++ with an R interface; it contains several learning techniques for classification and regression. Predictive models include e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the ‘ExplainPrediction’ package. This package is especially strong in feature evaluation where it contains several variants of Relief algorithm and many impurity based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization is used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
1525 Machine Learning & Statistical Learning CoxBoost Cox models by likelihood based boosting for a single survival endpoint or competing risks This package provides routines for fitting Cox models by likelihood-based boosting for a single endpoint or in the presence of competing risks.
1526 Machine Learning & Statistical Learning Cubist Rule- And Instance-Based Regression Modeling Regression modeling using rules with added instance-based corrections.
1527 Machine Learning & Statistical Learning deepnet deep learning toolkit in R Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoders and so on.
1528 Machine Learning & Statistical Learning e1071 (core) Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
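A minimal sketch of fitting a support vector machine with e1071 (default radial kernel):

```r
library(e1071)

model <- svm(Species ~ ., data = iris)
table(predicted = predict(model, iris),
      actual    = iris$Species)  # confusion matrix on the training data
```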
1529 Machine Learning & Statistical Learning earth Multivariate Adaptive Regression Splines Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines” <doi:10.1214/aos/1176347963>. (The term “MARS” is trademarked and thus not used in the name of the package.)
1530 Machine Learning & Statistical Learning effects Effect Displays for Linear, Generalized Linear, and Other Models Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.
1531 Machine Learning & Statistical Learning elasticnet Elastic-Net for Sparse Estimation and Sparse PCA Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.
1532 Machine Learning & Statistical Learning ElemStatLearn Data Sets, Functions and Examples from the Book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman Useful when reading the book mentioned above, referred to in the documentation as ‘the book’.
1533 Machine Learning & Statistical Learning evclass Evidential Distance-Based Classification Different evidential distance-based classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule and the evidential neural network.
1534 Machine Learning & Statistical Learning evtree Evolutionary Learning of Globally Optimal Trees Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. The ‘evtree’ package implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. CPU and memory-intensive tasks are fully computed in C++ while the ‘partykit’ package is leveraged to represent the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions.
1535 Machine Learning & Statistical Learning frbs Fuzzy Rule-Based Systems for Classification and Regression Tasks An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows the construction of an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named ‘frbsPMML’, which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language to provide a standard for describing models produced by data mining and machine learning algorithms. Therefore, we are allowed to export and import an FRBS model to/from ‘frbsPMML’. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.
1536 Machine Learning & Statistical Learning GAMBoost Generalized linear and additive models by likelihood based boosting This package provides routines for fitting generalized linear and generalized additive models by likelihood based boosting, using penalized B-splines.
1537 Machine Learning & Statistical Learning gamboostLSS Boosting Methods for ‘GAMLSS’ Boosting models for fitting generalized additive models for location, shape and scale (‘GAMLSS’) to potentially high dimensional data.
1538 Machine Learning & Statistical Learning gbm (core) Generalized Boosted Regression Models An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway.
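A minimal sketch of a gradient boosting fit with gbm; the hyperparameter values are illustrative, not recommendations:

```r
library(gbm)

set.seed(1)
fit <- gbm(mpg ~ ., data = mtcars,
           distribution = "gaussian",  # squared-error loss
           n.trees = 500, interaction.depth = 2, shrinkage = 0.05)
summary(fit)  # relative influence of each predictor
```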
1539 Machine Learning & Statistical Learning ggRandomForests Visually Exploring Random Forests Graphic elements for exploring random forests built with the ‘randomForest’ or ‘randomForestSRC’ package for survival, regression and classification forests, with plotting via the ‘ggplot2’ package.
1540 Machine Learning & Statistical Learning glmnet Lasso and Elastic-Net Regularized Generalized Linear Models Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model. Two recent additions are the multiple-response Gaussian, and the grouped multinomial regression. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the paper linked to via the URL below.
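A minimal sketch of a cross-validated elastic-net fit with glmnet on simulated data:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)

cvfit <- cv.glmnet(x, y, alpha = 0.5)  # alpha: 0 = ridge, 1 = lasso
coef(cvfit, s = "lambda.min")          # coefficients at the selected lambda
```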
1541 Machine Learning & Statistical Learning glmpath L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model A path-following algorithm for L1 regularized generalized linear models and Cox proportional hazards model.
1542 Machine Learning & Statistical Learning GMMBoost Likelihood-based Boosting for Generalized mixed models Provides likelihood-based boosting approaches for generalized mixed models.
1543 Machine Learning & Statistical Learning gradDescent Gradient Descent for Regression Tasks An implementation of various learning algorithms based on gradient descent for dealing with regression tasks. The variants of the gradient descent algorithm are: Mini-Batch Gradient Descent (MBGD), which uses the training data partially to reduce the computation load; Stochastic Gradient Descent (SGD), which uses a random data point in learning to reduce the computation load drastically; Stochastic Average Gradient (SAG), an SGD-based algorithm that minimizes stochastic steps against an average; Momentum Gradient Descent (MGD), an optimization to speed up gradient descent learning; Accelerated Gradient Descent (AGD), an optimization to accelerate gradient descent learning; Adagrad, a gradient-descent-based algorithm that accumulates previous costs for adaptive learning; Adadelta, a gradient-descent-based algorithm that uses a Hessian approximation for adaptive learning; RMSprop, a gradient-descent-based algorithm that combines the adaptive-learning abilities of Adagrad and Adadelta; Adam, a gradient-descent-based algorithm that uses mean and variance moments for adaptive learning; Stochastic Variance Reduced Gradient (SVRG), an SGD-based optimization that accelerates convergence by reducing the gradient; Semi Stochastic Gradient Descent (SSGD), an SGD-based algorithm that combines GD and SGD, choosing one of the gradients at a time, to accelerate convergence; Stochastic Recursive Gradient Algorithm (SARAH), an optimization algorithm that, similarly to SVRG, accelerates convergence through accumulated stochastic information; and Stochastic Recursive Gradient Algorithm+ (SARAHPlus), a practical SARAH variant that accelerates convergence and provides the possibility of earlier termination.
1544 Machine Learning & Statistical Learning grf Generalized Random Forests (Beta) A pluggable package for forest-based statistical estimation and inference. GRF currently provides methods for non-parametric least-squares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables). This package is currently in beta, and we expect to make continual improvements to its performance and usability.
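A minimal sketch of heterogeneous treatment effect estimation with grf's causal_forest() on simulated data:

```r
library(grf)

set.seed(1)
n <- 500; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)                        # randomized binary treatment
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)  # effect varies with X1

cf  <- causal_forest(X, Y, W)
tau <- predict(cf)$predictions  # estimated conditional treatment effects
```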
1545 Machine Learning & Statistical Learning grplasso Fitting User-Specified Models with Group Lasso Penalty Fits user-specified (GLM-) models with group lasso penalty.
1546 Machine Learning & Statistical Learning grpreg Regularization Paths for Regression Models with Grouped Covariates Efficient algorithms for fitting the regularization path of linear regression, GLM, and Cox regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bi-level selection methods such as the group exponential lasso, the composite MCP, and the group bridge.
1547 Machine Learning & Statistical Learning h2o R Interface for ‘H2O’ R interface for ‘H2O’, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).
1548 Machine Learning & Statistical Learning hda Heteroscedastic Discriminant Analysis Functions to perform dimensionality reduction for classification if the covariance matrices of the classes are unequal.
1549 Machine Learning & Statistical Learning hdi High-Dimensional Inference Implementation of multiple approaches to perform inference in high-dimensional models.
1550 Machine Learning & Statistical Learning hdm High-Dimensional Metrics Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters are provided which appear in high-dimensional approximately sparse models. Functions are included for fitting heteroscedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. Chernozhukov, Hansen, Spindler (2016) <arXiv:1603.01700>.
1551 Machine Learning & Statistical Learning ICEbox Individual Conditional Expectation Plot Toolbox Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman’s partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent they may exist.
1552 Machine Learning & Statistical Learning ipred Improved Predictors Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
1553 Machine Learning & Statistical Learning kernlab (core) Kernel-Based Machine Learning Lab Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
1554 Machine Learning & Statistical Learning klaR Classification and Visualization Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to ‘svmlight’ and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.
1555 Machine Learning & Statistical Learning lars Least Angle Regression, Lasso and Forward Stagewise Efficient procedures for fitting an entire lasso sequence with the cost of a single least squares fit. Least angle regression and infinitesimal forward stagewise regression are related to the lasso, as described in the paper below.
1556 Machine Learning & Statistical Learning lasso2 L1 Constrained Estimation aka ‘lasso’ Routines and documentation for solving regression problems while imposing an L1 constraint on the estimates, based on the algorithm of Osborne et al. (1998).
1557 Machine Learning & Statistical Learning LiblineaR Linear Predictive Models Based on the ‘LIBLINEAR’ C/C++ Library A wrapper around the ‘LIBLINEAR’ C/C++ library for machine learning (available at <http://www.csie.ntu.edu.tw/~cjlin/liblinear>). ‘LIBLINEAR’ is a simple library for solving large-scale regularized linear classification and regression. It currently supports L2-regularized classification (such as logistic regression, L2-loss linear SVM and L1-loss linear SVM) as well as L1-regularized classification (such as L2-loss linear SVM and logistic regression) and L2-regularized support vector regression (with L1- or L2-loss). The main features of LiblineaR include multi-class classification (one-vs-the-rest, and Crammer & Singer method), cross validation for model selection, probability estimates (logistic regression only) or weights for unbalanced data. The estimation of the models is particularly fast compared to other libraries.
1558 Machine Learning & Statistical Learning LogicReg Logic Regression Routines for fitting Logic Regression models.
1559 Machine Learning & Statistical Learning LTRCtrees Survival Trees to Fit Left-Truncated and Right-Censored and Interval-Censored Survival Data Recursive partition algorithms designed for fitting survival trees with left-truncated and right-censored (LTRC) data, as well as interval-censored data. The LTRC trees can also be used to fit survival trees with time-varying covariates.
1560 Machine Learning & Statistical Learning maptree Mapping, pruning, and graphing tree models Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees.
1561 Machine Learning & Statistical Learning mboost (core) Model-Based Boosting Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.
1562 Machine Learning & Statistical Learning mlr Machine Learning in R Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering and general, example-specific cost-sensitive learning. Generic resampling, including cross-validation, bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling. Most operations can be parallelized.
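A minimal sketch of mlr's task/learner/resample workflow:

```r
library(mlr)

task <- makeClassifTask(data = iris, target = "Species")
lrn  <- makeLearner("classif.rpart")
res  <- resample(lrn, task, makeResampleDesc("CV", iters = 5))
res$aggr  # aggregated cross-validation performance
```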
1563 Machine Learning & Statistical Learning model4you Stratified and Personalised Models Based on Model-Based Trees and Forests Model-based trees for subgroup analyses in clinical trials and model-based forests for the estimation and prediction of personalised treatment effects (personalised models). Currently partitioning of linear models, lm(), generalised linear models, glm(), and Weibull models, survreg(), is supported. Advanced plotting functionality is supported for the trees and a test for parameter heterogeneity is provided for the personalised models. For details on model-based trees for subgroup analyses see Seibold, Zeileis and Hothorn (2016) <doi:10.1515/ijb-2015-0032>; for details on model-based forests for estimation of individual treatment effects see Seibold, Zeileis and Hothorn (2017) <doi:10.1177/0962280217693034>.
1564 Machine Learning & Statistical Learning MXM Feature Selection (Including Multiple Solutions) and Bayesian Networks Many feature selection methods for a wide range of response variables, including minimal, statistically-equivalent and equally-predictive feature subsets. Bayesian network algorithms and related functions are also included. The package name ‘MXM’ stands for “Mens eX Machina”, meaning “Mind from the Machine” in Latin. Reference: Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets, Lagani, V. and Athineou, G. and Farcomeni, A. and Tsagris, M. and Tsamardinos, I. (2017). Journal of Statistical Software, 80(7). <doi:10.18637/jss.v080.i07>.
1565 Machine Learning & Statistical Learning ncvreg Regularization Paths for SCAD and MCP Penalized Regression Models Fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the “elastic net” idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided.
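As a quick illustration of the path-fitting and cross-validation utilities described above, here is a minimal sketch in R (the use of the built-in mtcars data and the MCP penalty is an illustrative assumption, not a recommendation):

    library(ncvreg)
    X <- as.matrix(mtcars[, -1])               # predictor matrix
    y <- mtcars$mpg                            # continuous response
    cvfit <- cv.ncvreg(X, y, penalty = "MCP")  # cross-validated MCP regularization path
    plot(cvfit)                                # CV error along the lambda path
    coef(cvfit)                                # coefficients at the CV-selected lambda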
1566 Machine Learning & Statistical Learning nnet (core) Feed-Forward Neural Networks and Multinomial Log-Linear Models Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
1567 Machine Learning & Statistical Learning oem Orthogonalizing EM: Penalized Regression for Big Tall Data Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem() and the functions cv.oem() and xval.oem() are for cross validation, the latter being an accelerated cross validation function for linear models. The big.oem() function allows for out of memory fitting.
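A minimal sketch of the fitting functions named above, assuming a small dense design built from the built-in mtcars data (argument usage follows the package documentation; treat this as illustrative rather than canonical):

    library(oem)
    x <- as.matrix(mtcars[, -1])
    y <- mtcars$mpg
    fit   <- oem(x = x, y = y, penalty = c("lasso", "mcp"))  # several penalties fitted in one pass
    cvfit <- xval.oem(x = x, y = y, penalty = "lasso")       # accelerated cross-validation (linear models)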
1568 Machine Learning & Statistical Learning OneR One Rule Machine Learning Classification Algorithm with Enhancements Implements the One Rule (OneR) Machine Learning classification algorithm (Holte, R.C. (1993) <doi:10.1023/A:1022631118932>) with enhancements for sophisticated handling of numeric data and missing values together with extensive diagnostic functions. It is useful as a baseline for machine learning models and the rules are often helpful heuristics.
1569 Machine Learning & Statistical Learning opusminer OPUS Miner Algorithm for Filtered Top-k Association Discovery Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See <http://i.giwebb.com/index.php/research/association-discovery/> for more information in relation to the OPUS Miner algorithm.
1570 Machine Learning & Statistical Learning pamr Pam: Prediction Analysis for Microarrays Some functions for sample classification in microarrays.
1571 Machine Learning & Statistical Learning party A Laboratory for Recursive Partytioning A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.
1572 Machine Learning & Statistical Learning partykit A Toolkit for Recursive Partytioning A toolkit with infrastructure for representing, summarizing, and visualizing tree-structured regression and classification models. This unified infrastructure can be used for reading/coercing tree models from different sources (‘rpart’, ‘RWeka’, ‘PMML’) yielding objects that share functionality for print()/plot()/predict() methods. Furthermore, new and improved reimplementations of conditional inference trees (ctree()) and model-based recursive partitioning (mob()) from the ‘party’ package are provided based on the new infrastructure. A description of this package was published by Hothorn and Zeileis (2015) <http://jmlr.org/papers/v16/hothorn15a.html>.
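For orientation, a minimal ctree() example using the built-in iris data (an illustrative assumption):

    library(partykit)
    ct <- ctree(Species ~ ., data = iris)          # conditional inference tree
    print(ct)                                      # textual representation of the splits
    plot(ct)                                       # unified plot() method
    predict(ct, newdata = iris[c(1, 51, 101), ])   # predicted classes for three rows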
1573 Machine Learning & Statistical Learning pdp Partial Dependence Plots A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.
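A minimal sketch of the workflow, assuming a random forest fitted to the built-in airquality data (the choice of model and predictor is illustrative):

    library(randomForest)
    library(pdp)
    aq  <- na.omit(airquality)
    fit <- randomForest(Ozone ~ ., data = aq)
    pd  <- partial(fit, pred.var = "Temp", train = aq)  # partial dependence of Ozone on Temp
    plotPartial(pd)                                     # plot the marginal effect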
1574 Machine Learning & Statistical Learning penalized L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model Fitting possibly high dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.
1575 Machine Learning & Statistical Learning penalizedLDA Penalized Classification using Fisher’s Linear Discriminant Implements the penalized LDA proposal of “Witten and Tibshirani (2011), Penalized classification using Fisher’s linear discriminant, to appear in Journal of the Royal Statistical Society, Series B”.
1576 Machine Learning & Statistical Learning picasso Pathwise Calibrated Sparse Shooting Algorithm Computationally efficient tools for fitting generalized linear models with convex or non-convex penalties. Users can benefit from the superior statistical properties of non-convex penalties such as SCAD and MCP, which have significantly less estimation error and overfitting than convex penalties such as lasso and ridge. Computation is handled by multi-stage convex relaxation and the PathwIse CAlibrated Sparse Shooting algOrithm (PICASSO), which exploits warm-start initialization, active-set updating, and a strong rule for coordinate preselection to speed up computation, and attains linear convergence to a unique sparse local optimum with optimal statistical properties. The computation is memory-optimized using sparse matrix output.
1577 Machine Learning & Statistical Learning plotmo Plot a Model’s Residuals, Response, and Partial Dependence Plots Plot model surfaces for a wide variety of models using partial dependence plots and other techniques. Also plot model residuals and other information on the model.
1578 Machine Learning & Statistical Learning quantregForest Quantile Regression Forests Quantile Regression Forests is a tree-based ensemble method for estimation of conditional quantiles. It is particularly well suited for high-dimensional data. Predictor variables of mixed classes can be handled. The package is dependent on the package ‘randomForest’, written by Andy Liaw.
1579 Machine Learning & Statistical Learning randomForest (core) Breiman and Cutler’s Random Forests for Classification and Regression Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <doi:10.1023/A:1010933404324>.
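A minimal usage sketch on the built-in iris data (the settings shown are illustrative defaults, not recommendations):

    library(randomForest)
    set.seed(42)  # forests are randomized; fix the seed for reproducibility
    fit <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
    print(fit)        # OOB error estimate and confusion matrix
    importance(fit)   # permutation and Gini importance measures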
1580 Machine Learning & Statistical Learning randomForestSRC Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) Fast OpenMP parallel computing of Breiman’s random forests for survival, competing risks, regression and classification based on Ishwaran and Kogalur’s popular random survival forests (RSF) package. Handles missing data and now includes multivariate, unsupervised forests, quantile regression and solutions for class imbalanced data. New fast interface using subsampling and confidence regions for variable importance.
1581 Machine Learning & Statistical Learning ranger A Fast Implementation of Random Forests A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class ‘gwaa.data’ (R package ‘GenABEL’) and ‘dgCMatrix’ (R package ‘Matrix’) can be directly analyzed.
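A minimal probability-forest sketch on the built-in iris data (an illustrative assumption):

    library(ranger)
    fit  <- ranger(Species ~ ., data = iris, num.trees = 500, probability = TRUE)
    pred <- predict(fit, data = iris[1:5, ])
    pred$predictions  # per-class probabilities for the first five rows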
1582 Machine Learning & Statistical Learning rattle Graphical User Interface for Data Science in R The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim of providing a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (predictive modelling markup language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the log tab. The log can be saved as a standalone R script file, both as an aid for the user to learn R and for copy-and-pasting directly into R itself.
1583 Machine Learning & Statistical Learning Rborist Extensible, Parallelizable Implementation of the Random Forest Algorithm Scalable implementation of classification and regression forests, as described by Breiman (2001), <doi:10.1023/A:1010933404324>.
1584 Machine Learning & Statistical Learning RcppDL Deep Learning Methods via Rcpp This package is based on the C++ code from Yusuke Sugomori, which implements basic machine learning methods with many layers (deep learning), including dA (Denoising Autoencoder), SdA (Stacked Denoising Autoencoder), RBM (Restricted Boltzmann machine) and DBN (Deep Belief Nets).
1585 Machine Learning & Statistical Learning rdetools Relevant Dimension Estimation (RDE) in Feature Spaces The package provides functions for estimating the relevant dimension of a data set in feature spaces, applications to model selection, graphical illustrations and prediction.
1586 Machine Learning & Statistical Learning REEMtree Regression Trees with Random Effects for Longitudinal (Panel) Data This package estimates regression trees with random effects as a way to use data mining techniques to describe longitudinal or panel data.
1587 Machine Learning & Statistical Learning relaxo Relaxed Lasso Relaxed Lasso is a generalisation of the Lasso shrinkage technique for linear regression. Both variable selection and parameter estimation are achieved by regular Lasso, yet the two steps do not necessarily use the same penalty parameter. The results include all standard Lasso solutions but often allow for sparser models while having similar or even slightly better predictive performance if many predictor variables are present. The package depends on the LARS package.
1588 Machine Learning & Statistical Learning rgenoud R Version of GENetic Optimization Using Derivatives A genetic algorithm plus derivative optimizer.
1589 Machine Learning & Statistical Learning RGF Regularized Greedy Forest An R wrapper of the ‘Regularized Greedy Forest’ <https://github.com/RGF-team/rgf/tree/master/python-package> Python package, which also includes a multi-core implementation (FastRGF) <https://github.com/RGF-team/rgf/tree/master/FastRGF>.
1590 Machine Learning & Statistical Learning RLT Reinforcement Learning Trees Random forest with a variety of additional features for regression, classification and survival analysis. The features include: parallel computing with OpenMP, embedded model for selecting the splitting variable (based on Zhu, Zeng & Kosorok, 2015), subject weight, variable weight, tracking subjects used in each tree, etc.
1591 Machine Learning & Statistical Learning Rmalschains Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R An implementation of an algorithm family for continuous optimization called memetic algorithms with local search chains (MA-LS-Chains). Memetic algorithms are hybridizations of genetic algorithms with local search methods. They are especially suited for continuous optimization.
1592 Machine Learning & Statistical Learning rminer Data Mining Classification and Regression Methods Facilitates the use of data mining algorithms in classification and regression (including time series forecasting) tasks by presenting a short and coherent set of functions. Versions: 1.4.2 new NMAE metric, “xgboost” and “cv.glmnet” models (16 classification and 18 regression models); 1.4.1 new tutorial and more robust version; 1.4 - new classification and regression models/algorithms, with a total of 14 classification and 15 regression methods, including: Decision Trees, Neural Networks, Support Vector Machines, Random Forests, Bagging and Boosting; 1.3 and 1.3.1 - new classification and regression metrics (improved mmetric function); 1.2 - new input importance methods (improved Importance function); 1.0 - first version.
1593 Machine Learning & Statistical Learning ROCR Visualizing the Performance of Scoring Classifiers ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
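The three-command workflow mentioned above, shown on the example data shipped with the package:

    library(ROCR)
    data(ROCR.simple)  # example scores and true labels included with ROCR
    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    perf <- performance(pred, measure = "tpr", x.measure = "fpr")  # ROC curve
    plot(perf, colorize = TRUE)  # colour encodes the cutoff parameterization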
1594 Machine Learning & Statistical Learning RoughSets Data Analysis Using Rough Set and Fuzzy Rough Set Theories Implementations of algorithms for data analysis based on the rough set theory (RST) and the fuzzy rough set theory (FRST). We not only provide implementations for the basic concepts of RST and FRST but also popular algorithms that derive from those theories. The methods included in the package can be divided into several categories based on their functionality: discretization, feature selection, instance selection, rule induction and classification based on nearest neighbors. RST was introduced by Zdzisław Pawlak in 1982 as a sophisticated mathematical tool to model and process imprecise or incomplete information. By using the indiscernibility relation for objects/instances, RST does not require additional parameters to analyze the data. FRST is an extension of RST. The FRST combines concepts of vagueness and indiscernibility that are expressed with fuzzy sets (as proposed by Zadeh in 1965) and RST.
1595 Machine Learning & Statistical Learning rpart (core) Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
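A minimal classification-tree sketch using the kyphosis data shipped with the package:

    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
    printcp(fit)          # cross-validated complexity table, used for pruning
    plot(fit); text(fit)  # draw and label the fitted tree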
1596 Machine Learning & Statistical Learning RPMM Recursively Partitioned Mixture Model Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.
1597 Machine Learning & Statistical Learning RSNNS Neural Networks using the Stuttgart Neural Network Simulator (SNNS) The Stuttgart Neural Network Simulator (SNNS) is a library containing many standard implementations of neural networks. This package wraps the SNNS functionality to make it available from within R. Using the ‘RSNNS’ low-level interface, all of the algorithmic functionality and flexibility of SNNS can be accessed. Furthermore, the package contains a convenient high-level interface, so that the most common neural network topologies and learning algorithms integrate seamlessly into R.
1598 Machine Learning & Statistical Learning RWeka R/Weka Interface An R interface to Weka (Version 3.9.3). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Package ‘RWeka’ contains the interface code, the Weka jar is in a separate package ‘RWekajars’. For more information on Weka see <http://www.cs.waikato.ac.nz/ml/weka/>.
1599 Machine Learning & Statistical Learning RXshrink Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression Identify and display TRACEs for a specified shrinkage path and determine the extent of shrinkage most likely, under normal distribution theory, to produce an optimal reduction in MSE Risk in estimates of regression (beta) coefficients. Alternative estimates are also provided when ill-conditioned (nearly multicollinear) models yield OLS estimates with “wrong” numerical signs.
1600 Machine Learning & Statistical Learning sda Shrinkage Discriminant Analysis and CAT Score Variable Selection Provides an efficient framework for high-dimensional linear and diagonal discriminant analysis with variable selection. The classifier is trained using James-Stein-type shrinkage estimators and predictor variables are ranked using correlation-adjusted t-scores (CAT scores). Variable selection error is controlled using false non-discovery rates or higher criticism.
1601 Machine Learning & Statistical Learning SIS Sure Independence Screening Variable selection techniques are essential tools for model selection and estimation in high-dimensional statistical models. Through this publicly available package, we provide a unified environment to carry out variable selection using iterative sure independence screening (SIS) and all of its variants in generalized linear models and the Cox proportional hazards model.
1602 Machine Learning & Statistical Learning ssgraph Bayesian Graphical Estimation using Spike-and-Slab Priors Bayesian estimation for undirected graphical models using spike-and-slab priors. The package handles continuous, discrete, and mixed data. To speed up the computations, the computationally intensive tasks of the package are implemented in C++ in parallel using OpenMP.
1603 Machine Learning & Statistical Learning stabs Stability Selection with Error Control Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.
1604 Machine Learning & Statistical Learning SuperLearner Super Learner Prediction Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.
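A minimal sketch combining two candidate learners from the built-in wrapper library, fitted to the airquality data (the data set and library choice are illustrative assumptions):

    library(SuperLearner)
    set.seed(1)
    d  <- na.omit(airquality)
    sl <- SuperLearner(Y = d$Ozone,
                       X = d[, c("Solar.R", "Wind", "Temp")],
                       family = gaussian(),
                       SL.library = c("SL.mean", "SL.glm"))  # candidate prediction algorithms
    sl  # prints ensemble weights and CV risk for each candidate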
1605 Machine Learning & Statistical Learning svmpath The SVM Path Algorithm Computes the entire regularization path for the two-class svm classifier with essentially the same cost as a single SVM fit.
1606 Machine Learning & Statistical Learning tensorflow R Interface to ‘TensorFlow’ Interface to ‘TensorFlow’ <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
1607 Machine Learning & Statistical Learning tgp Bayesian Treed Gaussian Process Models Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.
1608 Machine Learning & Statistical Learning tree Classification and Regression Trees Classification and regression trees.
1609 Machine Learning & Statistical Learning trtf Transformation Trees and Forests Recursive partytioning of transformation models with corresponding random forest for conditional transformation models as described in ‘Transformation Forests’ (Hothorn and Zeileis, 2017, <arXiv:1701.02110>) and ‘Top-Down Transformation Choice’ (Hothorn, 2018, <doi:10.1177/1471082X17748081>).
1610 Machine Learning & Statistical Learning varSelRF Variable Selection using Random Forests Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly-correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).
1611 Machine Learning & Statistical Learning vcrpart Tree-Based Varying Coefficient Regression for Generalized Linear and Ordinal Mixed Models Recursive partitioning for varying coefficient generalized linear models and ordinal linear mixed models. Special features are coefficient-wise partitioning, non-varying coefficients and partitioning of time-varying variables in longitudinal regression.
1612 Machine Learning & Statistical Learning wsrf Weighted Subspace Random Forest for Classification A parallel implementation of Weighted Subspace Random Forest. The Weighted Subspace Random Forest algorithm was proposed in the International Journal of Data Warehousing and Mining by Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, and Yunming Ye (2012) <doi:10.4018/jdwm.2012040103>. The algorithm can classify very high-dimensional data with random forests built using small subspaces. A novel variable weighting method is used for variable subspace selection in place of the traditional random variable sampling. This new approach is particularly useful in building models from high-dimensional data.
1613 Machine Learning & Statistical Learning xgboost Extreme Gradient Boosting Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.
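A minimal sketch using the agaricus example data shipped with the package (the interface shown follows the package documentation of this era; newer releases may differ):

    library(xgboost)
    data(agaricus.train, package = "xgboost")  # sparse example data included with xgboost
    bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
                   nrounds = 10, objective = "binary:logistic", verbose = 0)
    head(predict(bst, agaricus.train$data))    # predicted probabilities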
1614 Medical Image Analysis adaptsmoFMRI Adaptive Smoothing of FMRI Data This package contains R functions for estimating the blood oxygenation level dependent (BOLD) effect by using functional Magnetic Resonance Imaging (fMRI) data, based on adaptive Gauss Markov random fields, for real as well as simulated data. The implemented simulations make use of efficient Markov Chain Monte Carlo methods.
1615 Medical Image Analysis adimpro (core) Adaptive Smoothing of Digital Images Implements tools for manipulation of digital images and the Propagation Separation approach by Polzehl and Spokoiny (2006) <doi:10.1007/s00440-005-0464-1> for smoothing digital images, see Polzehl and Tabelow (2007) <doi:10.18637/jss.v019.i01>.
1616 Medical Image Analysis AnalyzeFMRI (core) Functions for Analysis of fMRI Datasets Stored in the ANALYZE or NIFTI Format Functions for I/O, visualisation and analysis of functional Magnetic Resonance Imaging (fMRI) datasets stored in the ANALYZE or NIFTI format. Note that the latest version of XQuartz seems to be necessary under MacOS.
1617 Medical Image Analysis arf3DS4 (core) Activated Region Fitting, fMRI data analysis (3D) Activated Region Fitting (ARF) is an analysis method for fMRI data.
1618 Medical Image Analysis bayesImageS Bayesian Methods for Image Segmentation using a Potts Model Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior of Moores et al. (2015) <doi:10.1016/j.csda.2014.12.001>. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, approximate Bayesian computation (ABC-MCMC and ABC-SMC), and the parametric functional approximate Bayesian (PFAB) algorithm. Refer to <doi:10.1007/s11222-014-9525-6> and <doi:10.1214/18-BA1130> for further details.
1619 Medical Image Analysis brainR Helper Functions to ‘misc3d’ and ‘rgl’ Packages for Brain Imaging This includes functions for creating 3D and 4D images using ‘WebGL’, ‘rgl’, and ‘JavaScript’ commands. This package relies on the X toolkit (‘XTK’, <https://github.com/xtk/X#readme>).
1620 Medical Image Analysis brainwaver Basic wavelet analysis of multivariate time series with a visualisation and parametrisation using graph theory This package computes the correlation matrix for each scale of a wavelet decomposition, namely the one performed by the R package waveslim (Whitcher, 2000). A hypothesis test is applied to each entry of one matrix in order to construct an adjacency matrix of a graph. The graph obtained is finally analysed using small-world theory (Watts and Strogatz, 1998) and the computation of efficiency (Latora, 2001), tested using simulated attacks. The brainwaver project is complementary to the camba project for brain-data preprocessing. A collection of scripts (with a makefile) is available to download along with the brainwaver package; see information on the webpage mentioned below.
1621 Medical Image Analysis DATforDCEMRI (core) Deconvolution Analysis Tool for Dynamic Contrast Enhanced MRI This package performs voxel-wise deconvolution analysis of DCE-MRI contrast agent concentration versus time data and generates the Impulse Response Function, which can be used to approximate commonly utilized kinetic parameters such as Ktrans and ve. An interactive advanced voxel diagnosis tool (AVDT) is also provided to facilitate easy navigation of voxel-wise data.
1622 Medical Image Analysis dcemriS4 (core) A Package for Image Analysis of DCE-MRI (S4 Implementation) A collection of routines and documentation that allows one to perform voxel-wise quantitative analysis of dynamic contrast-enhanced MRI (DCE-MRI) and diffusion-weighted imaging (DWI) data, with emphasis on oncology applications.
1623 Medical Image Analysis divest (core) Get Images Out of DICOM Format Quickly Provides tools to sort DICOM-format medical image files, and convert them to NIfTI-1 format.
1624 Medical Image Analysis dpmixsim (core) Dirichlet Process Mixture Model Simulation for Clustering and Image Segmentation The ‘dpmixsim’ package implements a Dirichlet Process Mixture (DPM) model for clustering and image segmentation. The DPM model is a Bayesian nonparametric methodology that relies on MCMC simulations for exploring mixture models with an unknown number of components. The code implements conjugate models with normal structure (conjugate normal-normal DP mixture model). The package’s applications are oriented towards the classification of magnetic resonance images according to tissue type or region of interest.
1625 Medical Image Analysis dti (core) Analysis of Diffusion Weighted Imaging (DWI) Data Diffusion Weighted Imaging (DWI) is a Magnetic Resonance Imaging modality that measures the diffusion of water in tissues like the human brain. The package contains R-functions to process diffusion-weighted data. The functionality includes diffusion tensor imaging (DTI), diffusion kurtosis imaging (DKI), modeling for high angular resolution diffusion weighted imaging (HARDI) using Q-ball reconstruction and tensor mixture models, several methods for structural adaptive smoothing including POAS and msPOAS, and streamline fiber tracking for tensor and tensor mixture models. The package provides functionality to manipulate and visualize results in 2D and 3D.
1626 Medical Image Analysis edfReader (core) Reading EDF(+) and BDF(+) Files Reads European Data Format files EDF and EDF+, see <http://www.edfplus.info>, BioSemi Data Format files BDF, see <http://www.biosemi.com/faq/file_format.htm>, and BDF+ files, see <http://www.teuniz.net/edfbrowser/bdfplus%20format%20description.html>. The files are read in two steps: first the header is read and then the signals (using the header object as a parameter).
1627 Medical Image Analysis eegkit (core) Toolkit for Electroencephalography Data Analysis and visualization tools for electroencephalography (EEG) data. Includes functions for (i) plotting EEG data, (ii) filtering EEG data, (iii) smoothing EEG data, (iv) frequency domain (Fourier) analysis of EEG data, (v) Independent Component Analysis of EEG data, and (vi) simulating event-related potential EEG data.
1628 Medical Image Analysis fmri (core) Analysis of fMRI Experiments Contains R-functions to perform an fMRI analysis as described in Tabelow et al. (2006) <doi:10.1016/j.neuroimage.2006.06.029>, Polzehl et al. (2010) <doi:10.1016/j.neuroimage.2010.04.241>, Tabelow and Polzehl (2011) <doi:10.18637/jss.v044.i11>.
1629 Medical Image Analysis fslr Wrapper Functions for ‘FSL’ (‘FMRIB’ Software Library) from Functional MRI of the Brain (‘FMRIB’) Wrapper functions that interface with ‘FSL’ <http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/>, a powerful and commonly-used ‘neuroimaging’ software, using system commands. The goal is to be able to interface with ‘FSL’ completely in R, where you pass R objects of class ‘nifti’, implemented by package ‘oro.nifti’, and the function executes an ‘FSL’ command and returns an R object of class ‘nifti’ if desired.
1630 Medical Image Analysis gdimap (core) Generalized Diffusion Magnetic Resonance Imaging Diffusion anisotropy has been used to characterize white matter neuronal pathways in the human brain, and infer global connectivity in the central nervous system. The package implements algorithms to estimate and visualize the orientation of neuronal pathways in model-free methods (q-space imaging methods). For estimating fibre orientations two methods have been implemented. One method implements fibre orientation detection through local maxima extraction. A second more robust method is based on directional statistical clustering of ODF voxel data. Fibre orientations in multiple fibre voxels are estimated using a mixture of von Mises-Fisher (vMF) distributions. This statistical estimation procedure is used to resolve crossing fibre configurations. Reconstruction of orientation distribution function (ODF) profiles may be performed using the standard generalized q-sampling imaging (GQI) approach, Garyfallidis’ GQI (GQI2) approach, or Aganj’s variant of the Q-ball imaging (CSA-QBI) approach. Procedures for the visualization of RGB-maps, line-maps and glyph-maps of real diffusion magnetic resonance imaging (dMRI) data-sets are included in the package.
1631 Medical Image Analysis mmand (core) Mathematical Morphology in Any Number of Dimensions Provides tools for performing mathematical morphology operations, such as erosion and dilation, on data of arbitrary dimensionality. Can also be used for finding connected components, resampling, filtering, smoothing and other image processing-style operations.
1632 Medical Image Analysis Morpho (core) Calculations and Visualisations Related to Geometric Morphometrics A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.
1633 Medical Image Analysis mritc (core) MRI Tissue Classification Various methods for MRI tissue classification.
1634 Medical Image Analysis neuroim (core) Data Structures and Handling for Neuroimaging Data A collection of data structures that represent volumetric brain imaging data. The focus is on basic data handling for 3D and 4D neuroimaging data. In addition, there are functions to read and write NIfTI files and limited support for reading AFNI files.
1635 Medical Image Analysis neuRosim (core) Functions to Generate fMRI Data Including Activated Data, Noise Data and Resting State Data The package allows users to generate fMRI time series or 4D data. Some high-level functions are created for fast data generation with only a few arguments and a diversity of functions to define activation and noise. For more advanced users it is possible to use the low-level functions and manipulate the arguments.
1636 Medical Image Analysis occ (core) Estimates PET Neuroreceptor Occupancies Generic function for estimating positron emission tomography (PET) neuroreceptor occupancies from the total volumes of distribution of a set of regions of interest. Fitting methods include the simple ‘reference region’ and ‘ordinary least squares’ (sometimes known as occupancy plot) methods, as well as the more efficient ‘restricted maximum likelihood estimation’.
1637 Medical Image Analysis oro.dicom (core) Rigorous - DICOM Input / Output Data input/output functions for data that conform to the Digital Imaging and Communications in Medicine (DICOM) standard, part of the Rigorous Analytics bundle.
1638 Medical Image Analysis oro.nifti (core) Rigorous - NIfTI + ANALYZE + AFNI : Input / Output Functions for the input/output and visualization of medical imaging data that follow either the ANALYZE, NIfTI or AFNI formats. This package is part of the Rigorous Analytics bundle.
1639 Medical Image Analysis PET Simulation and Reconstruction of PET Images Implementation of different analytic/direct and iterative reconstruction methods for Radon-transformed data such as PET data. It also offers the possibility to simulate PET data.
1640 Medical Image Analysis PTAk Principal Tensor Analysis on k Modes A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package includes also some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.
1641 Medical Image Analysis RNifti (core) Fast R and C++ Access to NIfTI Images Provides very fast read and write access to images stored in the NIfTI-1 and ANALYZE-7.5 formats, with seamless synchronisation between compiled C and interpreted R code. Also provides a C/C++ API that can be used by other packages. Not to be confused with ‘RNiftyReg’, which performs image registration.
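A minimal read/write round trip ("t1.nii.gz" is a hypothetical file path used purely for illustration):

    library(RNifti)
    img <- readNifti("t1.nii.gz")        # array with NIfTI header attributes attached
    dim(img)                             # image dimensions in voxels
    writeNifti(img, "t1_copy.nii.gz")    # write the image back out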
1642 Medical Image Analysis RNiftyReg (core) Image Registration Using the ‘NiftyReg’ Library Provides an ‘R’ interface to the ‘NiftyReg’ image registration tools <http://sourceforge.net/projects/niftyreg/>. Linear and nonlinear registration are supported, in two and three dimensions.
1643 Medical Image Analysis Rvcg (core) Manipulations of Triangular Meshes Based on the ‘VCGLIB’ API Operations on triangular meshes based on ‘VCGLIB’. This package integrates nicely with the R-package ‘rgl’ to render the meshes processed by ‘Rvcg’. The Visualization and Computer Graphics Library (VCG for short) is an open source portable C++ templated library for manipulating, processing and displaying triangle and tetrahedral meshes with OpenGL. The library, composed of more than 100k lines of code, is released under the GPL license, and it is the base of most of the software tools of the Visual Computing Lab of the Italian National Research Council Institute ISTI <http://vcg.isti.cnr.it>, like ‘metro’ and ‘MeshLab’. The ‘VCGLIB’ source is pulled from trunk <https://github.com/cnr-isti-vclab/vcglib> and patched to work with options determined by the configure script as well as to work with the header files included by ‘RcppEigen’.
1644 Medical Image Analysis tractor.base (core) Read, Manipulate and Visualise Magnetic Resonance Images Functions for working with magnetic resonance images. Reading and writing of popular file formats (DICOM, Analyze, NIfTI-1, NIfTI-2, MGH); interactive and non-interactive visualisation; flexible image manipulation; metadata and sparse image handling.
1645 Medical Image Analysis waveslim Basic Wavelet Routines for One-, Two- And Three-Dimensional Signal Processing Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.
1646 Meta-Analysis aggregation p-Value Aggregation Methods Contains functionality for performing the following methods of p-value aggregation: Fisher’s method [Fisher, RA (1932, ISBN: 9780028447308)], the Lancaster method (weighted Fisher’s method) [Lancaster, HO (1961, <doi:10.1111/j.1467-842X.1961.tb00058.x>)], and Sidak correction [Sidak, Z (1967, <doi:10.1080/01621459.1967.10482935>)]. Please cite Yi et al., the manuscript corresponding to this package [Yi, L et al., (2017), <doi:10.1101/190199>].
1647 Meta-Analysis altmeta Alternative Meta-Analysis Methods Provides alternative statistical methods for meta-analysis, including new heterogeneity tests and measures that are robust to outliers.
1648 Meta-Analysis bamdit Bayesian Meta-Analysis of Diagnostic Test Data Functions for Bayesian meta-analysis of diagnostic test data which are based on a scale mixtures bivariate random-effects model.
1649 Meta-Analysis bayesmeta Bayesian Random-Effects Meta-Analysis A collection of functions allowing one to derive the posterior distribution of the two parameters in a random-effects meta-analysis, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc.
1650 Meta-Analysis bmeta Bayesian Meta-Analysis and Meta-Regression Provides a collection of functions for conducting meta-analyses in a Bayesian context in R. The package includes functions for computing various effect size or outcome measures (e.g. odds ratios, mean difference and incidence rate ratio) for different types of data based on MCMC simulations. Users can fit fixed- and random-effects models with different priors to the data. Meta-regression can be carried out if effects of additional covariates are observed. Furthermore, the package provides functions for creating posterior distribution plots and forest plots to display the main model output. Traceplots and some other diagnostic plots are also available for assessing model fit and performance.
1651 Meta-Analysis bspmma Bayesian Semiparametric Models for Meta-Analysis The main functions carry out Gibbs sampler routines for nonparametric and semiparametric Bayesian models for random-effects meta-analysis.
1652 Meta-Analysis CAMAN Finite Mixture Models and Meta-Analysis Tools - Based on C.A.MAN Tools for the analysis of finite semiparametric mixtures. These are useful when data is heterogeneous, e.g. in pharmacokinetics or meta-analysis. The NPMLE and VEM algorithms (flexible support size) and EM algorithms (fixed support size) are provided for univariate and bivariate data.
1653 Meta-Analysis CIAAWconsensus Isotope Ratio Meta-Analysis Calculation of consensus values for atomic weights, isotope amount ratios, and isotopic abundances with the associated uncertainties using a multivariate meta-regression approach for consensus building.
1654 Meta-Analysis clubSandwich Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <http://www.statcan.gc.ca/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <doi:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), ivreg (from package ‘AER’), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’).
1655 Meta-Analysis compute.es Compute Effect Sizes This package contains several functions for calculating the most widely used effect sizes (ES), along with their variances, confidence intervals and p-values. The output includes ES’s of d (mean difference), g (unbiased estimate of d), r (correlation coefficient), z’ (Fisher’s z), and OR (odds ratio and log odds ratio). In addition, NNT (number needed to treat), U3, CLES (Common Language Effect Size) and Cliff’s Delta are computed. This package uses recommended formulas as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
1656 Meta-Analysis ConfoundedMeta Sensitivity Analyses for Unmeasured Confounding in Meta-Analyses Conducts sensitivity analyses for unmeasured confounding in random-effects meta-analysis per Mathur & VanderWeele (in preparation). Given output from a random-effects meta-analysis with a relative risk outcome, computes point estimates and inference for: (1) the proportion of studies with true causal effect sizes more extreme than a specified threshold of scientific significance; and (2) the minimum bias factor and confounding strength required to reduce to less than a specified threshold the proportion of studies with true effect sizes of scientifically significant size. Creates plots and tables for visualizing these metrics across a range of bias values. Provides tools to easily scrape study-level data from a published forest plot or summary table to obtain the needed estimates when these are not reported.
1657 Meta-Analysis CopulaREMADA Copula Mixed Models for Multivariate Meta-Analysis of Diagnostic Test Accuracy Studies The bivariate copula mixed model for meta-analysis of diagnostic test accuracy studies in Nikoloulopoulos (2015) <doi:10.1002/sim.6595>. The vine copula mixed model for meta-analysis of diagnostic test accuracy studies accounting for disease prevalence in Nikoloulopoulos (2017) <doi:10.1177/0962280215596769> and also accounting for non-evaluable subjects in Nikoloulopoulos (2018) <arXiv:1812.03685>. The hybrid vine copula mixed model for meta-analysis of diagnostic test accuracy case-control and cohort studies in Nikoloulopoulos (2018) <doi:10.1177/0962280216682376>. The D-vine copula mixed model for meta-analysis and comparison of two diagnostic tests in Nikoloulopoulos (2018) <doi:10.1177/0962280218796685>. The multinomial quadrivariate D-vine copula mixed model for meta-analysis of diagnostic tests with non-evaluable subjects in Nikoloulopoulos (2018) <arXiv:1812.05915>.
1658 Meta-Analysis CPBayes Bayesian Meta Analysis for Studying Cross-Phenotype Genetic Associations A Bayesian meta-analysis method for studying cross-phenotype genetic associations. It uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. CPBayes is based on a spike and slab prior.
1659 Meta-Analysis CRTSize Sample Size Estimation Functions for Cluster Randomized Trials Sample size estimation in cluster (group) randomized trials. Contains traditional power-based methods, empirical smoothing (Rotondi and Donner, 2009), and updated meta-analysis techniques (Rotondi and Donner, 2012).
1660 Meta-Analysis dfmeta Meta-Analysis of Phase I Dose-Finding Early Clinical Trials Meta-analysis approaches for Phase I dose-finding early phase clinical trials, in order to better suit requirements in terms of maximum tolerated dose (MTD) and maximal dose regimen (MDR). This package currently has three different approaches: (a) an approach proposed by Zohar et al, 2011, <doi:10.1002/sim.4121> (denoted as ZKO), (b) the Variance Weighted pooling analysis (called VarWT) and (c) the Random Effects Model Based (REMB) algorithm, where the user can input their own model-based approach or use the existing random-effects logistic regression model (named glimem) through the ‘dfmeta’ package.
1661 Meta-Analysis diagmeta Meta-Analysis of Diagnostic Accuracy Studies with Several Cutpoints Provides methods by Steinhauser et al. (2016) <doi:10.1186/s12874-016-0196-1> for meta-analysis of diagnostic accuracy studies with several cutpoints.
1662 Meta-Analysis dosresmeta Multivariate Dose-Response Meta-Analysis Estimates dose-response relations from summarized dose-response data and combines them according to principles of (multivariate) random-effects models.
1663 Meta-Analysis ecoreg Ecological Regression using Aggregate and Individual Data Estimating individual-level covariate-outcome associations using aggregate data (“ecological inference”) or a combination of aggregate and individual-level data (“hierarchical related regression”).
1664 Meta-Analysis effsize Efficient Effect Size Computation A collection of functions to compute the standardized effect sizes for experiments (Cohen d, Hedges g, Cliff delta, Vargha-Delaney A). The computation algorithms have been optimized to allow efficient computation even with very large data sets.
1665 Meta-Analysis epiR Tools for the Analysis of Epidemiological Data Tools for the analysis of epidemiological data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, and computing confidence intervals around incidence risk and incidence rate estimates. Miscellaneous functions for use in meta-analysis, diagnostic test interpretation, and sample size calculations.
1666 Meta-Analysis esc Effect Size Computation for Meta Analysis Implementation of the web-based ‘Practical Meta-Analysis Effect Size Calculator’ from David B. Wilson (<http://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-Home.php>) in R. Based on the input, the effect size can be returned as standardized mean difference, Cohen’s f, Hedges’ g, Pearson’s r or Fisher’s transformation z, odds ratio or log odds, or eta squared effect size.
1667 Meta-Analysis etma Epistasis Test in Meta-Analysis Traditional meta-regression-based methods have been developed for meta-analysis data, but they face the challenge of inconsistent estimates. This package proposes a new statistical method to detect epistasis using incomplete summary information; it has been shown not only to yield consistent evidence but also to increase power compared with the traditional method (a detailed tutorial is available on the website).
1668 Meta-Analysis exactmeta Exact fixed effect meta analysis Performs exact fixed-effect meta-analysis for rare-event data without the need for artificial continuity corrections.
1669 Meta-Analysis extfunnel Additional Funnel Plot Augmentations Contains the function extfunnel(), which produces a funnel plot including additional augmentations such as statistical significance contours and heterogeneity contours.
1670 Meta-Analysis forestmodel Forest Plots from Regression Models Produces forest plots using ‘ggplot2’ from models produced by functions such as stats::lm(), stats::glm() and survival::coxph().
1671 Meta-Analysis forestplot Advanced Forest Plot Using ‘grid’ Graphics A forest plot that allows for multiple confidence intervals per row, custom fonts for each text element, custom confidence intervals, text mixed with expressions, and more. The aim is to extend the use of forest plots beyond meta-analyses. This is a more general version of the original ‘rmeta’ package’s forestplot() function and relies heavily on the ‘grid’ package.
1672 Meta-Analysis gap Genetic Analysis Package It is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates.
1673 Meta-Analysis gemtc Network Meta-Analysis Using Bayesian Methods Network meta-analyses (mixed treatment comparisons) in the Bayesian framework using JAGS. Includes methods to assess heterogeneity and inconsistency, and a number of standard visualizations.
1674 Meta-Analysis getmstatistic Quantifying Systematic Heterogeneity in Meta-Analysis Quantifying systematic heterogeneity in meta-analysis using R. The M statistic aggregates heterogeneity information across multiple variants to identify systematic heterogeneity patterns and their direction of effect in meta-analysis. Its primary use is to identify outlier studies, which either show “null” effects or consistently show stronger or weaker genetic effects than average across the panel of variants examined in a GWAS meta-analysis. In contrast to conventional heterogeneity metrics (Q-statistic, I-squared and tau-squared) which measure random heterogeneity at individual variants, M measures systematic (non-random) heterogeneity across multiple independently associated variants. Systematic heterogeneity can arise in a meta-analysis due to differences in the study characteristics of participating studies. Some of the differences may include: ancestry, allele frequencies, phenotype definition, age-of-disease onset, family-history, gender, linkage disequilibrium and quality control thresholds. See <https://magosil86.github.io/getmstatistic/> for statistical theory, documentation and examples.
1675 Meta-Analysis gmeta Meta-Analysis via a Unified Framework of Confidence Distribution An implementation of an all-in-one function for a wide range of meta-analysis problems. It contains three functions. The gmeta() function unifies all standard meta-analysis methods and also several newly developed ones under a framework of combining confidence distributions (CDs). Specifically, the package can perform classical p-value combination methods (such as the methods of Fisher, Stouffer, Tippett, etc.), fit meta-analysis fixed-effect and random-effects models, and synthesize 2x2 tables. Furthermore, it can perform robust meta-analysis, which provides protection against model misspecification and limits the impact of any unknown outlying studies. In addition, the package implements two exact meta-analysis methods for synthesizing 2x2 tables with rare events (e.g., zero total events). The np.gmeta() function summarizes information obtained from multiple studies and makes inference for study-level parameters with no distributional assumption. Specifically, it can construct confidence intervals for unknown, fixed study-level parameters via confidence distribution, and can perform estimation via asymptotic confidence distribution whether or not tie or near-tie conditions exist. The plot.gmeta() function visualizes individual and combined CDs through extended forest plots. Compared to version 2.2-6, version 2.3-0 contains a new function np.gmeta().
1676 Meta-Analysis hetmeta Heterogeneity Measures in Meta-Analysis Assess the presence of statistical heterogeneity and quantify its impact in the context of meta-analysis. It includes test for heterogeneity as well as other statistical measures (R_b, I^2, R_I).
1677 Meta-Analysis ipdmeta Tools for subgroup analyses with multiple trial data using aggregate statistics This package provides functions to estimate an IPD linear mixed effects model for a continuous outcome and any categorical covariate from study summary statistics. There are also functions for estimating the power of a treatment-covariate interaction test in an individual patient data meta-analysis from aggregate data.
1678 Meta-Analysis joineRmeta Joint Modelling for Meta-Analytic (Multi-Study) Data Fits joint models of the type proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extends to the multi-study, meta-analytic case. Functions for meta-analysis of a single longitudinal and a single time-to-event outcome from multiple studies using joint models. Options to produce plots for multi study joint data, to pool joint model fits from ‘JM’ and ‘joineR’ packages in a two stage meta-analysis, and to model multi-study joint data in a one stage meta-analysis.
1679 Meta-Analysis joint.Cox Joint Frailty-Copula Models for Tumour Progression and Death in Meta-Analysis Perform likelihood estimation and dynamic prediction under joint frailty-copula models for tumour progression and death in meta-analysis. A penalized likelihood is employed for estimating model parameters, where the baseline hazard functions are approximated by smoothing splines. The methods are applicable for meta-analytic data combining several studies. The methods can analyze data having information on both terminal event time (e.g., time-to-death) and non-terminal event time (e.g., time-to-tumour progression). See Emura et al. (2017) <doi:10.1177/0962280215604510> for likelihood estimation, and Emura et al. (2018) <doi:10.1177/0962280216688032> for dynamic prediction. Survival data from ovarian cancer patients are also available.
1680 Meta-Analysis MAc Meta-Analysis with Correlations This is an integrated meta-analysis package for conducting a correlational research synthesis. One of the unique features of this package is in its integration of user-friendly functions to facilitate statistical analyses at each stage in a meta-analysis with correlations. It uses recommended procedures as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
1681 Meta-Analysis MAd Meta-Analysis with Mean Differences A collection of functions for conducting a meta-analysis with mean differences data. It uses recommended procedures as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
1682 Meta-Analysis mada Meta-Analysis of Diagnostic Accuracy Provides functions for diagnostic meta-analysis. Besides basic analysis and visualization, the bivariate model of Reitsma et al. (2005), which is equivalent to the HSROC model of Rutter & Gatsonis (2001), can be fitted. A new approach to diagnostic meta-analysis by Holling et al. (2012) is also available. Standard methods like summary, plot and so on are provided.
1683 Meta-Analysis MAVIS Meta Analysis via Shiny Interactive shiny application for running a meta-analysis, provides support for both random effects and fixed effects models with the ‘metafor’ package. Additional support is included for calculating effect sizes plus support for single case designs, graphical output, and detecting publication bias.
1684 Meta-Analysis MendelianRandomization Mendelian Randomization Package Encodes several methods for performing Mendelian randomization analyses with summarized data. Summarized data on genetic associations with the exposure and with the outcome can be obtained from large consortia. These data can be used for obtaining causal estimates using instrumental variable methods.
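For orientation, a minimal sketch of the summarized-data workflow with ‘MendelianRandomization’; the ldlc/chdlodds example vectors are, to my understanding, bundled with the package, so treat the exact object names as an assumption:

    library(MendelianRandomization)
    # bx/by: per-variant associations with exposure/outcome; bxse/byse: their SEs
    mr_obj <- mr_input(bx = ldlc, bxse = ldlcse,
                       by = chdlodds, byse = chdloddsse)
    mr_ivw(mr_obj)      # inverse-variance weighted causal estimate
    mr_egger(mr_obj)    # MR-Egger regression as a pleiotropy-robust check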
1685 Meta-Analysis meta (core) General Package for Meta-Analysis User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rucker <doi:10.1007/978-3-319-21416-0>, “Meta-Analysis with R” (2015): - fixed effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L’Abbe, Baujat, bubble); - statistical tests and trim-and-fill method to evaluate bias in meta-analysis; - import data from ‘RevMan 5’; - prediction interval, Hartung-Knapp and Paule-Mandel method for random effects model; - cumulative meta-analysis and leave-one-out meta-analysis; - meta-regression; - generalised linear mixed models; - produce forest plot summarising several (subgroup) meta-analyses.
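A minimal sketch of a typical ‘meta’ session with binary outcome data, assuming the Olkin95 example data set ships with the package as in its documentation:

    library(meta)
    data(Olkin95)       # thrombolysis trials
    m <- metabin(event.e, n.e, event.c, n.c, data = Olkin95,
                 studlab = author, sm = "RR")
    summary(m)          # fixed effect and random effects summaries
    forest(m)           # forest plot
    funnel(m)           # funnel plot to inspect small-study effects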
1686 Meta-Analysis meta4diag Meta-Analysis for Diagnostic Test Studies Bayesian inference analysis for bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximation with INLA. A purpose-built graphical user interface is available. The installation of the R package INLA is compulsory for successful usage. The INLA package can be obtained from <http://www.r-inla.org>. We recommend the testing version, which can be downloaded by running: install.packages("INLA", repos=c(getOption("repos"), INLA="https://inla.r-inla-download.org/R/testing"), dep=TRUE).
1687 Meta-Analysis MetaAnalyser An Interactive Visualisation of Meta-Analysis as a Physical Weighing Machine An interactive application to visualise meta-analysis data as a physical weighing machine. The interface is based on the Shiny web application framework, though can be run locally and with the user’s own data.
1688 Meta-Analysis MetABEL Meta-analysis of genome-wide SNP association results A package for meta-analysis of genome-wide association scans between quantitative or binary traits and SNPs.
1689 Meta-Analysis metaBMA Bayesian Model Averaging for Random and Fixed Effects Meta-Analysis Computes the posterior model probabilities for four meta-analysis models (null model vs. alternative model assuming either fixed- or random-effects, respectively). These posterior probabilities are used to estimate the overall mean effect size as the weighted average of the mean effect size estimates of the random- and fixed-effect model as proposed by Gronau, Van Erp, Heck, Cesario, Jonas, & Wagenmakers (2017, <doi:10.1080/23743603.2017.1326760>). The user can define a wide range of noninformative or informative priors for the mean effect size and the heterogeneity coefficient. Funding for this research was provided by the Berkeley Initiative for Transparency in the Social Sciences, a program of the Center for Effective Global Action (CEGA), with support from the Laura and John Arnold Foundation.
1690 Meta-Analysis metacor Meta-analysis of correlation coefficients Implements the DerSimonian-Laird (DSL) and Olkin-Pratt (OP) meta-analytical approaches with correlation coefficients as effect sizes.
1691 Meta-Analysis metafor (core) Meta-Analysis Package for R A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L’Abbe, Baujat, GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto’s method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted.
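A minimal sketch of the canonical ‘metafor’ workflow, using the BCG vaccine trial data that ships with the package (column names as in its documentation):

    library(metafor)
    # compute log risk ratios and sampling variances from 2x2 counts
    dat <- escalc(measure = "RR", ai = tpos, bi = tneg,
                  ci = cpos, di = cneg, data = dat.bcg)
    res <- rma(yi, vi, data = dat)   # random-effects model (REML by default)
    summary(res)
    forest(res)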
1692 Meta-Analysis metaforest Exploring Heterogeneity in Meta-Analysis using Random Forests Conduct random forests-based meta-analysis, obtain partial dependence plots for metaforest and classic meta-analyses, and cross-validate and tune metaforest- and classic meta-analyses in conjunction with the caret package. A requirement of classic meta-analysis is that the studies being aggregated are conceptually similar, and ideally, close replications. However, in many fields, there is substantial heterogeneity between studies on the same topic. Classic meta-analysis lacks the power to assess more than a handful of univariate moderators. MetaForest, by contrast, has substantial power to explore heterogeneity in meta-analysis. It can identify important moderators from a larger set of potential candidates, even with as few as 20 studies (Van Lissa, in preparation). This is an appealing quality, because many meta-analyses have small sample sizes. Moreover, MetaForest yields a measure of variable importance which can be used to identify important moderators, and offers partial prediction plots to explore the shape of the marginal relationship between moderators and effect size.
1693 Meta-Analysis metafuse Fused Lasso Approach in Regression Coefficient Clustering Fused lasso method to cluster and estimate regression coefficients of the same covariate across different data sets when a large number of independent data sets are combined. Package supports Gaussian, binomial, Poisson and Cox PH models.
1694 Meta-Analysis metagear Comprehensive Research Synthesis Tools for Systematic Reviews and Meta-Analysis Functionalities for facilitating systematic reviews, data extractions, and meta-analyses. It includes a GUI (graphical user interface) to help screen the abstracts and titles of bibliographic data; tools to assign screening effort across multiple collaborators/reviewers and to assess inter-reviewer reliability; tools to help automate the download and retrieval of journal PDF articles from online databases; figure and image extractions from PDFs; web scraping of citations; automated and manual data extraction from scatter-plot and bar-plot images; PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagrams; simple imputation tools to fill gaps in incomplete or missing study parameters; generation of random effects sizes for Hedges’ d, log response ratio, odds ratio, and correlation coefficients for Monte Carlo experiments; covariance equations for modelling dependencies among multiple effect sizes (e.g., effect sizes with a common control); and finally summaries that replicate analyses and outputs from widely used but no longer updated meta-analysis software. Funding for this package was supported by National Science Foundation (NSF) grants DBI-1262545 and DEB-1451031.
1695 Meta-Analysis metagen Inference in Meta Analysis and Meta Regression Provides methods for making inference in the random effects meta regression model such as point estimates and confidence intervals for the heterogeneity parameter and the regression coefficients vector. Inference methods are based on different approaches to statistical inference. Methods from three different schools are included: methods based on the method of moments approach, methods based on likelihood, and methods based on generalised inference. The package also includes tools to run extensive simulation studies in parallel on high performance clusters in a modular way. This allows extensive testing of custom inferential methods with all implemented state-of-the-art methods in a standardised way. Tools for evaluating the performance of both point and interval estimates are provided. Also, a large collection of different pre-defined plotting functions is implemented in a ready-to-use fashion.
1697 Meta-Analysis MetaIntegrator Meta-Analysis of Gene Expression Data A pipeline for the meta-analysis of gene expression data. We have assembled several analysis and plot functions to perform integrated multi-cohort analysis of gene expression data (meta-analysis). Methodology described in: <http://biorxiv.org/content/early/2016/08/25/071514>.
1698 Meta-Analysis metaLik Likelihood Inference in Meta-Analysis and Meta-Regression Models First- and higher-order likelihood inference in meta-analysis and meta-regression models.
1699 Meta-Analysis metaMA Meta-analysis for MicroArrays Combines either p-values or modified effect sizes from different studies to find differentially expressed genes.
1700 Meta-Analysis metamedian Meta-Analysis of Medians Implements several methods to meta-analyze studies that report the sample median of the outcome. When the primary studies are one-group studies, the methods of McGrath et al. (2019) <doi:10.1002/sim.8013> can be applied to estimate the pooled median. In the two-group context, the methods of McGrath et al. (2018) <arXiv:1809.01278> can be applied to estimate the pooled raw difference of medians across groups.
1701 Meta-Analysis metamisc Diagnostic and Prognostic Meta-Analysis Meta-analysis of diagnostic and prognostic modeling studies. Summarize estimates of prognostic factors, diagnostic test accuracy and prediction model performance. Validate, update and combine published prediction models. Develop new prediction models with data from multiple studies.
1702 Meta-Analysis metansue Meta-Analysis of Studies with Non-Statistically Significant Unreported Effects Novel method to unbiasedly include studies with Non-statistically Significant Unreported Effects (NSUEs) in a meta-analysis <doi:10.1001/jamapsychiatry.2015.2196> and <doi:10.1177/0962280218811349>. Briefly, the method first calculates the interval where the unreported effects (e.g. t-values) should be according to the threshold of statistical significance used in each study. Afterwards, maximum likelihood techniques are used to impute the expected effect size of each study with NSUEs, accounting for between-study heterogeneity and potential covariates. Multiple imputations of the NSUEs are then randomly created based on the expected value, variance and statistical significance bounds. Finally, a restricted-maximum likelihood random-effects meta-analysis is separately conducted for each set of imputations, and estimations from these meta-analyses are pooled. Please read the reference in ‘metansue’ for details of the procedure.
1703 Meta-Analysis metap Meta-Analysis of Significance Values The canonical way to perform meta-analysis involves using effect sizes. When they are not available this package provides a number of methods for meta-analysis of significance values including the methods of Edgington, Fisher, Lancaster, Stouffer, Tippett, and Wilkinson; a number of data-sets to replicate published results; and a routine for graphical display.
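A minimal sketch of p-value combination with ‘metap’ (the per-study p-values are hypothetical):

    library(metap)
    p <- c(0.012, 0.08, 0.21, 0.045)   # hypothetical per-study p-values
    sumlog(p)   # Fisher's method: -2 * sum(log(p)) ~ chi-squared on 2k df
    sumz(p)     # Stouffer's method: combines normal quantiles of the p-values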
1704 Meta-Analysis MetaPath Perform the Meta-Analysis for Pathway Enrichment Analysis (MAPE) Perform the Meta-analysis for Pathway Enrichment (MAPE) methods introduced by Shen and Tseng (2010). It includes functions to automatically perform MAPE_G (integrating multiple studies at the gene level), MAPE_P (integrating multiple studies at the pathway level) and MAPE_I (a hybrid method integrating the MAPE_G and MAPE_P methods). In the simulation and real data analyses in the paper, MAPE_G and MAPE_P have complementary advantages and detection power depending on the data structure. In general, the integrative MAPE_I method is recommended. In the case that MAPE_G (or MAPE_P) detects almost no pathways, the integrative MAPE_I does not improve performance and MAPE_P (or MAPE_G) should be used. Reference: Shen, Kui, and George C Tseng. Meta-analysis for pathway enrichment analysis when combining multiple microarray studies. Bioinformatics (Oxford, England) 26, no. 10 (April 2010): 1316-1323. doi:10.1093/bioinformatics/btq148. http://www.ncbi.nlm.nih.gov/pubmed/20410053.
1705 Meta-Analysis MetaPCA MetaPCA: Meta-analysis in the Dimension Reduction of Genomic data MetaPCA implements simultaneous dimension reduction using PCA when multiple studies are combined. We propose two basic ideas to find a common PC subspace: an eigenvalue maximization approach and an angle minimization approach, and we extend the concept to incorporate Robust PCA and Sparse PCA in the meta-analysis realm.
1706 Meta-Analysis metaplotr Creates CrossHairs Plots for Meta-Analyses Creates crosshair plots to summarize and analyze meta-analysis results. In due time this package will contain code to create other kinds of meta-analysis graphs.
1707 Meta-Analysis metaplus Robust Meta-Analysis and Meta-Regression Performs meta-analysis and meta-regression using standard and robust methods with confidence intervals based on the profile likelihood. Robust methods are based on alternative distributions for the random effect, either the t-distribution (Lee and Thompson, 2008 <doi:10.1002/sim.2897> or Baker and Jackson, 2008 <doi:10.1007/s10729-007-9041-8>) or mixtures of normals (Beath, 2014 <doi:10.1002/jrsm.1114>).
1708 Meta-Analysis metaRNASeq Meta-analysis of RNA-seq data Implementation of two p-value combination techniques (inverse normal and Fisher methods). A vignette is provided to explain how to perform a meta-analysis from two independent RNA-seq experiments.
1709 Meta-Analysis metaSEM Meta-Analysis using Structural Equation Modeling A collection of functions for conducting meta-analysis using a structural equation modeling (SEM) approach via the ‘OpenMx’ and ‘lavaan’ packages. It also implements various procedures to perform meta-analytic structural equation modeling on the correlation and covariance matrices.
1710 Meta-Analysis metasens Advanced Statistical Methods to Model and Adjust for Bias in Meta-Analysis The following methods are implemented to evaluate how sensitive the results of a meta-analysis are to potential bias in meta-analysis and to support Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 5 “Small-Study Effects in Meta-Analysis”: - Copas selection model described in Copas & Shi (2001) <doi:10.1177/096228020101000402>; - limit meta-analysis by Rucker et al. (2011) <doi:10.1093/biostatistics/kxq046>; - upper bound for outcome reporting bias by Copas & Jackson (2004) <doi:10.1111/j.0006-341X.2004.00161.x>.
1711 Meta-Analysis MetaSKAT Meta Analysis for SNP-Set (Sequence) Kernel Association Test Functions for meta-analysis of the burden test, SKAT, and SKAT-O by Lee et al. (2013) <doi:10.1016/j.ajhg.2013.05.010>. These methods use summary-level score statistics to carry out gene-based meta-analysis for rare variants.
1712 Meta-Analysis MetaStan Bayesian Meta-Analysis via ‘Stan’ Performs Bayesian meta-analysis using ‘Stan’. Includes binomial-normal hierarchical models and an option to use weakly informative priors for the heterogeneity parameter and the treatment effect parameter, which are described in Guenhan, Roever, and Friede (2018) <arXiv:1809.04407>.
1713 Meta-Analysis MetaSubtract Subtracting Summary Statistics of One or more Cohorts from Meta-GWAS Results If results from a meta-GWAS are used for validation in one of the cohorts that was included in the meta-analysis, this will yield biased (i.e. too optimistic) results. The validation cohort needs to be independent from the meta-Genome-Wide-Association-Study (meta-GWAS) results. ‘MetaSubtract’ will subtract the results of the respective cohort from the meta-GWAS results analytically, without having to redo the meta-GWAS analysis, using the leave-one-out methodology. It can handle different meta-analysis methods and takes into account whether single or double genomic control correction was applied to the original meta-analysis. It can be used for a whole GWAS, but also for a limited set of genetic markers.
1714 Meta-Analysis metatest Fit and Test Metaregression Models Fits and tests meta regression models and generates a number of useful test statistics: next to t- and z-tests, the likelihood ratio, Bartlett-corrected likelihood ratio and permutation tests are performed on the model coefficients.
1715 Meta-Analysis Metatron Meta-analysis for Classification Data and Correction to Imperfect Reference Allows meta-analysis of primary studies with classification outcomes in order to systematically evaluate the accuracy of classifiers, namely, diagnostic tests. It provides functions to fit the bivariate model of Reitsma et al. (2005). Moreover, if the reference employed in the classification process isn’t a gold standard, its deficit can be detected and its influence on the underestimation of the diagnostic test’s accuracy can be corrected, as described in Botella et al. (2013).
1716 Meta-Analysis metavcov Variance-Covariance Matrix for Multivariate Meta-Analysis Compute variance-covariance matrix for multivariate meta-analysis. Effect sizes include correlation (r), mean difference (MD), standardized mean difference (SMD), log odds ratio (logOR), log risk ratio (logRR), and risk difference (RD).
1717 Meta-Analysis metaviz Forest Plots, Funnel Plots, and Visual Funnel Plot Inference for Meta-Analysis A compilation of functions to create visually appealing and information-rich plots of meta-analytic data using ‘ggplot2’. Currently allows creating forest plots, funnel plots, and many of their variants, such as rainforest plots, thick forest plots, additional evidence contour funnel plots, and sunset funnel plots. In addition, functionalities for visual inference with the funnel plot in the context of meta-analysis are provided.
1718 Meta-Analysis mmeta Multivariate Meta-Analysis A novel multivariate meta-analysis.
1719 Meta-Analysis MultiMeta Meta-analysis of Multivariate Genome Wide Association Studies Allows running a meta-analysis of multivariate Genome Wide Association Studies (GWAS) and easily visualizing results through custom plotting functions. The multivariate setting implies that results for each single nucleotide polymorphism (SNP) include several effect sizes (also known as “beta coefficients”, one for each trait), as well as related variance values, but also covariance between the betas. The main goal of the package is to provide combined beta coefficients across different cohorts, together with the combined variance/covariance matrix. The method is inverse-variance based, thus each beta is weighted by the inverse of its variance-covariance matrix, before taking the average across all betas. The default options of the main function will work with files obtained from the GEMMA multivariate option for GWAS (Zhou & Stephens, 2014). It will work with any other output, provided the columns are formatted to have the corresponding names. The package also provides several plotting functions for QQ-plots, Manhattan plots and custom summary plots.
1720 Meta-Analysis mvmeta Multivariate and Univariate Meta-Analysis and Meta-Regression Collection of functions to perform fixed and random-effects multivariate and univariate meta-analysis and meta-regression.
1721 Meta-Analysis mvtmeta Multivariate meta-analysis This package contains functions to run fixed effects or random effects multivariate meta-analysis.
1722 Meta-Analysis netmeta Network Meta-Analysis using Frequentist Methods A comprehensive set of functions providing frequentist methods for network meta-analysis and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 “Network Meta-Analysis”: - frequentist network meta-analysis following Rucker (2012) <doi:10.1002/jrsm.1058>; - net heat plot and design-based decomposition of Cochran’s Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by Konig et al. (2013) <doi:10.1002/sim.6001>; - ranking of treatments (frequentist analogue of SUCRA) according to Rucker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - partial order of treatment rankings (‘poset’) and Hasse diagram for ‘poset’ (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rucker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>; - league table with network meta-analysis results; - additive network meta-analysis for combinations of treatments; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method; - ‘comparison-adjusted’ funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - automated drawing of network graphs described in Rucker & Schwarzer (2016) <doi:10.1002/jrsm.1143>.
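A minimal sketch of a frequentist network meta-analysis with ‘netmeta’; the Senn2013 diabetes data set and its column names are assumed to ship with the package, as in its documentation:

    library(netmeta)
    data(Senn2013)
    net <- netmeta(TE, seTE, treat1, treat2, studlab,
                   data = Senn2013, sm = "MD")
    summary(net)     # pairwise comparisons and heterogeneity statistics
    netgraph(net)    # automated drawing of the network graph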
1723 Meta-Analysis nmaINLA Network Meta-Analysis using Integrated Nested Laplace Approximations Performs network meta-analysis using integrated nested Laplace approximations (‘INLA’). Includes methods to assess the heterogeneity and inconsistency in the network. Contains more than ten different network meta-analysis data sets. The ‘INLA’ package can be obtained from <http://www.r-inla.org>. We recommend the testing version.
1724 Meta-Analysis nmathresh Thresholds and Invariant Intervals for Network Meta-Analysis Calculation and presentation of decision-invariant bias adjustment thresholds and intervals for Network Meta-Analysis, as described by Phillippo et al. (2018) <doi:10.1111/rssa.12341>. These describe the smallest changes to the data that would result in a change of decision.
1725 Meta-Analysis ofGEM A Meta-Analysis Approach with Filtering for Identifying Gene-Level Gene-Environment Interactions with Genetic Association Data Offers a gene-based meta-analysis test with filtering to detect gene-environment interactions (GxE) with association data, proposed by Wang et al. (2018) <doi:10.1002/gepi.22115>. It first conducts a meta-filtering test to filter out unpromising SNPs by combining all samples in the consortia data. It then runs a test of omnibus-filtering-based GxE meta-analysis (ofGEM) that combines the strengths of the fixed- and random-effects meta-analysis with meta-filtering. It can also analyze data from multiple ethnic groups.
1726 Meta-Analysis pcnetmeta Patient-Centered Network Meta-Analysis Performs arm-based network meta-analysis for datasets with binary, continuous, and count outcomes using the Bayesian methods of Zhang et al (2014) <doi:10.1177/1740774513498322> and Lin et al (2017) <doi:10.18637/jss.v080.i05>.
1727 Meta-Analysis pimeta Prediction Intervals for Random-Effects Meta-Analysis An implementation of prediction intervals for random-effects meta-analysis: Higgins et al. (2009) <doi:10.1111/j.1467-985X.2008.00552.x>, Partlett and Riley (2017) <doi:10.1002/sim.7140>, and Nagashima et al. (2018) <doi:10.1177/0962280218773520>, <arXiv:1804.01054>.
1728 Meta-Analysis psychmeta Psychometric Meta-Analysis Toolkit Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more. Bugs can be reported to <https://github.com/psychmeta/psychmeta/issues> or <issues@psychmeta.com>.
1729 Meta-Analysis psychometric Applied Psychometric Theory Contains functions useful for correlation theory, meta-analysis (validity-generalization), reliability, item analysis, inter-rater reliability, and classical utility.
1730 Meta-Analysis PubBias Performs simulation study to look for publication bias, using a technique described by Ioannidis and Trikalinos; Clin Trials. 2007;4(3):245-53 Adapts a method designed by Ioannidis and Trikalinos, which compares the observed number of positive studies in a meta-analysis with the number expected if the summary measure of effect, averaged over the individual studies, were assumed true. An excess of observed positive studies over the expected number is taken as evidence of publication bias. The observed number of positive studies, at a given level of statistical significance, is calculated by applying Fisher’s exact test to the reported 2x2 table data of each constituent study, doubling the Fisher one-sided P-value to make a two-sided test. The corresponding expected number of positive studies is obtained by summing the statistical powers of each study. The statistical power depends on a given measure of effect, which here is the pooled odds ratio of the meta-analysis. By simulating each constituent study with the given odds ratio, and the same numbers of treated and non-treated as in the real study, the power of the study is estimated as the proportion of simulated studies that are positive, again by Fisher’s exact test. The numbers of events in the treated and untreated groups are simulated by binomial sampling. In the untreated group, the binomial proportion is the percentage of actual events reported in the study; in the treated group, the binomial sampling proportion is the untreated percentage multiplied by the risk ratio derived from the assumed common odds ratio. The level of statistical significance for judging a positive study may be varied, and a large difference between the expected and observed numbers of positive studies around the 0.05 significance level constitutes evidence of publication bias. The difference between the observed and expected is tested by chi-square. A chi-square test P-value below 0.05 is suggestive of publication bias; however, a less stringent level of 0.1 is often used in studies of publication bias, as the number of published studies is usually small.
1731 Meta-Analysis puniform Meta-Analysis Methods Correcting for Publication Bias Provides meta-analysis methods that correct for publication bias. Four methods are currently included in the package. The p-uniform method as described in van Assen, van Aert, and Wicherts (2015) <doi:10.1037/met0000025> can be used for estimating the average effect size, testing the null hypothesis of no effect, and testing for publication bias using only the statistically significant effect sizes of primary studies. The second method in the package is the p-uniform* method as described in Chapter 5 of van Aert (2018) <doi:10.31222/osf.io/eqhjd>. This method is an extension of the p-uniform method that allows for estimation of the average effect size and the between-study variance in a meta-analysis, and uses both the statistically significant and nonsignificant effect sizes. The third method in the package is the hybrid method as described in van Aert and van Assen (2017) <doi:10.3758/s13428-017-0967-6>. The hybrid method is a meta-analysis method for combining an original study and a replication while taking into account the statistical significance of the original study. The p-uniform and hybrid method are based on the statistical theory that the distribution of p-values is uniform conditional on the population effect size. The fourth method in the package is the Snapshot Bayesian Hybrid Meta-Analysis Method as described in van Aert and van Assen (2017) <doi:10.1371/journal.pone.0175302>. This method computes posterior probabilities for four true effect sizes (no, small, medium, and large) based on an original study and a replication while taking into account publication bias in the original study. The method can also be used for computing the required sample size of the replication, akin to power analysis in null hypothesis significance testing.
1732 Meta-Analysis RandMeta Efficient Numerical Algorithm for Exact Inference in Meta Analysis A novel numerical algorithm that provides functionality for estimating the exact 95% confidence interval of the location parameter in the random effects model, and is much faster than the naive method. Works best when the number of studies is between 6 and 20.
1733 Meta-Analysis ratesci Confidence Intervals for Comparisons of Binomial or Poisson Rates Computes confidence intervals for the rate (or risk) difference (‘RD’) or rate ratio (or relative risk, ‘RR’) for binomial proportions or Poisson rates, or for odds ratio (‘OR’, binomial only). Also confidence intervals for a single binomial or Poisson rate, and intervals for matched pairs. Includes skewness-corrected asymptotic score (‘SCAS’) methods, which have been developed in Laud (2017) <doi:10.1002/pst.1813> from Miettinen & Nurminen (1985) <doi:10.1002/sim.4780040211> and Gart & Nam (1988) <doi:10.2307/2531848>. Also includes MOVER methods (Method Of Variance Estimates Recovery) for all contrasts, derived from the Newcombe method but using equal-tailed Jeffreys intervals, and generalised for Bayesian applications incorporating prior information. So-called ‘exact’ methods for strictly conservative coverage are approximated using continuity corrections. Also includes methods for stratified calculations (e.g. meta-analysis), either assuming fixed effects or incorporating stratum heterogeneity.
1734 Meta-Analysis RcmdrPlugin.EZR R Commander Plug-in for the EZR (Easy R) Package EZR (Easy R) adds a variety of statistical functions, including survival analyses, ROC analyses, meta-analyses, sample size calculation, and so on, to the R commander. EZR enables point-and-click easy access to statistical functions, especially for medical statistics. EZR is platform-independent and runs on Windows, Mac OS X, and UNIX. Its complete manual is available only in Japanese (Chugai Igakusha, ISBN: 978-4-498-10901-8 or Nankodo, ISBN: 978-4-524-26158-1), but a report introducing EZR was published in Bone Marrow Transplantation (Nature Publishing Group) as an open-access article, which can be used as a simple manual and freely downloaded from the journal website. This report has been cited in more than 2,000 scientific articles.
1735 Meta-Analysis RcmdrPlugin.RMTCJags R MTC Jags ‘Rcmdr’ Plugin Mixed Treatment Comparison is a methodology to compare health strategies (drugs, treatments, devices) directly and/or indirectly. This package provides an ‘Rcmdr’ plugin to perform Mixed Treatment Comparison for binary outcomes using BUGS code from Bristol University (Lu and Ades).
1736 Meta-Analysis revtools Tools to Support Evidence Synthesis Researchers commonly need to summarize scientific information, a process known as ‘evidence synthesis’. The first stage of a synthesis process (such as a systematic review or meta-analysis) is to download a list of references from academic search engines such as ‘Web of Knowledge’ or ‘Scopus’. The traditional approach to systematic review is then to sort these data manually, first by locating and removing duplicated entries, and then screening to remove irrelevant content by viewing titles and abstracts (in that order). ‘revtools’ provides interfaces for each of these tasks. An alternative approach, however, is to draw on tools from machine learning to visualise patterns in the corpus. In this case, you can use ‘revtools’ to render ordinations of text drawn from article titles, keywords and abstracts, and interactively select or exclude individual references, words or topics.
1737 Meta-Analysis rma.exact Exact Confidence Intervals for Random Effects Meta-Analyses Compute an exact CI for the population mean under a random effects model. The routines implement the algorithm described in Michael, Thornton, Xie, and Tian (2017) <https://haben-michael.github.io/research/Exact_Inference_Meta.pdf>.
1738 Meta-Analysis rmeta Meta-Analysis Functions for simple fixed and random effects meta-analysis for two-sample comparisons and cumulative meta-analyses. Draws standard summary plots, funnel plots, and computes summaries and tests for association and heterogeneity.
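A minimal sketch of fixed- and random-effects summaries with ‘rmeta’, using hypothetical 2x2 trial counts (meta.MH() and meta.DSL() take the numbers randomized and the event counts per arm):

    library(rmeta)
    n.trt   <- c(100, 50, 80, 120)   # hypothetical sample sizes, treated arm
    n.ctrl  <- c(100, 50, 80, 120)   # hypothetical sample sizes, control arm
    ev.trt  <- c(12, 5, 9, 14)       # hypothetical event counts, treated arm
    ev.ctrl <- c(20, 8, 15, 22)      # hypothetical event counts, control arm
    fixed  <- meta.MH(n.trt, n.ctrl, ev.trt, ev.ctrl,
                      names = paste("Trial", 1:4))   # Mantel-Haenszel, fixed
    random <- meta.DSL(n.trt, n.ctrl, ev.trt, ev.ctrl,
                       names = paste("Trial", 1:4))  # DerSimonian-Laird, random
    summary(random)
    plot(random)     # standard summary (forest) plot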
1739 Meta-Analysis robumeta Robust Variance Meta-Regression Functions for conducting robust variance estimation (RVE) meta-regression using both large and small sample RVE estimators under various weighting schemes. These methods are distribution free and provide valid point estimates, standard errors and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown. Also included are functions for conducting sensitivity analyses under correlated effects weighting and producing RVE-based forest plots.
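A minimal sketch of RVE meta-regression with ‘robumeta’, following the hierarchical-weights example in its documentation; the hierdat data set and its column names (effectsize, var, studyid) are assumptions based on that example:

    library(robumeta)
    data(hierdat)    # bundled example with hierarchically dependent effects
    res <- robu(effectsize ~ 1, data = hierdat,
                studynum = studyid, var.eff.size = var,
                modelweights = "HIER", small = TRUE)  # small-sample corrections
    print(res)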
1740 Meta-Analysis SAMURAI Sensitivity Analysis of a Meta-analysis with Unpublished but Registered Analytical Investigations This package contains R functions to gauge the impact of unpublished studies upon the meta-analytic summary effect of a set of published studies. (Credits: The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 282574.)
1741 Meta-Analysis SCMA Single-Case Meta-Analysis Perform meta-analysis of single-case experiments, including calculating various effect size measures (SMD, PND, PEM and NAP) and probability combining (additive and multiplicative method).
1742 Meta-Analysis selectMeta Estimation of Weight Functions in Meta Analysis Publication bias, the fact that studies identified for inclusion in a meta analysis do not represent all studies on the topic of interest, is commonly recognized as a threat to the validity of the results of a meta analysis. One way to explicitly model publication bias is via selection models or weighted probability distributions. In this package we provide implementations of several parametric and nonparametric weight functions. The novelty in Rufibach (2011) is the proposal of a non-increasing variant of the nonparametric weight function of Dear & Begg (1992). The new approach potentially offers more insight into the selection process than other methods, while being more flexible than parametric approaches. To maximize the log-likelihood function proposed by Dear & Begg (1992) under a monotonicity constraint, we use a differential evolution algorithm proposed by Ardia et al (2010a, b) and implemented in Mullen et al (2009). In addition, we offer a method to compute a confidence interval for the overall effect size theta, adjusted for selection bias, as well as a function that computes the simulation-based p-value to assess the null hypothesis of no selection as described in Rufibach (2011, Section 6).
1743 Meta-Analysis seqMeta Meta-Analysis of Region-Based Tests of Rare DNA Variants Computes the necessary information to meta-analyze region-based tests for rare genetic variants (e.g. SKAT, T1) in individual studies, and performs the meta-analysis.
1744 Meta-Analysis TFisher Optimal Thresholding Fisher’s P-Value Combination Method We provide the cumulative distribution function (CDF), quantile, and statistical power calculator for a collection of thresholding Fisher’s p-value combination methods, including Fisher’s p-value combination method, the truncated product method and, in particular, the soft-thresholding Fisher’s p-value combination method, which is proven to be optimal in some contexts of signal detection. The p-value calculator for the omnibus version of these tests is also included. For reference, please see Hong Zhang and Zheyang Wu. “TFisher Tests: Optimal and Adaptive Thresholding for Combining p-Values”, submitted.
1745 Meta-Analysis weightr Estimating Weight-Function Models for Publication Bias Estimates the Vevea and Hedges (1995) weight-function model. By specifying arguments, users can also estimate the modified model described in Vevea and Woods (2005), which may be more practical with small datasets. Users can also specify moderators to estimate a linear model. The package functionality allows users to easily extract the results of these analyses as R objects for other uses. In addition, the package includes a function to launch both models as a Shiny application. Although the Shiny application is also available online, this function allows users to launch it locally if they choose.
1746 Meta-Analysis xmeta A Toolbox for Multivariate Meta-Analysis A toolbox for meta-analysis. This package includes a collection of functions for (1) implementing robust multivariate meta-analysis of continuous or binary outcomes; and (2) a bivariate Egger’s test for detecting publication bias.
1747 Missing Data accelmissing Missing Value Imputation for Accelerometer Data Imputation for the missing count values in accelerometer data. The methodology includes both parametric and semi-parametric multiple imputations under the zero-inflated Poisson lognormal model. This package also provides multiple functions to pre-process the accelerometer data prior to missing data imputation. These include detecting wearing and non-wearing time, selecting valid days and subjects, and creating plots.
1748 Missing Data ade4 Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
1749 Missing Data alleHap Allele Imputation and Haplotype Reconstruction from Pedigree Databases Tools to simulate alphanumeric alleles, impute genetic missing data and reconstruct non-recombinant haplotypes from pedigree databases in a deterministic way. Allelic simulations can be implemented taking into account many factors (such as number of families, markers, alleles per marker, probability and proportion of missing genotypes, recombination rate, etc). Genotype imputation can be used with simulated datasets or real databases (previously loaded in .ped format). Haplotype reconstruction can be carried out even with missing data, since the program firstly imputes each family genotype (without a reference panel), to later reconstruct the corresponding haplotypes for each family member. All this considering that each individual (due to meiosis) should unequivocally have two alleles per marker (one inherited from each parent) and thus imputation and reconstruction results can be deterministically calculated.
1750 Missing Data Amelia (core) A Program for Missing Data A tool that “multiply imputes” missing data in a single cross-section (such as a survey), from a time series (like variables collected for each year in a country), or from a time-series-cross-sectional data set (such as collected by years for each of several countries). Amelia II implements our bootstrapping-based algorithm that gives essentially the same answers as the standard IP or EMis approaches, is usually considerably faster than existing approaches and can handle many more variables. Unlike Amelia I and other statistically rigorous imputation software, it virtually never crashes (but please let us know if you find to the contrary!). The program also generalizes existing approaches by allowing for trends in time series across observations within a cross-sectional unit, as well as priors that allow experts to incorporate beliefs they have about the values of missing cells in their data. Amelia II also includes useful diagnostics of the fit of multiple imputation models. The program works from the R command line or via a graphical user interface that does not require users to know R.
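A minimal sketch of multiple imputation with Amelia II, using the freetrade country-year panel that ships with the package:

    library(Amelia)
    data(freetrade)                  # country-year panel with missing cells
    a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country")
    compare.density(a.out, var = "tariff")   # observed vs. imputed densities
    # completed data sets live in a.out$imputations[[1]] ... [[5]]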
1751 Missing Data BaBooN Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data Included are two variants of Bayesian Bootstrap Predictive Mean Matching to multiply impute missing data. The first variant is a variable-by-variable imputation combining sequential regression and Predictive Mean Matching (PMM) that has been extended for unordered categorical data. The Bayesian Bootstrap allows for generating approximately proper multiple imputations. The second variant is also based on PMM, but the focus is on imputing several variables at the same time. The suggestion is to use this variant, if the missing-data pattern resembles a data fusion situation, or any other missing-by-design pattern, where several variables have identical missing-data patterns. Both variants can be run as ‘single imputation’ versions, in case the analysis objective is of a purely descriptive nature.
1752 Missing Data BaylorEdPsych R Package for Baylor University Educational Psychology Quantitative Courses Functions and data used for Baylor University Educational Psychology Quantitative Courses
1753 Missing Data brlrmr Bias Reduction with Missing Binary Response Provides two main functions, il() and fil(). The il() function implements the EM algorithm developed by Ibrahim and Lipsitz (1996) <doi:10.2307/2533068> to estimate the parameters of a logistic regression model with the missing response when the missing data mechanism is nonignorable. The fil() function implements the algorithm proposed by Maity et al. (2017+) <https://github.com/arnabkrmaity/brlrmr> to reduce the bias produced by the method of Ibrahim and Lipsitz (1996) <doi:10.2307/2533068>.
1754 Missing Data CALIBERrfimpute Multiple Imputation Using MICE and Random Forest Functions to impute using Random Forest under Full Conditional Specifications (Multivariate Imputation by Chained Equations). The CALIBER programme is funded by the Wellcome Trust (086091/Z/08/Z) and the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research programme (RP-PG-0407-10314). The author is supported by a Wellcome Trust Clinical Research Training Fellowship (0938/30/Z/10/Z).
1755 Missing Data cat Analysis of categorical-variable datasets with missing values Analysis of categorical-variable datasets with missing values.
1756 Missing Data cdparcoord Top Frequency-Based Parallel Coordinates Parallel coordinate plotting with resolutions for large data sets and missing values.
1757 Missing Data CMF Collective matrix factorization Collective matrix factorization (CMF) finds joint low-rank representations for a collection of matrices with shared row or column entities. This code learns variational Bayesian approximation for CMF, supporting multiple likelihood potentials and missing data, while identifying both factors shared by multiple matrices and factors private for each matrix.
1758 Missing Data cobalt Covariate Balance Tables and Plots Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with ‘MatchIt’, ‘twang’, ‘Matching’, ‘optmatch’, ‘CBPS’, ‘ebal’, ‘WeightIt’, and ‘designmatch’ for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with longitudinal treatments.
1759 Missing Data CoImp Copula Based Imputation Method Copula based imputation method. A semiparametric imputation procedure for missing multivariate data based on conditional copula specifications.
1760 Missing Data CRTgeeDR Doubly Robust Inverse Probability Weighted Augmented GEE Estimator Implements a semi-parametric GEE estimator accounting for missing data with inverse-probability weighting (IPW) and for imbalance in covariates with augmentation (AUG). The IPW-AUG-GEE estimator is doubly robust (DR).
1761 Missing Data cutoffR CUTOFF: A Spatio-temporal Imputation Method Provides a set of tools for spatio-temporal imputation in R. It includes the implementation of the CUTOFF imputation method, and a useful cross-validation function that can be used not only by the CUTOFF method but also by other imputation functions to help choose an optimal value for relevant parameters, such as the number of k nearest neighbours for the KNN imputation method, or the number of components for the SVD imputation method. It also contains tools for simulating data with missing values with respect to a specific missing pattern, for example block missing. Some useful visualisation functions for imputation purposes are also provided in the package.
1762 Missing Data CVThresh Level-Dependent Cross-Validation Thresholding Carries out the level-dependent cross-validation method for selecting the thresholding value in wavelet shrinkage. The procedure couples conventional cross-validation with an imputation method to cope with the restriction that the data length be a power of 2. It can be easily applied to classical leave-one-out and k-fold cross-validation. Since the procedure is computationally fast, level-dependent cross-validation can be performed for wavelet shrinkage of various data, such as data with correlated errors.
1763 Missing Data ddsPLS Data-Driven Sparse PLS Robust to Missing Samples for Mono and Multi-Block Data Sets Allows building multi-data-driven sparse PLS models. Multi-block data sets in high-dimensional settings are particularly amenable to this approach.
1764 Missing Data denoiseR Regularized Low Rank Matrix Estimation Estimate a low rank matrix from noisy data using singular values thresholding and shrinking functions. Impute missing values with matrix completion.
1765 Missing Data DescTools Tools for Descriptive Statistics A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author’s intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package contains furthermore functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here, was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The ‘camel style’ was consequently applied to functions borrowed from contributed R packages as well.
1766 Missing Data DiffusionRimp Inference and Analysis for Diffusion Processes via Data Imputation and Method of Lines Tools for performing inference and analysis using a data-imputation scheme and the method of lines.
1767 Missing Data dils Data-Informed Link Strength. Combine multiple-relationship networks into a single weighted network. Impute (fill-in) missing network links Combines multiple-relationship networks into a single weighted network. The approach is similar to factor analysis in that the contribution from each constituent network varies so as to maximize the information gleaned from the multiple-relationship networks. This implementation uses Principal Component Analysis calculated using ‘prcomp’ with bootstrap subsampling. Missing links are imputed using the method of Chen et al. (2012).
1768 Missing Data dlookr Tools for Data Diagnosis, Exploration, Transformation A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between a target variable and predictors. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. It also creates automated reports that support these three tasks.
1769 Missing Data DMwR Functions and data for “Data Mining with R” This package includes functions and data accompanying the book “Data Mining with R, learning with case studies” by Luis Torgo, CRC Press 2010.
1770 Missing Data DrImpute Imputing Dropout Events in Single-Cell RNA-Sequencing Data R codes for imputing dropout events. Many statistical methods in cell type identification, visualization and lineage reconstruction do not account for dropout events (‘PCAreduce’, ‘SC3’, ‘PCA’, ‘t-SNE’, ‘Monocle’, ‘TSCAN’, etc). ‘DrImpute’ can improve the performance of such software by imputing dropout events.
1771 Missing Data DTWBI Imputation of Time Series Based on Dynamic Time Warping Functions to impute large gaps within time series based on Dynamic Time Warping methods. It contains all required functions to create large missing consecutive values within time series and to fill them, according to the paper Phan et al. (2017), <doi:10.1016/j.patrec.2017.08.019>. Performance criteria are added to compare similarity between two signals (query and reference).
1772 Missing Data DTWUMI Imputation of Multivariate Time Series Based on Dynamic Time Warping Functions to impute large gaps within multivariate time series based on Dynamic Time Warping methods. Gaps of size 1, or smaller than a defined threshold, are filled using a simple average or a weighted moving average, respectively. Larger gaps are filled using the methodology provided by Phan et al. (2017) <doi:10.1109/MLSP.2017.8168165>: a query is built immediately before/after a gap and a moving window is used to find the most similar sequence to this query using Dynamic Time Warping. To lower the calculation time, similar sequences are pre-selected using global features. Contrary to the univariate method (package ‘DTWBI’), these global features are not estimated over the sequence containing the gap(s); instead, a feature matrix is built to summarize general features of the whole multivariate signal. Once the most similar sequence to the query has been identified, the sequence adjacent to this window is used to fill the gap considered. This function can deal with multiple gaps across all the sequences composing the input multivariate signal. However, for better consistency, large gaps at the same location over all sequences should be avoided.
1773 Missing Data eigenmodel Semiparametric Factor and Regression Models for Symmetric Relational Data Estimation of the parameters in a model for symmetric relational data (e.g., the above-diagonal part of a square matrix), using a model-based eigenvalue decomposition and regression. Missing data is accommodated, and a posterior mean for missing data is calculated under the assumption that the data are missing at random. The marginal distribution of the relational data can be arbitrary, and is fit with an ordered probit specification. See Hoff (2007) <arXiv:0711.1146> for details on the model.
1774 Missing Data experiment R Package for Designing and Analyzing Randomized Experiments Provides various statistical methods for designing and analyzing randomized experiments. One functionality of the package is the implementation of randomized-block and matched-pair designs based on possibly multivariate pre-treatment covariates. The package also provides the tools to analyze various randomized experiments including cluster randomized experiments, two-stage randomized experiments, randomized experiments with noncompliance, and randomized experiments with missing data.
1775 Missing Data FamEvent Family Age-at-Onset Data Simulation and Penetrance Estimation Simulates age-at-onset traits associated with a segregating major gene in family data obtained from population-based, clinic-based, or multi-stage designs. Appropriate ascertainment correction is utilized to estimate age-dependent penetrance functions either parametrically from the fitted model or nonparametrically from the data. The Expectation and Maximization algorithm can infer missing genotypes and carrier probabilities estimated from family’s genotype and phenotype information or from a fitted model. Plot functions include pedigrees of simulated families and predicted penetrance curves based on specified parameter values.
1776 Missing Data fastLink Fast Probabilistic Record Linkage with Missing Data Implements a Fellegi-Sunter probabilistic record linkage model that allows for missing data and the inclusion of auxiliary information. This includes functionalities to conduct a merge of two datasets under the Fellegi-Sunter model using the Expectation-Maximization algorithm. In addition, tools for preparing, adjusting, and summarizing data merges are included. The package implements methods described in Enamorado, Fifield, and Imai (2017) “Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records”, available at <http://imai.princeton.edu/research/linkage.html>.
1777 Missing Data FHDI Fractional Hot Deck and Fully Efficient Fractional Imputation Impute general multivariate missing data with the fractional hot deck imputation based on Jaekwang Kim (2011) <doi:10.1093/biomet/asq073>.
1778 Missing Data filling Matrix Completion, Imputation, and Inpainting Methods Filling in the missing entries of partially observed data is one of the fundamental problems in various disciplines of mathematical science. In many cases, the data of interest take the canonical form of a matrix, so the problem is posed as filling in the missing entries of a matrix under preset assumptions and models. We provide a collection of methods from multiple disciplines under matrix completion, imputation, and inpainting. See Davenport and Romberg (2016) <doi:10.1109/JSTSP.2016.2539100> for an overview of the topic.
1779 Missing Data forecast Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
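Although ‘forecast’ is primarily a forecasting package, it also handles missing values in time series; a minimal sketch with na.interp() (the AirPassengers series is from base R, and the gap positions are arbitrary):

    library(forecast)
    x <- AirPassengers
    x[c(20, 50, 90)] <- NA       # introduce artificial gaps
    x.filled <- na.interp(x)     # interpolation that allows for seasonality
    fit <- auto.arima(x.filled)  # automatic ARIMA on the completed series
    plot(forecast(fit, h = 12))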
1780 Missing Data ForImp Imputation of Missing Values Through a Forward Imputation Algorithm Imputation of missing values in datasets of ordinal variables through a forward imputation algorithm.
1781 Missing Data gapfill Fill Missing Values in Satellite Data Tools to fill missing values in satellite data and to develop new gap-fill algorithms. The methods are tailored to data (images) observed at equally-spaced points in time. The package is illustrated with MODIS NDVI data.
1782 Missing Data GenForImp The Forward Imputation: A Sequential Distance-Based Approach for Imputing Missing Data Two methods based on the Forward Imputation approach are implemented for the imputation of quantitative missing data. One method alternates Nearest Neighbour Imputation and Principal Component Analysis (function ‘ForImp.PCA’), the other uses Nearest Neighbour Imputation with the Mahalanobis distance (function ‘ForImp.Mahala’).
1783 Missing Data GSE Robust Estimation in the Presence of Cellwise and Casewise Contamination and Missing Data Robust Estimation of Multivariate Location and Scatter in the Presence of Cellwise and Casewise Contamination and Missing Data.
1784 Missing Data gsynth Generalized Synthetic Control Method Provides causal inference with interactive fixed-effect models. It imputes counterfactuals for each treated unit using control group information based on a linear interactive fixed effects model that incorporates unit-specific intercepts interacted with time-varying coefficients. This method generalizes the synthetic control method to the case of multiple treated units and variable treatment periods, and improves efficiency and interpretability. This version supports unbalanced panels and implements the matrix completion method. Main reference: Yiqing Xu (2017) <doi:10.1017/pan.2016.2>.
1785 Missing Data Haplin Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes Performs genetic association analyses of case-parent triad (trio) data with multiple markers. It can also incorporate complete or incomplete control triads, for instance independent control children. Estimation is based on haplotypes, for instance SNP haplotypes, even though phase is not known from the genetic data. ‘Haplin’ estimates relative risk (RR + conf.int.) and p-value associated with each haplotype. It uses maximum likelihood estimation to make optimal use of data from triads with missing genotypic data, for instance if some SNPs have not been typed for some individuals. ‘Haplin’ also allows estimation of effects of maternal haplotypes and parent-of-origin effects, particularly appropriate in perinatal epidemiology. ‘Haplin’ allows special models, like X-inactivation, to be fitted on the X-chromosome. A GxE analysis allows testing interactions between environment and all estimated genetic effects. The models were originally described in Gjessing, HK and Lie, RT (2006) <doi:10.1111/j.1529-8817.2005.00218.x>.
1786 Missing Data HardyWeinberg Statistical Tests and Graphics for Hardy-Weinberg Equilibrium Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) <doi:10.1126/science.28.706.49> for bi- and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi- and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots.
1787 Missing Data hmi Hierarchical Multiple Imputation Runs single-level and multilevel imputation models. The user just has to pass the data to the main function and, optionally, their analysis model. The package then translates this analysis model into commands to impute the data accordingly, using functions from ‘mice’, ‘MCMCglmm’ or routines built for this package.
1788 Missing Data Hmisc Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
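A minimal sketch of the imputation side of ‘Hmisc’, using its impute() and aregImpute() interfaces; the vector and data frame here are made up for illustration:

```r
library(Hmisc)

# Single imputation of a vector with a summary statistic
x <- c(1, 2, NA, 4, NA, 6)
impute(x, median)

# Multiple imputation via additive regression / predictive mean matching
set.seed(1)
d <- data.frame(a = rnorm(100), b = rnorm(100))
d$a[sample(100, 10)] <- NA
imp <- aregImpute(~ a + b, data = d, n.impute = 5)
```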
1789 Missing Data hot.deck (core) Multiple Hot-Deck Imputation Performs multiple hot-deck imputation of categorical and continuous variables in a data frame.
1790 Missing Data HotDeckImputation Hot Deck Imputation Methods for Missing Data Hot deck imputation methods to resolve missing data.
1791 Missing Data icdGLM EM by the Method of Weights for Incomplete Categorical Data in Generalized Linear Models Provides an estimator for generalized linear models with incomplete data for discrete covariates. The estimation is based on the EM algorithm by the method of weights by Ibrahim (1990) <doi:10.2307/2290013>.
1792 Missing Data icenReg Regression Models for Interval Censored Data Regression models for interval censored data. Currently supports Cox-PH, proportional odds, and accelerated failure time models. Allows for semi- and fully parametric models (parametric only for accelerated failure time models) and Bayesian parametric models. Includes functions for easy visual diagnostics of model fits and imputation of censored data.
1793 Missing Data idealstan Generalized IRT Ideal Point Models with ‘Stan’ Offers item-response theory (IRT) ideal-point estimation for binary, ordinal, counts and continuous responses with time-varying and missing-data inference. Full and approximate Bayesian sampling with ‘Stan’ (<https://mc-stan.org/>).
1794 Missing Data idem Inference in Randomized Controlled Trials with Death and Missingness In randomized studies involving severely ill patients, functional outcomes are often unobserved due to missed clinic visits, premature withdrawal or death. It is well known that if these unobserved functional outcomes are not handled properly, biased treatment comparisons can be produced. In this package, we implement a procedure for comparing treatments that is based on the composite endpoint of both the functional outcome and survival. The procedure was proposed in Wang et al. (2016) <doi:10.1111/biom.12594>. It considers missing data imputation with a sensitivity analysis strategy to handle the unobserved functional outcomes not due to death.
1795 Missing Data imputePSF Impute Missing Data in Time Series Data with PSF Based Method Imputes the missing values in time series data with PSF algorithm based approach. The details about PSF algorithm are available at: <https://cran.r-project.org/package=PSF>.
1796 Missing Data imputeTestbench Test Bench for the Comparison of Imputation Methods Provides a test bench for the comparison of missing data imputation methods in uni-variate time series. Imputation methods are compared using different error metrics. Proposed imputation methods and alternative error metrics can be used.
1797 Missing Data imputeTS (core) Time Series Missing Value Imputation Imputation (replacement) of missing values in univariate time series. Offers several imputation functions and missing data plots. Available imputation algorithms include: ‘Mean’, ‘LOCF’, ‘Interpolation’, ‘Moving Average’, ‘Seasonal Decomposition’, ‘Kalman Smoothing on Structural Time Series models’, ‘Kalman Smoothing on ARIMA models’.
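A minimal sketch of the ‘imputeTS’ workflow on its bundled ‘tsAirgap’ series; note that function naming changed across versions (dotted names such as na.kalman() in older releases, underscored names in newer ones):

```r
library(imputeTS)

statsNA(tsAirgap)              # printed summary of the NA pattern
filled <- na_kalman(tsAirgap)  # Kalman smoothing (na.kalman() in old versions)
```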
1798 Missing Data ipw Estimate Inverse Probability Weights Functions to estimate the probability of receiving the observed treatment, based on individual characteristics. The inverse of these probabilities can be used as weights when estimating causal effects from observational data via marginal structural models. Both point treatment situations and longitudinal studies can be analysed. The same functions can be used to correct for informative censoring.
1799 Missing Data JointAI Joint Analysis and Imputation of Incomplete Data Provides joint analysis and imputation of (generalized) linear and cumulative logit regression models, (generalized) linear and cumulative logit mixed models and parametric (Weibull) as well as Cox proportional hazards survival models with incomplete (covariate) data in the Bayesian framework. The package performs some preprocessing of the data and creates a ‘JAGS’ model, which will then automatically be passed to ‘JAGS’ <http://mcmc-jags.sourceforge.net> with the help of the package ‘rjags’. It also provides summary and plotting functions for the output and allows imputed values to be exported.
1800 Missing Data jomo (core) Multilevel Joint Modelling Multiple Imputation Similarly to Schafer’s package ‘pan’, ‘jomo’ is a package for multilevel joint modelling multiple imputation (Carpenter and Kenward, 2013) <doi:10.1002/9781119942283>. Novel aspects of ‘jomo’ are the possibility of handling binary and categorical data through latent normal variables, the option to use cluster-specific covariance matrices and to impute compatibly with the substantive model.
1801 Missing Data lavaan Latent Variable Analysis Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
1802 Missing Data lori Imputation of Count Data using Side Information Analysis, imputation, and multiple imputation of count data using covariates. LORI uses a log-linear model where main row and column effects are decomposed as regression terms on known covariates. A residual low-rank interaction term is also fitted. LORI returns estimates of covariate effects and interactions, as well as an imputed count table. The package also contains a multiple imputation procedure.
1803 Missing Data ltm Latent Trait Models under IRT Analysis of multivariate dichotomous and polytomous data using latent trait models under the Item Response Theory approach. It includes the Rasch, the Two-Parameter Logistic, Birnbaum’s Three-Parameter, the Graded Response, and the Generalized Partial Credit Models.
1804 Missing Data mdmb Model Based Treatment of Missing Data Contains model-based treatment of missing data for regression models with missing values in covariates or the dependent variable using maximum likelihood or Bayesian estimation (Ibrahim et al., 2005; <doi:10.1198/016214504000001844>). The regression model can be nonlinear (e.g., interaction effects, quadratic effects or B-spline functions). Multilevel models with missing data in predictors are available for Bayesian estimation. Substantive-model compatible multiple imputation can also be conducted.
1805 Missing Data memisc Management of Survey Data and Presentation of Analysis Results An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) ‘SPSS’ and ‘Stata’ files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to ‘LaTeX’ and HTML.
1806 Missing Data mi Missing Data Imputation and Model Checking The mi package provides functions for data manipulation, imputing missing values in an approximate Bayesian framework, diagnostics of the models used to generate the imputations, confidence-building mechanisms to validate some of the assumptions of the imputation algorithm, and functions to analyze multiply imputed data sets with the appropriate degree of sampling uncertainty.
1807 Missing Data mice (core) Multivariate Imputation by Chained Equations Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
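A minimal sketch of the impute-analyse-pool workflow the entry above describes, using the ‘nhanes’ example data shipped with ‘mice’:

```r
library(mice)

# bmi, hyp and chl in 'nhanes' contain missing values
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123)  # 5 PMM imputations
fit <- with(imp, lm(chl ~ bmi + age))  # fit the model on each completed set
summary(pool(fit))                     # pool estimates with Rubin's rules
```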
1808 Missing Data miceadds Some Additional Multiple Imputation Functions, Especially for ‘mice’ Contains functions for multiple imputation that complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are included. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>) and substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>).
1809 Missing Data miceFast Fast Imputations Using ‘Rcpp’ and ‘Armadillo’ Fast imputations under the object-oriented programming paradigm. Quantitative models with closed-form solutions are used, so the package is based on linear algebra operations. The biggest improvement in time performance is achieved for calculations where a grouping variable has to be used. Evaluating a quantitative model only once for multiple imputations is another major enhancement. A few functions built to work with popular R packages such as ‘data.table’ are also offered.
1810 Missing Data micemd Multiple Imputation by Chained Equations with Multilevel Data Addons for the ‘mice’ package to perform multiple imputation using chained equations with two-level data. Includes imputation methods dedicated to sporadically and systematically missing values. Imputation of continuous, binary or count variables is available. Following the recommendations of Audigier, V. et al (2018) <doi:10.1214/18-STS646>, the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. Allows parallel calculation and overimputation for ‘mice’.
1811 Missing Data miceMNAR Missing not at Random Imputation Models for Multiple Imputation by Chained Equation Provides imputation models and functions for binary or continuous Missing Not At Random (MNAR) outcomes through the use of the ‘mice’ package. The mice.impute.hecknorm() function provides an imputation model for continuous outcomes based on Heckman’s model, also named the sample selection model, as described in Galimard et al (2018) and Galimard et al (2016) <doi:10.1002/sim.6902>. The mice.impute.heckprob() function provides an imputation model for binary outcomes based on a bivariate probit model as described in Galimard et al (2018).
1812 Missing Data mimi Main Effects and Interactions in Mixed and Incomplete Data Generalized low-rank models for mixed and incomplete data frames. The main function may be used for dimensionality reduction or imputation of numeric, binary and count data (simultaneously). Main effects such as column means, group effects, or effects of row-column side information (e.g. user/item attributes in recommendation system) may also be modelled in addition to the low-rank model. Genevieve Robin, Olga Klopp, Julie Josse, Eric Moulines, Robert Tibshirani (2018) <arXiv:1806.09734>.
1813 Missing Data mirt Multidimensional Item Response Theory Analysis of dichotomous and polytomous response data using unidimensional and multidimensional latent trait models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory models can be estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier analyses are available for modeling item testlets. Multiple group analysis and mixed effects designs also are available for detecting differential item and test functioning as well as modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, and several other discrete latent variable models, including mixture and zero-inflated response models, are supported.
1814 Missing Data missForest Nonparametric Missing Value Imputation using Random Forest The function ‘missForest’ in this package is used to impute missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.
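A minimal sketch of ‘missForest’ in action; prodNA() (also from the package) injects missing values for the demonstration:

```r
library(missForest)

set.seed(81)
iris_mis <- prodNA(iris, noNA = 0.1)  # add 10% missing values at random
imp <- missForest(iris_mis)           # iterative random-forest imputation
imp$OOBerror                          # out-of-bag error estimate (NRMSE/PFC)
head(imp$ximp)                        # the completed data set
```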
1815 Missing Data missMDA (core) Handling Missing Values with Multivariate Data Analysis Imputation of incomplete continuous or categorical datasets; Missing values are imputed with a principal component analysis (PCA), a multiple correspondence analysis (MCA) model or a multiple factor analysis (MFA) model; Perform multiple imputation with and in PCA or MCA.
1816 Missing Data MissMech Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random Tests whether the missing data mechanism in a set of incompletely observed data is missing completely at random (MCAR). For a detailed description see Jamshidian, M., Jalal, S., and Jansen, C. (2014). “MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR),” Journal of Statistical Software, 56(6), 1-31. URL http://www.jstatsoft.org/v56/i06/.
1817 Missing Data mitml Tools for Multiple Imputation in Multilevel Modeling Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages ‘pan’ and ‘jomo’, and several functions for visualization, data management and the analysis of multiply imputed data sets.
1818 Missing Data mitools Tools for Multiple Imputation of Missing Data Tools to perform analyses and combine results from multiple-imputation datasets.
1819 Missing Data mix Estimation/Multiple Imputation for Mixed Categorical and Continuous Data Estimation/multiple imputation programs for mixed categorical and continuous data.
1820 Missing Data MixedDataImpute Missing Data Imputation for Continuous and Categorical Data using Nonparametric Bayesian Joint Models Missing data imputation for continuous and categorical data, using nonparametric Bayesian joint models (specifically the hierarchically coupled mixture model with local dependence described in Murray and Reiter (2015); see ‘citation(“MixedDataImpute”)’ or http://arxiv.org/abs/1410.0438). See ‘?hcmm_impute’ for example usage.
1821 Missing Data naniar (core) Data Structures, Summaries, and Visualisations for Missing Data Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. ‘naniar’ provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of ‘ggplot2’ and tidy data.
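A minimal sketch of exploring missingness with ‘naniar’ on the base R ‘airquality’ data:

```r
library(naniar)

miss_var_summary(airquality)          # tabular summary of NAs per variable
gg_miss_var(airquality)               # ggplot of missing counts per variable
aq_shadow <- bind_shadow(airquality)  # append shadow columns (e.g. Ozone_NA)
```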
1822 Missing Data nipals Principal Components Analysis using NIPALS with Gram-Schmidt Orthogonalization Principal Components Analysis of a matrix using Non-linear Iterative Partial Least Squares with Gram-Schmidt orthogonalization of the scores and loadings. Optimized for speed. See Andrecut (2009) <doi:10.1089/cmb.2008.0221>.
1823 Missing Data norm Analysis of multivariate normal datasets with missing values Analysis of multivariate normal datasets with missing values
1824 Missing Data NPBayesImputeCat Non-Parametric Bayesian Multiple Imputation for Categorical Data These routines create multiple imputations of missing at random categorical data, and create multiply imputed synthesis of categorical data, with or without structural zeros. Imputations and syntheses are based on Dirichlet process mixtures of multinomial distributions, which is a non-parametric Bayesian modeling approach that allows for flexible joint modeling.
1825 Missing Data OpenMx Extended Structural Equation Modelling Facilitates treatment of statistical model specifications as things that can be generated and manipulated programmatically. Structural equation models may be specified with reticular action model matrices or paths, linear structural relations matrices or paths, or directly in matrix algebra. Fit functions include full information maximum likelihood, maximum likelihood, and weighted least squares. Example models include confirmatory factor, multiple group, mixture distribution, categorical threshold, modern test theory, differential equations, state space, and many others. MacOS users can download the most up-to-date package binaries from <http://openmx.ssri.psu.edu>. See Neale, Hunter, Pritikin, Zahery, Brick, Kirkpatrick, Estabrook, Bates, Maes, & Boker (2016) <doi:10.1007/s11336-014-9435-8>.
1826 Missing Data padr Quickly Get Datetime Data Ready for Analysis Transforms datetime data into a format ready for analysis. It offers two core functionalities: aggregating data to a higher level interval (thicken) and imputing records where observations were absent (pad).
1827 Missing Data pan Multiple Imputation for Multivariate Panel or Clustered Data Provides functions and examples for maximum likelihood estimation for generalized linear mixed models and a Gibbs sampler for multivariate linear mixed models with incomplete data, as described in Schafer JL (1997) “Imputation of missing covariates under a multivariate linear mixed model”. Technical report 97-04, Dept. of Statistics, The Pennsylvania State University.
1828 Missing Data phylin Spatial Interpolation of Genetic Data The spatial interpolation of genetic distances between samples is based on a modified kriging method that accepts a genetic distance matrix and generates a map of probability of lineage presence. This package also offers tools to generate a map of potential contact zones between groups with user-defined thresholds in the tree to account for old and recent divergence. Additionally, it has functions for IDW interpolation using genetic data and midpoints.
1829 Missing Data plsRglm Partial Least Squares Regression for Generalized Linear Models Provides (weighted) Partial Least Squares Regression for generalized linear models and repeated k-fold cross-validation of such models using various criteria. It allows for missing data in the explanatory variables. Bootstrap confidence interval construction is also available.
1830 Missing Data powerlmm Power Analysis for Longitudinal Multilevel Models Calculate power for the ‘time x treatment’ effect in two- and three-level multilevel longitudinal studies with missing data. Both the third-level factor (e.g. therapists, schools, or physicians), and the second-level factor (e.g. subjects), can be assigned random slopes. Studies with partially nested designs, unequal cluster sizes, unequal allocation to treatment arms, and different dropout patterns per treatment are supported. For all designs power can be calculated both analytically and via simulations. The analytical calculations extend the method described in Galbraith et al. (2002) <doi:10.1016/S0197-2456(02)00205-2> to three-level models. Additionally, the simulation tools provide flexible ways to investigate bias, Type I errors and the consequences of model misspecification.
1831 Missing Data prefmod Utilities to Fit Paired Comparison Models for Preferences Generates design matrix for analysing real paired comparisons and derived paired comparison data (Likert type items/ratings or rankings) using a loglinear approach. Fits loglinear Bradley-Terry model (LLBT) exploiting an eliminate feature. Computes pattern models for paired comparisons, rankings, and ratings. Some treatment of missing values (MCAR and MNAR). Fits latent class (mixture) models for paired comparison, rating and ranking patterns using a non-parametric ML approach.
1832 Missing Data prophet Automatic Forecasting Procedure Implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
1833 Missing Data pseval Methods for Evaluating Principal Surrogates of Treatment Response Contains the core methods for the evaluation of principal surrogates in a single clinical trial. Provides a flexible interface for defining models for the risk given treatment and the surrogate, the models for integration over the missing counterfactual surrogate responses, and the estimation methods. Estimated maximum likelihood and pseudo-score methods can be used for estimation, and the bootstrap for inference. A variety of post-estimation summary methods are provided, including print, summary, plot, and testing.
1834 Missing Data PSIMEX SIMEX Algorithm on Pedigree Structures Generalization of the SIMEX algorithm from Cook & Stefanski (1994) <doi:10.2307/2290994> for the calculation of inbreeding depression or heritability on pedigree structures affected by missing or misassigned paternities. It simulates errors and tracks the behavior of the estimate as a function of the error proportion. It extrapolates back a true value corresponding to the null error rate.
1835 Missing Data PSM Non-Linear Mixed-Effects Modelling using Stochastic Differential Equations Functions for fitting linear and non-linear mixed-effects models using stochastic differential equations (SDEs). The package allows for any multivariate non-linear time-variant model to be specified, and it also handles multidimensional input, covariates, missing observations, and specification of dosage regimen. The provided pipeline relies on the coupling of the FOCE algorithm and Kalman filtering as outlined by Klim et al (2009, <doi:10.1016/j.cmpb.2009.02.001>) and has been validated against the proprietary software ‘NONMEM’ (Tornoe et al, 2005, <doi:10.1007/s11095-005-5269-5>). Further functions are provided for finding smoothed estimates of model states and for simulation.
1836 Missing Data PST Probabilistic Suffix Trees and Variable Length Markov Chains Provides a framework for analysing state sequences with probabilistic suffix trees (PST), the construction that stores variable length Markov chains (VLMC). Besides functions for learning and optimizing VLMC models, the PST library includes many additional tools to analyse sequence data with these models: visualization tools, functions for sequence prediction and artificial sequence generation, as well as for context and pattern mining. The package is specifically adapted to the field of social sciences by allowing VLMC models to be learned from sets of individual sequences possibly containing missing values, and by accounting for case weights. The library also allows computing the probabilistic divergence between two models and fitting segmented VLMCs, where sub-models fitted to distinct strata of the learning sample are stored in a single PST. This software results from research work executed within the framework of the Swiss National Centre of Competence in Research LIVES, which is financed by the Swiss National Science Foundation. The authors are grateful to the Swiss National Science Foundation for its financial support.
1837 Missing Data QTLRel Tools for Mapping of Quantitative Traits of Genetically Related Individuals and Calculating Identity Coefficients from Pedigrees This software provides tools for quantitative trait mapping in populations such as advanced intercross lines where relatedness among individuals should not be ignored. It can estimate background genetic variance components, impute missing genotypes, simulate genotypes, perform a genome scan for putative quantitative trait loci (QTL), and plot mapping results. It also has functions to calculate identity coefficients from pedigrees, especially suitable for pedigrees that consist of a large number of generations, or estimate identity coefficients from genotypic data in certain circumstances.
1838 Missing Data Qtools Utilities for Quantiles Functions for unconditional and conditional quantiles. These include methods for transformation-based quantile regression, quantile-based measures of location, scale and shape, methods for quantiles of discrete variables, quantile-based multiple imputation, and restricted quantile regression. A vignette is given in Geraci (2016, The R Journal) and included in the package documents.
1839 Missing Data randomForest Breiman and Cutler’s Random Forests for Classification and Regression Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <doi:10.1023/A:1010933404324>.
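Although listed here for its missing-data facilities, ‘randomForest’ offers two simple routes for NAs: a rough fix and proximity-based imputation. A minimal sketch (the injected NAs are for illustration):

```r
library(randomForest)

set.seed(17)
iris_mis <- iris
iris_mis$Sepal.Width[sample(150, 15)] <- NA

iris_quick <- na.roughfix(iris_mis)            # column medians/modes
iris_prox  <- rfImpute(Species ~ ., iris_mis)  # proximity-based imputation
rf <- randomForest(Species ~ ., data = iris_prox)
```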
1840 Missing Data reddPrec Reconstruction of Daily Data - Precipitation Computes quality control for daily precipitation datasets, reconstructs the original series by estimating precipitation in missing values, creates new series in a specified pair of coordinates and creates grids.
1841 Missing Data Rmagic MAGIC - Markov Affinity-Based Graph Imputation of Cells MAGIC (Markov affinity-based graph imputation of cells) is a method for addressing technical noise in single-cell data, including under-sampling of mRNA molecules, often termed “dropout”, which can severely obscure important gene-gene relationships. MAGIC shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. Read more: van Dijk et al. (2018) <doi:10.1016/j.cell.2018.05.061>.
1842 Missing Data RNAseqNet Log-Linear Poisson Graphical Model with Hot-Deck Multiple Imputation Infers a log-linear Poisson graphical model with an auxiliary data set. A hot-deck multiple imputation method is used to improve the reliability of the inference with an auxiliary dataset. A standard log-linear Poisson graphical model can also be used for the inference, and the Stability Approach for Regularization Selection (StARS) is implemented to drive the selection of the regularization parameter. The method is fully described in <doi:10.1093/bioinformatics/btx819>.
1843 Missing Data robCompositions Compositional Data Analysis Methods for analysis of compositional data including robust methods, imputation, methods to replace rounded zeros, (robust) outlier detection for compositional data, (robust) principal component analysis for compositional data, (robust) factor analysis for compositional data, (robust) discriminant analysis for compositional data (Fisher rule), robust regression with compositional predictors and (robust) Anderson-Darling normality tests for compositional data as well as popular log-ratio transformations (addLR, cenLR, isomLR, and their inverse transformations). In addition, visualisation and diagnostic tools are implemented as well as high and low-level plot functions for the ternary diagram.
1844 Missing Data robustrao An Extended Rao-Stirling Diversity Index to Handle Missing Data A collection of functions to compute the Rao-Stirling diversity index (Porter and Rafols, 2009) <doi:10.1007/s11192-008-2197-2> and its extension to acknowledge missing data (i.e., uncategorized references) by calculating its interval of uncertainty using mathematical optimization as proposed in Calatrava et al. (2016) <doi:10.1007/s11192-016-1842-4>. The Rao-Stirling diversity index is a well-established bibliometric indicator to measure the interdisciplinarity of scientific publications. Apart from the obligatory dataset of publications with their respective references and a taxonomy of disciplines that categorizes references as well as a measure of similarity between the disciplines, the Rao-Stirling diversity index requires a complete categorization of all references of a publication into disciplines. Thus, it fails for an incomplete categorization; in this case, the robust extension has to be used, which encodes the uncertainty caused by missing bibliographic data as an uncertainty interval. Classification / ACM - 2012: Information systems ~ Similarity measures, Theory of computation ~ Quadratic programming, Applied computing ~ Digital libraries and archives.
1845 Missing Data ROptSpace Matrix Reconstruction from a Few Entries Matrix reconstruction, also known as matrix completion, is the task of inferring missing entries of a partially observed matrix. This package provides a method called OptSpace, which was proposed by Keshavan, R.H., Oh, S., and Montanari, A. (2009) <doi:10.1109/ISIT.2009.5205567> for the case of a low-rank assumption.
1846 Missing Data Rphylopars Phylogenetic Comparative Tools for Missing Data and Within-Species Variation Tools for performing phylogenetic comparative methods for datasets with multiple observations per species (intraspecific variation or measurement error) and/or missing data. Performs ancestral state reconstruction and missing data imputation on the estimated evolutionary model, which can be specified as Brownian Motion, Ornstein-Uhlenbeck, Early-Burst, Pagel’s lambda, kappa, or delta, or a star phylogeny.
1847 Missing Data rsem Robust Structural Equation Modeling with Missing Data and Auxiliary Variables A robust procedure is implemented to estimate means and covariance matrix of multiple variables with missing data using Huber weight and then to estimate a structural equation model.
1848 Missing Data rtop Interpolation of Data with Variable Spatial Support Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.
1849 Missing Data samon Sensitivity Analysis for Missing Data In a clinical trial with repeated measures designs, outcomes are often taken from subjects at fixed time-points. The focus of the trial may be to compare the mean outcome in two or more groups at some pre-specified time after enrollment. In the presence of missing data auxiliary assumptions are necessary to perform such comparisons. One commonly employed assumption is the missing at random assumption (MAR). The ‘samon’ package allows the user to perform a (parameterized) sensitivity analysis of this assumption. In particular it can be used to examine the sensitivity of tests of the difference in outcomes to violations of the MAR assumption. The sensitivity analysis can be performed under two scenarios, a) where the data exhibit a monotone missing data pattern (see the samon() function), and, b) where in addition to a monotone missing data pattern the data exhibit intermittent missing values (see the samonIM() function).
1850 Missing Data sbart Sequential BART for Imputation of Missing Covariates Implements the sequential BART (Bayesian Additive Regression Trees) approach to impute missing covariates. The algorithm applies a Bayesian nonparametric approach to factored sets of sequential conditionals of the joint distribution of the covariates and the missingness, and applies Bayesian additive regression trees to model each of these univariate conditionals. Each conditional distribution is then sampled using an MCMC algorithm. The published article can be found at <https://doi.org/10.1093/biostatistics/kxw009>. The package provides a function, seqBART(), which computes and returns the imputed values.
1851 Missing Data sbgcop Semiparametric Bayesian Gaussian Copula Estimation and Imputation Estimation and inference for parameters in a Gaussian copula model, treating the univariate marginal distributions as nuisance parameters as described in Hoff (2007) <doi:10.1214/07-AOAS107>. This package also provides a semiparametric imputation procedure for missing multivariate data.
1852 Missing Data scorecardModelUtils Credit Scorecard Modelling Utils Provides infrastructure functionalities such as missing value treatment, information value calculation, GINI calculation etc. which are used for developing a traditional credit scorecard as well as a machine learning based model. The functionalities defined are standard steps for any credit underwriting scorecard development, extensively used in financial domain.
1853 Missing Data simputation Simple Imputation Easy to use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the ‘magrittr’ package.
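A minimal sketch of the formula-driven ‘simputation’ interface (the NAs are injected for illustration):

```r
library(simputation)

d <- iris
d$Sepal.Length[1:10] <- NA

# Left-hand side: variables to impute; right-hand side: predictors
d <- impute_lm(d, Sepal.Length ~ Sepal.Width + Species)

# Calls chain naturally with magrittr's pipe, e.g.
# d %>% impute_rlm(Sepal.Length ~ .) %>% impute_knn(Species ~ .)
```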
1854 Missing Data sjlabelled Labelled Data Utility Functions Collection of functions dealing with labelled data, like reading and writing data between R and other statistical software packages like ‘SPSS’, ‘SAS’ or ‘Stata’, and working with labelled data. This includes easy ways to get, set or change value and variable label attributes, to convert labelled vectors into factors or numeric (and vice versa), or to deal with multiple declared missing values.
1855 Missing Data sjmisc Data and Variable Transformation Functions Collection of miscellaneous utility functions, supporting data transformation tasks like recoding, dichotomizing or grouping variables, setting and replacing missing values. The data transformation functions also support labelled data, and all integrate seamlessly into a ‘tidyverse’-workflow.
1856 Missing Data smcfcs Multiple Imputation of Covariates by Substantive Model Compatible Fully Conditional Specification Implements multiple imputation of missing covariates by Substantive Model Compatible Fully Conditional Specification. This is a modification of the popular FCS/chained equations multiple imputation approach, and allows imputation of missing covariate values from models which are compatible with the user specified substantive model.
1857 Missing Data SNPassoc SNPs-based whole genome association studies This package carries out the most common analyses performed in whole genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation tests and related tests (sum statistic and truncated product) are also implemented. Exact distributions of the max-statistic and the genetic risk-allele score can also be estimated.
1858 Missing Data softImpute (core) Matrix Completion via Iterative Soft-Thresholded SVD Iterative methods for matrix completion that use nuclear-norm regularization. There are two main approaches. The first uses iterative soft-thresholded SVDs to impute the missing values. The second uses alternating least squares. Both have an “EM” flavor, in that at each iteration the matrix is completed with the current estimate. For large matrices there is a special sparse-matrix class named “Incomplete” that efficiently handles all computations. The package includes procedures for centering and scaling rows, columns or both, and for computing low-rank SVDs on large sparse centered matrices (i.e. principal components).
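A minimal sketch of nuclear-norm matrix completion with ‘softImpute’; the matrix and its missingness pattern are simulated:

```r
library(softImpute)

set.seed(3)
X <- matrix(rnorm(200), 20, 10)
X[sample(200, 40)] <- NA                # 20% of the entries missing

fit  <- softImpute(X, rank.max = 5, lambda = 1, type = "als")
Xhat <- complete(X, fit)                # fill the NAs from the low-rank fit
```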
1859 Missing Data spacetime Classes and Methods for Spatio-Temporal Data Classes and methods for spatio-temporal data, including space-time regular lattices, sparse lattices, irregular data, and trajectories; utility functions for plotting data as map sequences (lattice or animation) or multiple time series; methods for spatial and temporal selection and subsetting, as well as for spatial/temporal/spatio-temporal matching or aggregation, retrieving coordinates, print, summary, etc.
1860 Missing Data sptemExp Constrained Spatiotemporal Mixed Models for Exposure Estimation The approach of constrained spatiotemporal mixed models is to make reliable estimation of air pollutant concentrations at high spatiotemporal resolution (Li, L., Zhang, J., Meng, X., Fang, Y., Ge, Y., Wang, J., Wang, C., Wu, J., Kan, H. (2018) <doi.org/10.1016/j.rse.2018.09.001>; Li, L., Lurmann, F., Habre, R., Urman, R., Rappaport, E., Ritz, B., Chen, J., Gilliland, F., Wu, J., (2017) <doi:10.1021/acs.est.7b01864>). This package is an extensive tool for this modeling approach with support of block Kriging (Goovaerts, P. (1997) <http://www.gbv.de/dms/goettingen/229148123.pdf>) and uses the PM2.5 modeling as examples. It provides the following functionality: (1) Extraction of covariates from the satellite images such as GeoTiff and NC4 raster; (2) Generation of temporal basis functions to simulate the seasonal trends in the study regions; (3) Generation of the regional monthly or yearly means of air pollutant concentration; (4) Generation of Thiessen polygons and spatial effect modeling; (5) Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing; (6) Integrated predictions with or without weights of the model’s performance, supporting multi-core parallel computing; (7) Constrained optimization to interpolate the missing values; (8) Generation of the grid surfaces of air pollutant concentration estimates at high resolution; (9) Block Kriging for regional mean estimation at multiple scales.
1861 Missing Data StAMPP Statistical Analysis of Mixed Ploidy Populations Allows users to calculate pairwise Nei’s Genetic Distances (Nei 1972), pairwise Fixation Indexes (Fst) (Weir & Cockerham 1984) and also Genomic Relationship matrices following Yang et al. (2010) in mixed and single ploidy populations. Bootstrapping across loci is implemented during Fst calculation to generate confidence intervals and p-values around pairwise Fst values. StAMPP utilises SNP genotype data of any ploidy level (with the ability to handle missing data) and is coded to utilise multithreading where available to allow efficient analysis of large datasets. StAMPP is able to handle genotype data from genlight objects allowing integration with other packages such as adegenet. Please refer to LW Pembleton, NOI Cogan & JW Forster, 2013, Molecular Ecology Resources, 13(5), 946-952. <doi:10.1111/1755-0998.12129> for the appropriate citation and user manual. Thank you in advance.
1862 Missing Data StatMatch Statistical Matching or Data Fusion Integration of two data sources referring to the same target population which share a number of variables. Some functions can also be used to impute missing values in data sets through hot deck imputation methods. Methods to perform statistical matching when dealing with data from complex sample surveys are available too.
1863 Missing Data stlplus Enhanced Seasonal Decomposition of Time Series by Loess Decompose a time series into seasonal, trend, and remainder components using an implementation of Seasonal Decomposition of Time Series by Loess (STL) that provides several enhancements over the STL method in the stats package. These enhancements include handling missing values, providing higher order (quadratic) loess smoothing with automated parameter choices, frequency component smoothing beyond the seasonal and trend components, and some basic plot methods for diagnostics.
1864 Missing Data StratifiedRF Builds Trees by Sampling Variables in Groups Random Forest-like tree ensemble that works with groups of predictor variables. When building a tree, a number of variables is taken randomly from each group separately, thus ensuring that it considers variables from each group for the splits. Useful when rows contain information about different things (e.g. user information and product information) and it’s not sensible to make a prediction with information from only one group of variables, or when there are far more variables from one group than the other and it’s desired to have groups appear evenly on trees. Trees are grown using the C5.0 algorithm rather than the usual CART algorithm. Supports parallelization (multithreaded), missing values in predictors, and categorical variables (without doing One-Hot encoding in the processing). Can also be used to create a regular (non-stratified) Random Forest-like model, but made up of C5.0 trees and with some additional control options. As it’s built with C5.0 trees, it works only for classification (not for regression).
1865 Missing Data swgee Simulation Extrapolation Inverse Probability Weighted Generalized Estimating Equations Simulation extrapolation and inverse probability weighted generalized estimating equations method for longitudinal data with missing observations and measurement error in covariates. References: Yi, G. Y. (2008) <doi:10.1093/biostatistics/kxm054>; Cook, J. R. and Stefanski, L. A. (1994) <doi:10.1080/01621459.1994.10476871>; Little, R. J. A. and Rubin, D. B. (2002, ISBN:978-0-471-18386-0).
1866 Missing Data TAM Test Analysis Modules Includes marginal maximum likelihood estimation and joint maximum likelihood estimation for unidimensional and multidimensional item response models. The package functionality covers the Rasch model, 2PL model, 3PL model, generalized partial credit model, multi-faceted Rasch model, nominal item response model, structured latent class model, mixture distribution IRT models, and located latent class models. Latent regression models and plausible value imputation are also supported. For details see Adams, Wilson and Wang, 1997 <doi:10.1177/0146621697211001>, Adams, Wilson and Wu, 1997 <doi:10.3102/10769986022001047>, Formann, 1982 <doi:10.1002/bimj.4710240209>, Formann, 1992 <doi:10.1080/01621459.1992.10475229>.
1867 Missing Data TAR Bayesian Modeling of Autoregressive Threshold Time Series Models Identification and estimation of autoregressive threshold models with Gaussian noise, as well as positive-valued time series. The package provides the identification of the number of regimes, the thresholds and the autoregressive orders, as well as the estimation of the remaining parameters. The package implements the methodology from the 2005 paper: Modeling Bivariate Threshold Autoregressive Processes in the Presence of Missing Data <doi:10.1081/STA-200054435>.
1868 Missing Data TestDataImputation Missing Item Responses Imputation for Test and Assessment Data Functions for imputing missing item responses for dichotomous and polytomous test and assessment data. This package enables missing imputation methods that are suitable for test and assessment data, including: listwise (LW) deletion, treating as incorrect (IN), person mean imputation (PM), item mean imputation (IM), two-way imputation (TW), logistic regression imputation (LR), and EM imputation.
1869 Missing Data tidyimpute Imputation the Tidyverse Way Functions and methods for imputing missing values (NA) in tables and lists, patterned after the tidyverse approach of ‘dplyr’ and ‘rlang’; works with data.tables as well.
1870 Missing Data timeSeries Rmetrics - Financial Time Series Objects Provides a class and various tools for financial time series. This includes basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.
1871 Missing Data TippingPoint Enhanced Tipping Point Displays the Results of Sensitivity Analyses for Missing Data Using the idea of “tipping point” (proposed in Gregory Campbell, Gene Pennello and Lilly Yue (2011) <doi:10.1080/10543406.2011.550094>) to visualize the results of sensitivity analysis for missing data, the package provides a set of functions to list out all the possible combinations of the values of missing data in two treatment arms, calculate corresponding estimated treatment effects and p values and draw a colored heat-map to visualize them. It can deal with randomized experiments with a binary outcome or a continuous outcome. In addition, the package provides a visualized method to compare various imputation methods by adding the rectangles or convex hulls on the basic plot.
1872 Missing Data TreePar Estimating birth and death rates based on phylogenies (i) For a given species phylogeny on present day data which is calibrated to calendar-time, a method for estimating maximum likelihood speciation and extinction processes is provided. The method allows for non-constant rates. Rates may change (1) as a function of time, i.e. rate shifts at specified times or mass extinction events (likelihood implemented as LikShifts, optimization as bd.shifts.optim and visualized as bd.shifts.plot) or (2) as a function of the number of species, i.e. density-dependence (likelihood implemented as LikDD and optimization as bd.densdep.optim) or (3) extinction rate may be a function of species age (likelihood implemented as LikAge and optimization as bd.age.optim.matlab). Note that the methods take into account the whole phylogeny, in particular they account for the “pull of the present” effect. (1-3) can take into account incomplete species sampling, as long as each species has the same probability of being sampled. For a given phylogeny on higher taxa (i.e. all but one species per taxon are missing), where the number of species is known within each higher taxon, speciation and extinction rates can be estimated under model (1) (implemented within LikShifts and bd.shifts.optim with groups != 0). (ii) For a given phylogeny with sequentially sampled tips, e.g. a virus phylogeny, rates can be estimated under a model where rates vary across time using bdsky.stt.optim based on likelihood LikShiftsSTT (extending LikShifts and bd.shifts.optim). Furthermore, rates may vary as a function of host types using LikTypesSTT (multitype branching process extending functions in R package diversitree). This function can furthermore calculate the likelihood under an epidemiological model where infected individuals are first exposed and then infectious.
1873 Missing Data TreeSim Simulating Phylogenetic Trees Simulation methods for phylogenetic trees where (i) all tips are sampled at one time point or (ii) tips are sampled sequentially through time. (i) For sampling at one time point, simulations are performed under a constant rate birth-death process, conditioned on having a fixed number of final tips (sim.bd.taxa()), or a fixed age (sim.bd.age()), or a fixed age and number of tips (sim.bd.taxa.age()). When conditioning on the number of final tips, the method allows for shifts in rates and mass extinction events during the birth-death process (sim.rateshift.taxa()). The function sim.bd.age() (and sim.rateshift.taxa() without extinction) allow the speciation rate to change in a density-dependent way. The LTT plots of the simulations can be displayed using LTT.plot(), LTT.plot.gen() and LTT.average.root(). TreeSim further samples trees with n final tips from a set of trees generated by the common sampling algorithm stopping when a fixed number m>>n of tips is first reached (sim.gsa.taxa()). This latter method is appropriate for m-tip trees generated under a big class of models (details in the sim.gsa.taxa() man page). For incomplete phylogeny, the missing speciation events can be added through simulations (corsim()). (ii) sim.rateshifts.taxa() is generalized to sim.bdsky.stt() for serially sampled trees, where the trees are conditioned on either the number of sampled tips or the age. Furthermore, for a multitype-branching process with sequential sampling, trees on a fixed number of tips can be simulated using sim.bdtypes.stt.taxa(). This function further allows to simulate under epidemiological models with an exposed class. The function sim.genespeciestree() simulates coalescent gene trees within birth-death species trees, and sim.genetree() simulates coalescent gene trees.
1874 Missing Data tsibble Tidy Temporal Data Frames and Tools Provides a ‘tbl_ts’ class (the ‘tsibble’) to store and manage temporal data in a data-centric format, which is built on top of the ‘tibble’. The ‘tsibble’ aims at easily manipulating and analysing temporal data, including counting and filling in time gaps, aggregating over calendar periods, and performing rolling window calculations.
1875 Missing Data TVsMiss Variable Selection for Missing Data Uses a regularized likelihood method for variable selection. The likelihood can be combined with the lasso, smoothly clipped absolute deviation (SCAD), and minimax concave (MCP) penalties. Tuning parameter selection techniques include cross validation (CV), Bayesian information criterion (BIC) (low and high), stability of variable selection (sVS), stability of BIC (sBIC), and stability of estimation (sEST). For more details see Jiwei Zhao, Yang Yang, and Yang Ning (2018) <arXiv:1703.06379> “Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data.” Statistica Sinica.
1876 Missing Data VarSelLCM Variable Selection for Model-Based Clustering of Mixed-Type Data Set with Missing Values Full model selection (detection of the relevant features and estimation of the number of clusters) for model-based clustering (see reference here <doi:10.1007/s11222-016-9670-1>). Data to analyze can be continuous, categorical, integer or mixed. Moreover, missing values can occur and do not necessitate any pre-processing. A Shiny application permits easy interpretation of the results.
1877 Missing Data VIM (core) Visualization and Imputation of Missing Values New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on the structure of the missing values, the corresponding methods may help to identify the mechanism generating them and allow the data, including missing values, to be explored. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.
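A minimal sketch of ‘VIM’ using its bundled mammal ‘sleep’ data:

```r
library(VIM)

data(sleep, package = "VIM")             # mammal sleep data with missings
aggr(sleep)                              # aggregation plot of NA patterns
marginplot(sleep[, c("Gest", "Dream")])  # bivariate view of the missings
sleep_imp <- kNN(sleep)                  # k-nearest-neighbour imputation
```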
1878 Missing Data VIMGUI Visualization and Imputation of Missing Values - Graphical User Interface A graphical user interface for the methods implemented in the package VIM. It allows easy handling of the implemented plot and imputation methods.
1879 Missing Data WaverR Data Estimation using Weighted Averages of Multiple Regressions For multivariate datasets, this function enables the estimation of missing data using the Weighted AVERage of all possible Regressions built from the available data.
1880 Missing Data wNNSel Weighted Nearest Neighbor Imputation of Missing Values using Selected Variables New tools for the imputation of missing values in high-dimensional data are introduced using the non-parametric nearest neighbor methods. It includes weighted nearest neighbor imputation methods that use specific distances for selected variables. It includes an automatic procedure of cross validation and does not require prespecified values of the tuning parameters. It can be used to impute missing values in high-dimensional data when the sample size is smaller than the number of predictors. For more information see Faisal and Tutz (2017) <doi:10.1515/sagmb-2015-0098>.
1881 Missing Data wrangle A Systematic Data Wrangling Idiom Supports systematic scrutiny, modification, and integration of data. The function status() counts rows that have missing values in grouping columns (returned by na()), have non-unique combinations of grouping columns (returned by dup()), and that are not locally sorted (returned by unsorted()). Functions enumerate() and itemize() give sorted unique combinations of columns, with or without occurrence counts, respectively. Function ignore() drops columns in x that are present in y, and informative() drops columns in x that are entirely NA; constant() returns values that are constant, given a key. Data that have defined unique combinations of grouping values behave more predictably during merge operations.
1882 Missing Data xts eXtensible Time Series Provides for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
1883 Missing Data yaImpute (core) Nearest Neighbor Observation Imputation and Evaluation Tools Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.
1884 Missing Data zCompositions Treatment of Zeros, Left-Censored and Missing Values in Compositional Data Sets Principled methods for the imputation of zeros, left-censored and missing data in compositional data sets.
1885 Missing Data zoo S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
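A minimal sketch of zoo's NA-handling helpers on a small irregular series:

```r
library(zoo)

z <- zoo(c(1, NA, NA, 4, 5), as.Date("2020-01-01") + 0:4)
na.locf(z)     # last observation carried forward
na.approx(z)   # linear interpolation over the index
na.spline(z)   # cubic spline interpolation
```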
1886 Model Deployment with R arules Mining Association Rules and Frequent Itemsets Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat.
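A minimal sketch of mining association rules with ‘arules’ on its bundled ‘Groceries’ transactions:

```r
library(arules)

data(Groceries)
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = "lift")))  # strongest rules by lift
```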
1887 Model Deployment with R arulesCBA Classification Based on Association Rules Provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.
1888 Model Deployment with R arulesSequences Mining Frequent Sequences Add-on for arules to handle and mine frequent sequences. Provides interfaces to the C++ implementation of cSPADE by Mohammed J. Zaki.
1889 Model Deployment with R aurelius Generates PFA Documents from R Code and Optionally Runs Them Provides tools for converting R objects and syntax into the Portable Format for Analytics (PFA). Allows for testing validity and runtime behavior of PFA documents through rPython and Titus, a more complete implementation of PFA for Python. The Portable Format for Analytics is a specification for event-based processors that perform predictive or analytic calculations and is aimed at helping smooth the transition from statistical model development to large-scale and/or online production. See <http://dmg.org/pfa> for more information.
1890 Model Deployment with R AzureML Interface with Azure Machine Learning Datasets, Experiments and Web Services Functions and datasets to support Azure Machine Learning. This allows you to interact with datasets, as well as publish and consume R functions as API services.
1891 Model Deployment with R dbplyr A ‘dplyr’ Back End for Databases A ‘dplyr’ back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a ‘DBI’ back end; more advanced features require ‘SQL’ translation to be provided by the package author.
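A minimal dbplyr sketch (assuming the RSQLite package is also installed): the familiar dplyr verbs are translated to SQL and executed inside the database:

    library(dplyr)
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    mtcars_db <- copy_to(con, mtcars)          # load mtcars into the database
    q <- mtcars_db %>%
      group_by(cyl) %>%
      summarise(mpg = mean(mpg, na.rm = TRUE))
    show_query(q)                              # inspect the generated SQL
    collect(q)                                 # execute and fetch the result into R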
1892 Model Deployment with R domino R Console Bindings for the ‘Domino Command-Line Client’ A wrapper on top of the ‘Domino Command-Line Client’. It lets you run ‘Domino’ commands (e.g., “run”, “upload”, “download”) directly from your R environment. Under the hood, it uses R’s system function to run the ‘Domino’ executable, which must be installed as a prerequisite. ‘Domino’ is a service that makes it easy to run your code on scalable hardware, with integrated version control and collaboration features designed for analytical workflows (see <http://www.dominodatalab.com> for more information).
1893 Model Deployment with R dplyr A Grammar of Data Manipulation A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
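A minimal sketch of the dplyr grammar on the built-in mtcars data:

    library(dplyr)
    mtcars %>%
      filter(hp > 100) %>%                      # keep rows
      group_by(cyl) %>%                         # split
      summarise(n = n(), mpg = mean(mpg)) %>%   # aggregate per group
      arrange(desc(mpg))                        # order the result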
1894 Model Deployment with R FastRWeb Fast Interactive Framework for Web Scripting Using R Infrastructure for creating rich, dynamic web content using R scripts while maintaining very fast response time.
1895 Model Deployment with R h2o R Interface for ‘H2O’ R interface for ‘H2O’, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).
1896 Model Deployment with R httpuv HTTP and WebSocket Server Library Provides low-level socket and protocol support for handling HTTP and WebSocket requests directly from within R. It is primarily intended as a building block for other packages, rather than making it particularly easy to create complete web applications using httpuv alone. httpuv is built on top of the libuv and http-parser C libraries, both of which were developed by Joyent, Inc. (See LICENSE file for libuv and http-parser license information.)
1897 Model Deployment with R ibmdbR IBM in-Database Analytics for R Functionality required to efficiently use R with IBM(R) Db2(R) Warehouse offerings (formerly IBM dashDB(R)) and IBM Db2 for z/OS(R) in conjunction with IBM Db2 Analytics Accelerator for z/OS. Many basic and complex R operations are pushed down into the database, which removes the main-memory limitation of R and allows full use of parallel processing in the underlying database.
1898 Model Deployment with R jug A Simple Web Framework for R jug is a web framework focused on easily building APIs. It is mostly intended for exposing R functions, models and visualizations to third parties by way of HTTP requests.
1899 Model Deployment with R keras R Interface to ‘Keras’ Interface to ‘Keras’ <https://keras.io>, a high-level neural networks ‘API’. ‘Keras’ was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both ‘CPU’ and ‘GPU’ devices.
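A minimal keras sketch defining and compiling a small feed-forward classifier (assumes a working TensorFlow backend; the layer sizes are arbitrary):

    library(keras)
    model <- keras_model_sequential() %>%
      layer_dense(units = 16, activation = "relu", input_shape = 4) %>%
      layer_dense(units = 3, activation = "softmax")
    model %>% compile(optimizer = "adam",
                      loss = "categorical_crossentropy",
                      metrics = "accuracy")
    summary(model)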
1900 Model Deployment with R mleap Interface to ‘MLeap’ A ‘sparklyr’ <https://spark.rstudio.com> extension that provides an interface to ‘MLeap’ <https://github.com/combust/mleap>, an open source library that enables exporting and serving of ‘Apache Spark’ pipelines.
1901 Model Deployment with R onnx R Interface to ‘ONNX’ R Interface to ‘ONNX’ - Open Neural Network Exchange <https://onnx.ai/>. ‘ONNX’ provides an open source format for machine learning models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.
1902 Model Deployment with R opencpu Producing and Reproducing Results A system for embedded scientific computing and reproducible research with R. The OpenCPU server exposes a simple but powerful HTTP API for RPC and data interchange with R. This provides a reliable and scalable foundation for statistical services or building R web applications. The OpenCPU server runs either as a single-user development server within the interactive R session, or as a multi-user Linux stack based on Apache2. The entire system is fully open source and permissively licensed. The OpenCPU website has detailed documentation and example apps.
1903 Model Deployment with R plumber An API Generator for R Gives the ability to automatically generate and serve an HTTP API from R functions using the annotations in the R documentation around your functions.
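A minimal plumber sketch: the roxygen-style annotations in a script (hypothetically named api.R here) become an HTTP endpoint:

    # api.R -- the #* annotations define the endpoint
    #* Echo back the input
    #* @param msg The message to echo
    #* @get /echo
    function(msg = "") {
      list(msg = paste0("The message is: '", msg, "'"))
    }

    # in a separate session: parse the annotations and serve the API
    library(plumber)
    pr <- plumb("api.R")
    pr$run(port = 8000)   # then GET http://localhost:8000/echo?msg=hello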
1904 Model Deployment with R pmml Generate PMML for Various Models The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define machine learning, statistical and data mining models and to share models between PMML compliant applications. More information about the PMML industry standard and the Data Mining Group can be found at <http://www.dmg.org>. The generated PMML can be imported into any PMML consuming application, such as Zementis Predictive Analytics products, which integrate with web services, relational database systems and deploy natively on Hadoop in conjunction with Hive, Spark or Storm, as well as allow predictive analytics to be executed for IBM z Systems mainframe applications and real-time, streaming analytics platforms. The package isofor (used for anomaly detection) can be installed with devtools::install_github(“Zelazny7/isofor”).
1905 Model Deployment with R pmmlTransformations Transforms Input Data from a PMML Perspective Allows for data to be transformed before using it to construct models. Builds structures to allow functions in the PMML package to output transformation details in addition to the model in the resulting PMML file. The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define machine learning, statistical and data mining models and to share models between PMML compliant applications. More information about the PMML industry standard and the Data Mining Group can be found at <http://www.dmg.org>. The generated PMML can be imported into any PMML consuming application, such as Zementis Predictive Analytics products, which integrate with web services, relational database systems and deploy natively on Hadoop in conjunction with Hive, Spark or Storm, as well as allow predictive analytics to be executed for IBM z Systems mainframe applications and real-time, streaming analytics platforms.
1906 Model Deployment with R rattle Graphical User Interface for Data Science in R The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim to provide a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (Predictive Model Markup Language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the log tab. This can be saved as a standalone R script file, as an aid for the user to learn R, or to copy-and-paste directly into R itself.
1907 Model Deployment with R reticulate Interface to ‘Python’ Interface to ‘Python’ modules, classes, and functions. When calling into ‘Python’, R data types are automatically converted to their equivalent ‘Python’ types. When values are returned from ‘Python’ to R they are converted back to R types. Compatible with all versions of ‘Python’ >= 2.7.
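A minimal reticulate sketch (assuming a Python installation with NumPy is available):

    library(reticulate)
    np <- import("numpy")     # attach a Python module as an R object
    np$linspace(0, 1, 5L)     # called like an R function; returns an R numeric vector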
1908 Model Deployment with R RSclient Client for Rserve Client for Rserve, allowing R to connect to Rserve instances and issue commands.
1909 Model Deployment with R Rserve Binary R server Rserve acts as a socket server (TCP/IP or local sockets) which allows binary requests to be sent to R. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++ and Java, allowing any application to use facilities of R without the need of linking to R code. Rserve supports remote connection, user authentication and file transfer. A simple R client is included in this package as well.
1910 Model Deployment with R rsparkling R Interface for H2O Sparkling Water An extension package for ‘sparklyr’ that provides an R interface to H2O Sparkling Water machine learning library (see <https://github.com/h2oai/sparkling-water> for more information).
1911 Model Deployment with R sparklyr R Interface to Apache Spark R interface to Apache Spark, a fast and general engine for big data processing, see <http://spark.apache.org>. This package supports connecting to local and remote Apache Spark clusters, provides a ‘dplyr’ compatible back-end, and provides an interface to Spark’s built-in machine learning algorithms.
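A minimal sparklyr sketch against a local Spark installation (spark_install() can set one up); the model is fitted inside Spark, not in R:

    library(sparklyr)
    sc <- spark_connect(master = "local")
    mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
    fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
    summary(fit)
    spark_disconnect(sc)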
1912 Model Deployment with R tensorflow R Interface to ‘TensorFlow’ Interface to ‘TensorFlow’ <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more ‘CPUs’ or ‘GPUs’ in a desktop, server, or mobile device with a single ‘API’. ‘TensorFlow’ was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
1913 Model Deployment with R tfestimators Interface to ‘TensorFlow’ Estimators Interface to ‘TensorFlow’ Estimators <https://www.tensorflow.org/programmers_guide/estimators>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
1914 Model Deployment with R tidypredict Run Predictions Inside the Database It parses a fitted ‘R’ model object, and returns a formula in ‘Tidy Eval’ code that calculates the predictions. It works with several databases back-ends because it leverages ‘dplyr’ and ‘dbplyr’ for the final ‘SQL’ translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger() and earth() models.
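A minimal tidypredict sketch; dbplyr's simulate_mssql() stands in for a real database connection when rendering the SQL:

    library(tidypredict)
    fit <- lm(mpg ~ wt + cyl, data = mtcars)
    tidypredict_fit(fit)                            # prediction as a Tidy Eval formula
    tidypredict_sql(fit, dbplyr::simulate_mssql())  # the same prediction rendered as SQL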
1915 Model Deployment with R xgboost Extreme Gradient Boosting Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users can easily define their own objectives.
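A minimal xgboost regression sketch on the built-in mtcars data (the objective string is "reg:linear" on older releases; the predictor columns are arbitrary):

    library(xgboost)
    x <- as.matrix(mtcars[, c("wt", "cyl", "hp")])
    y <- mtcars$mpg
    bst <- xgboost(data = x, label = y, nrounds = 20,
                   objective = "reg:squarederror", verbose = 0)
    head(predict(bst, x))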
1916 Model Deployment with R yhatr R Binder for the Yhat API Deploy, maintain, and invoke models via the Yhat REST API.
1917 Multivariate Statistics abind Combine Multidimensional Arrays Combine multidimensional arrays into a single array. This is a generalization of ‘cbind’ and ‘rbind’. Works with vectors, matrices, and higher-dimensional arrays. Also provides functions ‘adrop’, ‘asub’, and ‘afill’ for manipulating, extracting and replacing data in arrays.
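A minimal abind sketch: along = 3 stacks matrices into a third dimension, while along = 1 or 2 reproduces rbind and cbind:

    library(abind)
    a <- matrix(1:4, 2, 2)
    b <- matrix(5:8, 2, 2)
    abind(a, b, along = 3)   # a 2 x 2 x 2 array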
1918 Multivariate Statistics ade4 (core) Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
1919 Multivariate Statistics amap Another Multidimensional Analysis Package Tools for Clustering and Principal Component Analysis (With robust methods, and parallelized functions).
1920 Multivariate Statistics aplpack Another Plot Package: ‘Bagplots’, ‘Iconplots’, ‘Summaryplots’, Slider Functions and Others Some functions for drawing special plots: The function ‘bagplot’ plots a bagplot, ‘faces’ plots Chernoff faces, ‘iconplot’ plots a representation of a frequency table or a data matrix, ‘plothulls’ plots hulls of a bivariate data set, ‘plotsummary’ plots a graphical summary of a data set, ‘puticon’ adds icons to a plot, ‘skyline.hist’ combines several histograms of a one-dimensional data set in one plot, ‘slider’ functions support some interactive graphics, ‘spin3R’ helps inspect a 3-dim point cloud, ‘stem.leaf’ plots a stem-and-leaf plot, and ‘stem.leaf.backback’ plots back-to-back versions of stem-and-leaf plots.
1921 Multivariate Statistics ash David Scott’s ASH Routines David Scott’s ASH routines ported from S-PLUS to R.
1922 Multivariate Statistics bayesm Bayesian Inference for Marketing/Micro-Econometrics Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), and Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009). For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley 2005) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).
1923 Multivariate Statistics ca Simple, Multiple and Joint Correspondence Analysis Computation and visualization of simple, multiple and joint correspondence analysis.
1924 Multivariate Statistics calibrate Calibration of Scatterplot and Biplot Axes Package for drawing calibrated scales with tick marks on (non-orthogonal) variable vectors in scatterplots and biplots.
1925 Multivariate Statistics car Companion to Applied Regression Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.
1926 Multivariate Statistics caret Classification and Regression Training Misc functions for training and plotting classification and regression models.
1927 Multivariate Statistics class Functions for Classification Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps.
1928 Multivariate Statistics clue Cluster Ensembles CLUster Ensembles.
1929 Multivariate Statistics cluster (core) “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al. Methods for cluster analysis, much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
1930 Multivariate Statistics clusterGeneration Random Cluster Generation (with Specified Degree of Separation) We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population version) for pairs of clusters or cluster distributions, and 1-D and 2-D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, number of noisy variables.
1931 Multivariate Statistics clusterSim Searching for Optimal Clustering Procedure for a Data Set Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007/BF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).
1932 Multivariate Statistics clustvarsel Variable Selection for Gaussian Model-Based Clustering Variable selection for Gaussian model-based clustering as implemented in the ‘mclust’ package. The methodology makes it possible to find the (locally) optimal subset of variables in a data set that carry group/cluster information. A greedy or headlong search can be used, either in a forward-backward or backward-forward direction, with or without sub-sampling at the hierarchical clustering stage for starting ‘mclust’ models. By default the algorithm uses a sequential search, but parallelisation is also available.
1933 Multivariate Statistics clv Cluster Validation Techniques The package contains most of the popular internal and external cluster validation methods, ready to use with most of the outputs produced by functions from the “cluster” package. It also contains functions and usage examples for the cluster stability approach, which can be applied to algorithms implemented in the “cluster” package as well as to user-defined clustering algorithms.
1934 Multivariate Statistics cocorresp Co-Correspondence Analysis Methods Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.
1935 Multivariate Statistics concor Concordance The four functions svdcp (cp for column partitioned), svdbip or svdbip2 (bip for bi-partitioned), and svdbips (s for a simultaneous optimization of one set of r solutions) correspond to an “SVD by blocks” notion, supposing each block depends on relative subspaces, rather than on two whole spaces as the usual SVD does. The other functions, based on this notion, relate to two column-partitioned data matrices x and y, defining two sets of subsets xi and yj of variables, and amount to estimating a link between xi and yj for each pair (xi, yj) relative to the links associated with all the other pairs.
1936 Multivariate Statistics copula Multivariate Dependence with Copulas Classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.
1937 Multivariate Statistics corpcor Efficient Estimation of Covariance and (Partial) Correlation Implements a James-Stein-type shrinkage estimator for the covariance matrix, with separate shrinkage for variances and correlations. The details of the method are explained in Schafer and Strimmer (2005) <doi:10.2202/1544-6115.1175> and Opgen-Rhein and Strimmer (2007) <doi:10.2202/1544-6115.1252>. The approach is both computationally as well as statistically very efficient, it is applicable to “small n, large p” data, and always returns a positive definite and well-conditioned covariance matrix. In addition to inferring the covariance matrix the package also provides shrinkage estimators for partial correlations and partial variances. The inverse of the covariance and correlation matrix can be efficiently computed, as well as any arbitrary power of the shrinkage correlation matrix. Furthermore, functions are available for fast singular value decomposition, for computing the pseudoinverse, and for checking the rank and positive definiteness of a matrix.
1938 Multivariate Statistics covRobust Robust Covariance Estimation via Nearest Neighbor Cleaning The cov.nnve() function implements robust covariance estimation by the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002) <doi:10.1198/016214502388618780>.
1939 Multivariate Statistics cramer Multivariate Nonparametric Cramer-Test for the Two-Sample-Problem Provides an R routine for the so-called two-sample Cramer test. This nonparametric two-sample test on equality of the underlying distributions can be applied to multivariate as well as univariate data. It offers two possibilities to approximate the critical value, both of which are included in this package.
1940 Multivariate Statistics cwhmisc Miscellaneous Functions for Math, Plotting, Printing, Statistics, Strings, and Tools Miscellaneous useful or interesting functions.
1941 Multivariate Statistics delt Estimation of Multivariate Densities Using Adaptive Partitions We implement methods for estimating multivariate densities. We include a discretized kernel estimator, an adaptive histogram (a greedy histogram and a CART-histogram), stagewise minimization, and bootstrap aggregation.
1942 Multivariate Statistics denpro Visualization of Multivariate Functions, Sets, and Data We provide tools to (1) visualize multivariate density functions and density estimates with level set trees, (2) visualize level sets with shape trees, (3) visualize multivariate data with tail trees, (4) visualize scales of multivariate density estimates with mode graphs and branching maps, and (5) visualize anisotropic spread with 2D volume functions and 2D probability content functions. Level set trees visualize mode structure, shape trees visualize shapes of level sets of unimodal densities, and tail trees visualize connected data sets. The kernel estimator is implemented but the package may also be applied for visualizing other density estimates.
1943 Multivariate Statistics desirability Function Optimization and Ranking via Desirability Functions S3 classes for multivariate optimization using the desirability function by Derringer and Suich (1980).
1944 Multivariate Statistics dr Methods for Dimension Reduction for Regression Functions, methods, and datasets for fitting dimension reduction regression, using slicing (methods SAVE and SIR), Principal Hessian Directions (phd, using residuals and the response), and an iterative IRE. Partial methods, that condition on categorical predictors are also available. A variety of tests, and stepwise deletion of predictors, is also included. Also included is code for computing permutation tests of dimension. Adding additional methods of estimating dimension is straightforward. For documentation, see the vignette in the package. With version 3.0.4, the arguments for dr.step have been modified.
1945 Multivariate Statistics e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
1946 Multivariate Statistics earth Multivariate Adaptive Regression Splines Build regression models using the techniques in Friedman’s papers “Fast MARS” and “Multivariate Adaptive Regression Splines” <doi:10.1214/aos/1176347963>. (The term “MARS” is trademarked and thus not used in the name of the package.)
1947 Multivariate Statistics ellipse Functions for Drawing Ellipses and Ellipse-Like Confidence Regions Contains various routines for drawing ellipses and ellipse-like confidence regions, implementing the plots described in Murdoch and Chow (1996), A graphical display of large correlation matrices, The American Statistician 50, 178-180. There are also routines implementing the profile plots described in Bates and Watts (1988), Nonlinear Regression Analysis and its Applications.
1948 Multivariate Statistics energy E-Statistics: Multivariate Inference via the Energy of Data E-statistics (energy) tests and statistics for multivariate and univariate inference, including distance correlation, one-sample, two-sample, and multi-sample tests for comparing multivariate distributions, are implemented. Measuring and testing multivariate independence based on distance correlation, partial distance correlation, multivariate goodness-of-fit tests, k-groups and hierarchical clustering based on energy distance, testing for multivariate normality, distance components (disco) for non-parametric analysis of structured data, and other energy statistics/methods are implemented.
1949 Multivariate Statistics eRm Extended Rasch Modeling Fits Rasch models (RM), linear logistic test models (LLTM), rating scale model (RSM), linear rating scale models (LRSM), partial credit models (PCM), and linear partial credit models (LPCM). Missing values are allowed in the data matrix. Additional features are the ML estimation of the person parameters, Andersen’s LR-test, item-specific Wald test, Martin-Loef-Test, nonparametric Monte-Carlo Tests, itemfit and personfit statistics including infit and outfit measures, ICC and other plots, automated stepwise item elimination, simulation module for various binary data matrices.
1950 Multivariate Statistics FactoMineR Multivariate Exploratory Data Analysis and Data Mining Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017) <doi:10.1201/b10345-2>.
1951 Multivariate Statistics fastICA FastICA Algorithms to Perform ICA and Projection Pursuit Implementation of FastICA algorithm to perform Independent Component Analysis (ICA) and Projection Pursuit.
1952 Multivariate Statistics feature Local Inferential Feature Significance for Multivariate Kernel Density Estimation Local inferential feature significance for multivariate kernel density estimation.
1953 Multivariate Statistics fgac Generalized Archimedean Copula Bi-variate data fitting is done by two stochastic components: the marginal distributions and the dependency structure. The dependency structure is modeled through a copula. An algorithm was implemented considering seven families of copulas (Generalized Archimedean Copulas), the best fitting can be obtained looking all copula’s options (totally positive of order 2 and stochastically increasing models).
1954 Multivariate Statistics fpc Flexible Procedures for Clustering Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther’s prediction strength, Fang and Wang’s bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
1955 Multivariate Statistics fso Fuzzy Set Ordination Fuzzy set ordination is a multivariate analysis used in ecology to relate the composition of samples to possible explanatory variables. While differing in theory and method, in practice, the use is similar to ‘constrained ordination.’ The package contains plotting and summary functions as well as the analyses.
1956 Multivariate Statistics gclus Clustering Graphics Orders panels in scatterplot matrices and parallel coordinate displays by some merit index. Package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.
1957 Multivariate Statistics GenKern Functions for generating and manipulating binned kernel density estimates Computes generalised kernel density estimates (KDEs).
1958 Multivariate Statistics geometry Mesh Generation and Surface Tessellation Makes the ‘Qhull’ library <http://www.qhull.org> available in R, in a similar manner as in Octave and MATLAB. Qhull computes convex hulls, Delaunay triangulations, halfspace intersections about a point, Voronoi diagrams, furthest-site Delaunay triangulations, and furthest-site Voronoi diagrams. It runs in 2D, 3D, 4D, and higher dimensions. It implements the Quickhull algorithm for computing the convex hull. Qhull does not support constrained Delaunay triangulations, or mesh generation of non-convex objects, but the package does include some R functions that allow for this.
1959 Multivariate Statistics geozoo Zoo of Geometric Objects Geometric objects defined in ‘geozoo’ can be simulated or displayed in the R package ‘tourr’.
1960 Multivariate Statistics gmodels Various R Programming Tools for Model Fitting Various R programming tools for model fitting.
1961 Multivariate Statistics GPArotation GPA Factor Rotation Gradient Projection Algorithm Rotation for Factor Analysis. See ?GPArotation.Intro for more details.
1962 Multivariate Statistics hddplot Use Known Groups in High-Dimensional Data to Derive Scores for Plots Cross-validated linear discriminant calculations determine the optimum number of features. Test and training scores from successive cross-validation steps determine, via a principal components calculation, a low-dimensional global space onto which test scores are projected, in order to plot them. Further functions are included that are intended for didactic use. The package implements, and extends, methods described in J.H. Maindonald and C.J. Burden (2005) <https://journal.austms.org.au/V46/CTAC2004/Main/home.html>.
1963 Multivariate Statistics Hmisc Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
1964 Multivariate Statistics homals Gifi Methods for Optimal Scaling Performs a homogeneity analysis (multiple correspondence analysis) and various extensions. Rank restrictions on the category quantifications can be imposed (nonlinear PCA). The categories are transformed by means of optimal scaling with options for nominal, ordinal, and numerical scale levels (for rank-1 restrictions). Variables can be grouped into sets, in order to emulate regression analysis and canonical correlation analysis.
1965 Multivariate Statistics hybridHclust Hybrid Hierarchical Clustering Hybrid hierarchical clustering via mutual clusters. A mutual cluster is a set of points closer to each other than to all other points. Mutual clusters are used to enrich top-down hierarchical clustering.
1966 Multivariate Statistics ICS Tools for Exploring Multivariate Data via ICS/ICA Implementation of Tyler, Critchley, Duembgen and Oja’s (JRSS B, 2009, <doi:10.1111/j.1467-9868.2009.00706.x>) and Oja, Sirkia and Eriksson’s (AJS, 2006, <http://www.ajs.or.at/index.php/ajs/article/view/vol35,%20no2%263%20-%207>) method of two different scatter matrices to obtain an invariant coordinate system or independent components, depending on the underlying assumptions.
1967 Multivariate Statistics ICSNP Tools for Multivariate Nonparametrics Tools for multivariate nonparametrics are implemented, such as location tests based on marginal ranks, computation of the spatial median and spatial signs, Hotelling’s T-test, and estimates of shape.
1968 Multivariate Statistics iplots iPlots - interactive graphics for R Interactive plots for R.
1969 Multivariate Statistics JADE Blind Source Separation Methods Based on Joint Diagonalization and Some BSS Performance Criteria Cardoso’s JADE algorithm as well as his functions for joint diagonalization are ported to R. Also several other blind source separation (BSS) methods, like AMUSE and SOBI, and some criteria for performance evaluation of BSS algorithms, are given. The package is described in Miettinen, Nordhausen and Taskinen (2017) <doi:10.18637/jss.v076.i02>.
1970 Multivariate Statistics kernlab Kernel-Based Machine Learning Lab Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
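A minimal kernlab sketch, fitting an RBF-kernel support vector machine to the built-in iris data:

    library(kernlab)
    fit <- ksvm(Species ~ ., data = iris, kernel = "rbfdot")
    table(predicted = predict(fit, iris), actual = iris$Species)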
1971 Multivariate Statistics KernSmooth Functions for Kernel Smoothing Supporting Wand & Jones (1995) Functions for kernel smoothing (and density estimation) corresponding to the book: Wand, M.P. and Jones, M.C. (1995) “Kernel Smoothing”.
1972 Multivariate Statistics kknn Weighted k-Nearest Neighbors Weighted k-Nearest Neighbors for Classification, Regression and Clustering.
1973 Multivariate Statistics klaR Classification and Visualization Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to ‘svmlight’ and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.
1974 Multivariate Statistics knncat Nearest-neighbor Classification with Categorical Variables Scale categorical variables in such a way as to make NN classification as accurate as possible. The code also handles continuous variables and prior probabilities, and does intelligent variable selection and estimation of both error rates and the right number of NN’s.
1975 Multivariate Statistics kohonen Supervised and Unsupervised Self-Organising Maps Functions to train self-organising maps (SOMs). Also interrogation of the maps and prediction using trained maps are supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.
1976 Multivariate Statistics ks Kernel Smoothing Kernel smoothers for univariate and multivariate data, including densities, density derivatives, cumulative distributions, clustering, classification, density ridges, significant modal regions, and two-sample hypothesis tests. Chacon & Duong (2018) <doi:10.1201/9780429485572>.
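A minimal ks sketch: a bivariate kernel density estimate with a plug-in bandwidth matrix (two iris columns chosen for illustration):

    library(ks)
    x <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])
    fhat <- kde(x, H = Hpi(x))   # Hpi() selects the bandwidth matrix
    plot(fhat)                   # contour plot of the estimated density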
1977 Multivariate Statistics lattice Trellis Graphics for R A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
1978 Multivariate Statistics ltm Latent Trait Models under IRT Analysis of multivariate dichotomous and polytomous data using latent trait models under the Item Response Theory approach. It includes the Rasch, the Two-Parameter Logistic, the Birnbaum’s Three-Parameter, the Graded Response, and the Generalized Partial Credit Models.
1979 Multivariate Statistics mAr Multivariate AutoRegressive analysis R functions for multivariate autoregressive analysis.
1980 Multivariate Statistics MASS (core) Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
1981 Multivariate Statistics Matrix Sparse and Dense Matrix Classes and Methods A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using ‘LAPACK’ and ‘SuiteSparse’ libraries.
1982 Multivariate Statistics matrixcalc Collection of functions for matrix calculations A collection of functions to support matrix calculations for probability, econometric and numerical analysis. There are additional functions that are comparable to APL functions which are useful for actuarial models such as pension mathematics. This package is used for teaching and research purposes at the Department of Finance and Risk Engineering, New York University, Polytechnic Institute, Brooklyn, NY 11201.
1983 Multivariate Statistics mclust Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
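A minimal mclust sketch; Mclust() chooses the covariance model and the number of clusters by BIC:

    library(mclust)
    fit <- Mclust(iris[, 1:4])
    summary(fit)
    table(fit$classification, iris$Species)   # compare clusters to species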
1984 Multivariate Statistics MCMCpack Markov Chain Monte Carlo (MCMC) Package Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
1985 Multivariate Statistics mda Mixture and Flexible Discriminant Analysis Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, …
1986 Multivariate Statistics mice Multivariate Imputation by Chained Equations Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
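A minimal mice sketch on the built-in airquality data: impute, analyse each completed data set, then pool by Rubin's rules (m and the seed are arbitrary):

    library(mice)
    imp <- mice(airquality, m = 5, seed = 1, printFlag = FALSE)
    fit <- with(imp, lm(Ozone ~ Wind + Temp))
    summary(pool(fit))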
1987 Multivariate Statistics misc3d Miscellaneous 3D Plots A collection of miscellaneous 3d plots, including isosurfaces.
1988 Multivariate Statistics mitools Tools for Multiple Imputation of Missing Data Tools to perform analyses and combine results from multiple-imputation datasets.
1989 Multivariate Statistics mix Estimation/Multiple Imputation for Mixed Categorical and Continuous Data Estimation/multiple imputation programs for mixed categorical and continuous data.
1990 Multivariate Statistics mnormt The Multivariate Normal and t Distributions Functions are provided for computing the density and the distribution function of multivariate normal and “t” random variables, and for generating random vectors sampled from these distributions. Probabilities are computed via non-Monte Carlo methods; different routines are used in the case d=1, d=2, d>2, if d denotes the number of dimensions.
1991 Multivariate Statistics MNP R Package for Fitting the Multinomial Probit Model Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.
1992 Multivariate Statistics monomvn Estimation for Multivariate Normal and Student-t Data with Monotone Missingness Estimation of multivariate normal and student-t data of arbitrary dimension where the pattern of missing data is monotone. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and student-t errors (from Geweke) is also provided.
1993 Multivariate Statistics mvnmle ML Estimation for Multivariate Normal Data with Missing Values Finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values.
1994 Multivariate Statistics mvnormtest Normality test for multivariate variables Generalization of the Shapiro-Wilk normality test to multivariate variables.
1995 Multivariate Statistics mvoutlier Multivariate Outlier Detection Based on Robust Methods Various Methods for Multivariate Outlier Detection.
1996 Multivariate Statistics mvtnorm Multivariate Normal and t Distributions Computes multivariate normal and t probabilities, quantiles, random deviates and densities.
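A minimal mvtnorm sketch for a bivariate normal with correlation 0.5:

    library(mvtnorm)
    S <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
    pmvnorm(upper = c(1, 1), sigma = S)     # P(X1 <= 1, X2 <= 1)
    rmvnorm(3, mean = c(0, 0), sigma = S)   # three random draws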
1997 Multivariate Statistics nFactors Parallel Analysis and Non Graphical Solutions to the Cattell Scree Test Indices, heuristics and strategies to help determine the number of factors/components to retain: 1. Acceleration factor (af with or without Parallel Analysis); 2. Optimal Coordinates (noc with or without Parallel Analysis); 3. Parallel analysis (components, factors and bootstrap); 4. lambda > mean(lambda) (Kaiser, CFA and related); 5. Cattell-Nelson-Gorsuch (CNG); 6. Zoski and Jurs multiple regression (b, t and p); 7. Zoski and Jurs standard error of the regression coefficient (sescree); 8. Nelson R2; 9. Bartlett chi-squared; 10. Anderson chi-squared; 11. Lawley chi-squared and 12. Bentler-Yuan chi-squared.
1998 Multivariate Statistics pan Multiple Imputation for Multivariate Panel or Clustered Data It provides functions and examples for maximum likelihood estimation for generalized linear mixed models and Gibbs sampler for multivariate linear mixed models with incomplete data, as described in Schafer JL (1997) “Imputation of missing covariates under a multivariate linear mixed model”. Technical report 97-04, Dept. of Statistics, The Pennsylvania State University.
1999 Multivariate Statistics paran Horn’s Test of Principal Components/Factors An implementation of Horn’s technique for numerically and graphically evaluating the components or factors retained in a principal components analysis (PCA) or common factor analysis (FA). Horn’s method contrasts eigenvalues produced through a PCA or FA on a number of random data sets of uncorrelated variables with the same number of variables and observations as the experimental or observational data set, to produce eigenvalues for components or factors that are adjusted for the sample error-induced inflation. Components with adjusted eigenvalues greater than one are retained. paran may also be used to conduct parallel analysis following Glorfeld’s (1995) suggestions to reduce the likelihood of over-retention.
2000 Multivariate Statistics party A Laboratory for Recursive Partytioning A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.
2001 Multivariate Statistics pcaPP Robust PCA by Projection Pursuit Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) <doi:10.2139/ssrn.968376>, Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>.
2002 Multivariate Statistics PearsonICA Independent component analysis using score functions from the Pearson system The Pearson-ICA algorithm is a mutual information-based method for blind separation of statistically independent source signals. It has been shown that the minimization of mutual information leads to iterative use of score functions, i.e. derivatives of log densities. The Pearson system allows adaptive modeling of score functions. The flexibility of the Pearson system makes it possible to model a wide range of source distributions including asymmetric distributions. The algorithm is designed especially for problems with asymmetric sources but it works for symmetric sources as well.
2003 Multivariate Statistics pls Partial Least Squares and Principal Component Regression Multivariate regression methods Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
2004 Multivariate Statistics plsgenomics PLS Analyses for Genomics Routines for PLS-based genomic analyses, implementing PLS methods for classification with microarray data and prediction of transcription factor activities from combined ChIP-chip analysis. The >=1.2-1 versions include two new classification methods for microarray data: GSIM and Ridge PLS. The >=1.3 versions includes a new classification method combining variable selection and compression in logistic regression context: logit-SPLS; and an adaptive version of the sparse PLS.
2005 Multivariate Statistics poLCA Polytomous variable Latent Class Analysis Latent class analysis and latent class regression models for polytomous outcome variables. Also known as latent structure analysis.
2006 Multivariate Statistics polycor Polychoric and Polyserial Correlations Computes polychoric and polyserial correlations by quick “two-step” methods or ML, optionally with standard errors; tetrachoric and biserial correlations are special cases.
2007 Multivariate Statistics ppls Penalized Partial Least Squares Contains linear and nonlinear regression methods based on Partial Least Squares and penalization techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing.
2008 Multivariate Statistics prim Patient Rule Induction Method (PRIM) Patient Rule Induction Method (PRIM) for bump hunting in high-dimensional data.
2009 Multivariate Statistics proxy Distance and Similarity Measures Provides an extensible framework for the efficient calculation of auto- and cross-proximities, along with implementations of the most popular ones.
2010 Multivariate Statistics psy Various procedures used in psychometry Kappa, ICC, Cronbach's alpha, scree plot, and MTMM.
2011 Multivariate Statistics PTAk Principal Tensor Analysis on k Modes A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package includes also some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.
2012 Multivariate Statistics relaimpo Relative Importance of Regressors in Linear Models Provides several metrics for assessing relative importance in linear models. These can be printed, plotted and bootstrapped. The recommended metric is lmg, which provides a decomposition of the model explained variance into non-negative contributions. There is a version of this package available that additionally provides a new and also recommended metric called pmvd. If you are a non-US user, you can download this extended version from Ulrike Groemping's web site.
2013 Multivariate Statistics rggobi Interface Between R and ‘GGobi’ A command-line interface to ‘GGobi’, an interactive and dynamic graphics package. ‘Rggobi’ complements the graphical user interface of ‘GGobi’ providing a way to fluidly transition between analysis and exploration, as well as automating common tasks.
2014 Multivariate Statistics rgl 3D Visualization Using OpenGL Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.
2015 Multivariate Statistics robustbase Basic Robust Statistics “Essential” Robust Statistics. Tools for analyzing data with robust methods. This includes regression methodology (including model selection) and multivariate statistics, striving to cover the book “Robust Statistics, Theory and Methods” by Maronna, Martin and Yohai; Wiley 2006.
2016 Multivariate Statistics ROCR Visualizing the Performance of Scoring Classifiers ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
2017 Multivariate Statistics rpart Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
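A minimal rpart sketch on the kyphosis data that ships with the package:

    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start,
                 data = kyphosis, method = "class")
    printcp(fit)   # complexity-parameter table, used for pruning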
2018 Multivariate Statistics rrcov Scalable Robust Estimators with High Breakdown Point Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point.
2019 Multivariate Statistics scatterplot3d 3D Scatter Plot Plots a three dimensional (3D) point cloud.
2020 Multivariate Statistics sem Structural Equation Models Functions for fitting general linear structural equation models (with observed and latent variables) using the RAM approach, and for fitting structural equations in observed-variable models by two-stage least squares.
2021 Multivariate Statistics SensoMineR Sensory Data Analysis Statistical Methods to Analyse Sensory Data. SensoMineR: A package for sensory data analysis. S. Le and F. Husson (2008) <doi:10.1111/j.1745-459X.2007.00137.x>.
2022 Multivariate Statistics seriation Infrastructure for Ordering Objects Using Seriation Infrastructure for seriation with an implementation of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).
2023 Multivariate Statistics simba A Collection of functions for similarity analysis of vegetation data Besides functions for calculating similarity and multiple-plot similarity measures from binary data (for instance presence/absence species data), the package contains simple wrapper functions for reshaping species lists into matrices and vice versa, further functions for processing similarity data (Mantel-like permutation procedures), and other utilities for vegetation analysis.
2024 Multivariate Statistics smatr (Standardised) Major Axis Estimation and Testing Routines Methods for fitting bivariate lines in allometry using the major axis (MA) or standardised major axis (SMA), and for making inferences about such lines. The available methods of inference include confidence intervals and one-sample tests for slope and elevation, testing for a common slope or elevation amongst several allometric lines, constructing a confidence interval for a common slope or elevation, and testing for no shift along a common axis, amongst several samples. See Warton et al. 2012 <doi:10.1111/j.2041-210X.2011.00153.x> for methods description.
2025 Multivariate Statistics sn The Skew-Normal and Related Distributions Such as the Skew-t Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t family, and provide related statistical methods for data fitting and model diagnostics, in the univariate and the multivariate case.
2026 Multivariate Statistics spam SPArse Matrix Set of functions for sparse matrix algebra. Differences with other sparse matrix packages are: (1) we only support (essentially) one sparse matrix format, (2) it is based on transparent and simple structure(s), (3) it is tailored for MCMC calculations within G(M)RF, and (4) it is fast and scalable (with the extension package spam64).
2027 Multivariate Statistics SparseM Sparse Linear Algebra Some basic linear algebra functionality for sparse matrices is provided, including Cholesky decomposition and backsolving as well as standard R subsetting and Kronecker products.
2028 Multivariate Statistics SpatialNP Multivariate Nonparametric Methods Based on Spatial Signs and Ranks Tests and estimates of location, tests of independence, tests of sphericity and several estimates of shape, all based on spatial signs, symmetrized signs, ranks and signed ranks. For details, see Oja and Randles (2004) <doi:10.1214/088342304000000558> and Oja (2010) <doi:10.1007/978-1-4419-0468-3>.
2029 Multivariate Statistics superpc Supervised principal components Supervised principal components for regression and survival analysis. Especially useful for high-dimensional data, including microarray data.
2030 Multivariate Statistics trimcluster Cluster Analysis with Trimming Trimmed k-means clustering.
2031 Multivariate Statistics tsfa Time Series Factor Analysis Extraction of Factors from Multivariate Time Series. See ?00tsfa-Intro for more details.
2032 Multivariate Statistics vcd Visualizing Categorical Data Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).
2033 Multivariate Statistics vegan (core) Community Ecology Package Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
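A minimal sketch of two common vegan tasks, diversity indices and ordination, using the BCI tree-count data bundled with the package:

```r
library(vegan)

data(BCI)                                  # sites x species abundance matrix
H <- diversity(BCI, index = "shannon")     # Shannon diversity per site
ord <- metaMDS(BCI, trace = FALSE)         # NMDS ordination (Bray-Curtis by default)
plot(ord)                                  # ordination plot of sites and species
```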
2034 Multivariate Statistics VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
2035 Multivariate Statistics VIM Visualization and Imputation of Missing Values New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data, including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface, available in the separate package VIMGUI, allows easy handling of the implemented plot methods.
2036 Multivariate Statistics xgobi Interface to the XGobi and XGvis programs for graphical data analysis Interface to the XGobi and XGvis programs for graphical data analysis.
2037 Multivariate Statistics YaleToolkit Data exploration tools from Yale University This collection of data exploration tools was developed at Yale University for the graphical exploration of complex multivariate data; barcode and gpairs now have their own packages. The new big.read.table() provided here may be useful for large files when only a subset is needed.
2038 Natural Language Processing alineR Alignment of Phonetic Sequences Using the ‘ALINE’ Algorithm Functions are provided to calculate the ‘ALINE’ Distance between words as per (Kondrak 2000) and (Downey, Hallmark, Cox, Norquest, & Lansing, 2008, <doi:10.1080/09296170802326681>). The score is based on phonetic features represented using the Unicode-compliant International Phonetic Alphabet (IPA). Parameterized feature weights are used to determine the optimal alignment, and functions are provided to estimate optimum values using a genetic algorithm and supervised learning. See (Downey, Sun, and Norquest 2017, <https://journal.r-project.org/archive/2017/RJ-2017-005/index.html>).
2039 Natural Language Processing boilerpipeR Interface to the Boilerpipe Java Library Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe (http://code.google.com/p/boilerpipe/) Java library. The extraction heuristics from boilerpipe show a robust performance for a wide range of web site templates.
2040 Natural Language Processing corpora Statistics and Data Sets for Corpus Frequency Data Utility functions for the statistical analysis of corpus frequency data. This package is a companion to the open-source course “Statistical Inference: A Gentle Introduction for Computational Linguists and Similar Creatures” (‘SIGIL’).
2041 Natural Language Processing gsubfn Utilities for Strings and Function Arguments The gsubfn function is like gsub but can take a replacement function or certain other objects instead of the replacement string. Matches and back references are input to the replacement function and replaced by the function output. gsubfn can be used to split strings based on content rather than delimiters and for quasi-perl-style string interpolation. The package also has facilities for translating formulas to functions and allowing such formulas in function calls instead of functions. This can be used with R functions such as apply, sapply, lapply, optim, integrate, xyplot, Filter and any other function that expects another function as an input argument or functions like cat or sql calls that may involve strings where substitution is desirable. There is also a facility for returning multiple objects from functions and a version of transform that allows the RHS to refer to LHS used in the same transform.
2042 Natural Language Processing gutenbergr Download and Process Public Domain Works from Project Gutenberg Download and process public domain works in the Project Gutenberg collection <http://www.gutenberg.org/>. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.
2043 Natural Language Processing hunspell High-Performance Stemmer, Tokenizer, and Spell Checker Low level spell checker and morphological analyzer based on the famous ‘hunspell’ library <https://hunspell.github.io>. The package can analyze or check individual words as well as parse text, latex, html or xml documents. For a more user-friendly interface use the ‘spelling’ package which builds on this package to automate checking of files, documentation and vignettes in all common formats.
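A minimal sketch of low-level checking, suggestion, and stemming (the misspellings below are deliberate):

```r
library(hunspell)

words <- c("beer", "wiskey", "wine")
hunspell_check(words)            # TRUE/FALSE per word against the dictionary
hunspell_suggest("wiskey")       # candidate corrections
hunspell_stem("going")           # morphological stems
hunspell("This documment has some typoss.")  # misspellings found in running text
```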
2044 Natural Language Processing kernlab Kernel-Based Machine Learning Lab Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
2045 Natural Language Processing KoNLP Korean NLP Package POS Tagger and Morphological Analyzer for Korean text based research. It provides tools for corpus linguistics research such as Keystroke converter, Hangul automata, Concordance, and Mutual Information. It also provides a convenient interface for users to apply, edit and add morphological dictionary selectively.
2046 Natural Language Processing koRpus An R Package for Text Analysis A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Note: For full functionality a local installation of TreeTagger is recommended. It is also recommended to not load this package directly, but by loading one of the available language support packages from the ‘l10n’ repository <https://undocumeantit.github.io/repos/l10n>. ‘koRpus’ also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package ‘rkward’ cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev mailing list (<http://korpusml.reaktanz.de>).
2047 Natural Language Processing languageR Analyzing Linguistic Data: A Practical Introduction to Statistics Data sets exemplifying statistical methods, and some facilitatory utility functions used in “Analyzing Linguistic Data: A practical introduction to statistics using R”, Cambridge University Press, 2008.
2048 Natural Language Processing lda Collapsed Gibbs Sampling Methods for Topic Models Implements latent Dirichlet allocation (LDA) and related models. This includes (but is not limited to) sLDA, corrLDA, and the mixed-membership stochastic blockmodel. Inference for all of these models is implemented via a fast collapsed Gibbs sampler written in C. Utility functions for reading/writing data typically used in topic models, as well as tools for examining posterior distributions are also included.
2049 Natural Language Processing lsa Latent Semantic Analysis The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (= latent semantic) structure which, however, is obscured by word usage (e.g., through the use of synonyms or polysemy). By using conceptual indices derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.
2050 Natural Language Processing monkeylearn Accesses the Monkeylearn API for Text Classifiers and Extractors Allows using some services of Monkeylearn <http://monkeylearn.com/> which is a Machine Learning platform on the cloud for text analysis (classification and extraction).
2051 Natural Language Processing movMF Mixtures of von Mises-Fisher Distributions Fit and simulate mixtures of von Mises-Fisher distributions.
2052 Natural Language Processing mscstexta4r R Client for the Microsoft Cognitive Services Text Analytics REST API R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
2053 Natural Language Processing mscsweblm4r R Client for the Microsoft Cognitive Services Web Language Model REST API R Client for the Microsoft Cognitive Services Web Language Model REST API, including Break Into Words, Calculate Conditional Probability, Calculate Joint Probability, Generate Next Words, and List Available Models. A valid account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
2054 Natural Language Processing openNLP Apache OpenNLP Tools Interface An interface to the Apache OpenNLP tools (version 1.5.3). The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text written in Java. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. See <http://opennlp.apache.org/> for more information.
2055 Natural Language Processing ore An R Interface to the Onigmo Regular Expression Library Provides an alternative to R’s built-in functionality for handling regular expressions, based on the Onigmo library. Offers first-class compiled regex objects, partial matching and function-based substitutions, amongst other features.
2056 Natural Language Processing phonics Phonetic Spelling Algorithms Provides a collection of phonetic algorithms including Soundex, Metaphone, NYSIIS, Caverphone, and others.
2058 Natural Language Processing qdap Bridging the Gap Between Qualitative Data and Quantitative Analysis Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text. This affords the user a more efficient and targeted analysis. ‘qdap’ is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.
2059 Natural Language Processing quanteda Quantitative Analysis of Textual Data A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and ngrams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
2060 Natural Language Processing RcmdrPlugin.temis Graphical Integrated Text Mining Solution An ‘R Commander’ plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, vocabulary tables, terms co-occurrences and documents similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, ‘Twitter’ queries, as well as from ‘Dow Jones Factiva’, ‘LexisNexis’, ‘Europresse’ and ‘Alceste’ files.
2061 Natural Language Processing rel Reliability Coefficients Derives point estimates with confidence intervals for Bennett et al.’s S, Cohen’s kappa, Conger’s kappa, Fleiss’ kappa, Gwet’s AC, intraclass correlation coefficients, Krippendorff’s alpha, Scott’s pi, the standard error of measurement, and weighted kappa.
2062 Natural Language Processing RKEA R/KEA Interface An R interface to KEA (Version 5.0). KEA (for Keyphrase Extraction Algorithm) allows for extracting keyphrases from text documents. It can be either used for free indexing or for indexing with a controlled vocabulary. For more information see <http://www.nzdl.org/Kea/>.
2063 Natural Language Processing RWeka R/Weka Interface An R interface to Weka (Version 3.9.3). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Package ‘RWeka’ contains the interface code, the Weka jar is in a separate package ‘RWekajars’. For more information on Weka see <http://www.cs.waikato.ac.nz/ml/weka/>.
2064 Natural Language Processing skmeans Spherical k-Means Clustering Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.
2065 Natural Language Processing SnowballC Snowball Stemmers Based on the C ‘libstemmer’ UTF-8 Library An R interface to the C ‘libstemmer’ library that implements Porter’s word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
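A minimal sketch of stemming with wordStem(); note that, as a rule-based Porter-style stemmer, it does not handle irregular forms:

```r
library(SnowballC)

wordStem(c("running", "runner", "runs"), language = "english")
wordStem(c("maisons", "maison"), language = "french")  # other supported languages work the same way
```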
2066 Natural Language Processing stm Estimation of the Structural Topic Model The Structural Topic Model (STM) allows researchers to estimate topic models with document-level covariates. The package also includes tools for model selection, visualization, and estimation of topic-covariate regressions. Methods developed in Roberts et al (2014) <doi:10.1111/ajps.12103> and Roberts et al (2016) <doi:10.1080/01621459.2016.1141684>.
2067 Natural Language Processing stringdist Approximate String Matching and String Distance Functions Implements an approximate string matching version of R’s native ‘match’ function. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding, or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using ‘openMP’. An API for C or C++ is exposed as well.
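A minimal sketch of the two main entry points, a pairwise distance and an approximate analogue of match():

```r
library(stringdist)

stringdist("kitten", "sitting", method = "dl")  # Damerau-Levenshtein distance
stringdist("MARTHA", "MATHRA", method = "jw")   # Jaro-Winkler distance
amatch("leia", c("leela", "lea", "leia"), maxDist = 2)  # approximate match()
```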
2068 Natural Language Processing stringi Character String Processing Facilities Fast, correct, consistent, portable, as well as convenient character string/text processing in every locale and any native encoding. Owing to the use of the ‘ICU’ (International Components for Unicode) library, the package provides ‘R’ users with platform-independent functions known to ‘Java’, ‘Perl’, ‘Python’, ‘PHP’, and ‘Ruby’ programmers. Available features include: pattern searching (e.g., with ‘Java’-like regular expressions or the ‘Unicode’ collation algorithm), random string generation, case mapping, string transliteration, concatenation, Unicode normalization, date-time formatting and parsing, and many more.
2069 Natural Language Processing tau Text Analysis Utilities Utilities for text analysis.
2070 Natural Language Processing tesseract Open Source OCR Engine Bindings to ‘Tesseract’ <https://opensource.google.com/projects/tesseract>: a powerful optical character recognition (OCR) engine that supports over 100 languages. The engine is highly configurable in order to tune the detection algorithms and obtain the best possible results.
2071 Natural Language Processing text2vec Modern Text Mining Framework for R Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.
2072 Natural Language Processing textcat N-Gram Based Text Categorization Text categorization based on n-grams.
2073 Natural Language Processing textir Inverse Regression for Text Analysis Multinomial (inverse) regression inference for text documents and associated attributes. For details see: Taddy (2013 JASA) Multinomial Inverse Regression for Text Analysis <arXiv:1012.2098> and Taddy (2015, AoAS), Distributed Multinomial Regression, <arXiv:1311.6139>. A minimalist partial least squares routine is also included. Note that the topic modeling capability of earlier ‘textir’ is now a separate package, ‘maptpx’.
2074 Natural Language Processing textrank Summarize Text by Ranking Sentences and Finding Keywords The ‘textrank’ algorithm is an extension of the ‘Pagerank’ algorithm for text. The algorithm summarizes text by calculating how sentences are related to one another, looking at overlapping terminology in order to set up links between sentences. The resulting sentence network is then plugged into the ‘Pagerank’ algorithm, which identifies the most important sentences in your text and ranks them. In a similar way, ‘textrank’ can also be used to extract keywords: a word network is constructed by checking whether words follow one another, the ‘Pagerank’ algorithm is applied on top of that network to extract relevant words, and relevant words which follow one another are combined to get keywords. More information can be found in the paper from Mihalcea, Rada & Tarau, Paul (2004) <http://www.aclweb.org/anthology/W04-3252>.
2075 Natural Language Processing textreuse Detect Text Reuse and Document Similarity Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
2076 Natural Language Processing tidytext Text Mining using ‘dplyr’, ‘ggplot2’, and Other Tidy Tools Text mining for word processing and sentiment analysis using ‘dplyr’, ‘ggplot2’, and other tidy tools.
2077 Natural Language Processing tm (core) Text Mining Package A framework for text mining applications within R.
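A minimal sketch of the core tm workflow, building a corpus from in-memory text and deriving a document-term matrix (the two toy documents are made up):

```r
library(tm)

docs <- c("Text mining with the tm package.", "The tm package mines text.")
corp <- VCorpus(VectorSource(docs))                 # corpus from a character vector
corp <- tm_map(corp, content_transformer(tolower))  # normalize case
corp <- tm_map(corp, removePunctuation)
dtm <- DocumentTermMatrix(corp)                     # documents x terms
inspect(dtm)
```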
2078 Natural Language Processing tm.plugin.alceste Import texts from files in the Alceste format using the tm text mining framework This package provides a tm Source to create corpora from a corpus prepared in the format used by the Alceste application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables.
2079 Natural Language Processing tm.plugin.dc Text Mining Distributed Corpus Plug-In A plug-in for the text mining framework tm to support text mining in a distributed way. The package provides a convenient interface for handling distributed corpus objects based on distributed list objects.
2080 Natural Language Processing tm.plugin.europresse Import Articles from ‘Europresse’ Using the ‘tm’ Text Mining Framework Provides a ‘tm’ Source to create corpora from articles exported from the ‘Europresse’ content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages).
2081 Natural Language Processing tm.plugin.factiva Import Articles from ‘Factiva’ Using the ‘tm’ Text Mining Framework Provides a ‘tm’ Source to create corpora from articles exported from the Dow Jones ‘Factiva’ content provider as XML or HTML files. It is able to read both text content and meta-data information (including source, date, title, author, subject, geographical coverage, company, industry, and various provider-specific fields).
2082 Natural Language Processing tm.plugin.lexisnexis Import Articles from ‘LexisNexis’ Using the ‘tm’ Text Mining Framework Provides a ‘tm’ Source to create corpora from articles exported from the ‘LexisNexis’ content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages). Note that the file format is highly unstable: there is no warranty that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.
2083 Natural Language Processing tm.plugin.mail Text Mining E-Mail Plug-in A plug-in for the tm text mining framework providing mail handling functionality.
2084 Natural Language Processing tm.plugin.webmining Retrieve Structured, Textual Data from Various Web Sources Facilitates text retrieval from feed formats like XML (RSS, ATOM) and JSON; direct retrieval from HTML is also supported. As most (news) feeds only incorporate small fractions of the original text, tm.plugin.webmining also retrieves and extracts the text of the original source.
2085 Natural Language Processing tokenizers Fast, Consistent Tokenization of Natural Language Text Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the ‘stringi’ and ‘Rcpp’ packages for fast yet correct tokenization in ‘UTF-8’.
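A minimal sketch of the consistent interface: every tokenizer takes a character vector and returns a list of token vectors:

```r
library(tokenizers)

txt <- "The quick brown fox jumps over the lazy dog."
tokenize_words(txt)                               # word tokens
tokenize_ngrams(txt, n = 2)                       # bigrams
tokenize_sentences("One sentence. Another one.")  # sentence tokens
```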
2086 Natural Language Processing topicmodels Topic Models Provides an interface to the C code for Latent Dirichlet Allocation (LDA) models and Correlated Topics Models (CTM) by David M. Blei and co-authors and the C++ code for fitting LDA models using Gibbs sampling by Xuan-Hieu Phan and co-authors.
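A minimal sketch of fitting a small LDA model by Gibbs sampling to the AssociatedPress document-term matrix that ships with the package (the subset to 50 documents and k = 2 are arbitrary choices for speed):

```r
library(topicmodels)

data("AssociatedPress", package = "topicmodels")   # example DocumentTermMatrix
fit <- LDA(AssociatedPress[1:50, ], k = 2, method = "Gibbs")
terms(fit, 5)   # top 5 terms per topic
topics(fit)     # most likely topic per document
```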
2087 Natural Language Processing udpipe Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the ‘UDPipe’ ‘NLP’ Toolkit This natural language processing toolkit provides language-agnostic ‘tokenization’, ‘parts of speech tagging’, ‘lemmatization’ and ‘dependency parsing’ of raw text. Next to text parsing, the package also allows you to train annotation models based on data of ‘treebanks’ in ‘CoNLL-U’ format as provided at <http://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: ‘Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe’, available at <doi:10.18653/v1/K17-3009>.
2088 Natural Language Processing wordcloud Word Clouds Functionality to create pretty word clouds, visualize differences and similarity between documents, and avoid over-plotting in scatter plots with text.
2089 Natural Language Processing wordnet WordNet Interface An interface to WordNet using the Jawbone Java API to WordNet. WordNet (<http://wordnet.princeton.edu/>) is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. Please note that WordNet(R) is a registered tradename. Princeton University makes WordNet available to research and commercial users free of charge provided the terms of their license (<http://wordnet.princeton.edu/wordnet/license/>) are followed, and proper reference is made to the project using an appropriate citation (<http://wordnet.princeton.edu/wordnet/citing-wordnet/>).
2090 Natural Language Processing zipfR Statistical Models for Word Frequency Distributions Statistical models and utilities for the analysis of word frequency distributions. The utilities include functions for loading, manipulating and visualizing word frequency data and vocabulary growth curves. The package also implements several statistical models for the distribution of word frequencies in a population. (The name of this package derives from the most famous word frequency distribution, Zipf’s law.)
2091 Numerical Mathematics ADPF Use Least Squares Polynomial Regression and Statistical Testing to Improve Savitzky-Golay This function takes a vector or matrix of data and smooths the data with an improved Savitzky Golay transform. The Savitzky-Golay method for data smoothing and differentiation calculates convolution weights using Gram polynomials that exactly reproduce the results of least-squares polynomial regression. Use of the Savitzky-Golay method requires specification of both filter length and polynomial degree to calculate convolution weights. For maximum smoothing of statistical noise in data, polynomials with low degrees are desirable, while a high polynomial degree is necessary for accurate reproduction of peaks in the data. Extension of the least-squares regression formalism with statistical testing of additional terms of polynomial degree to a heuristically chosen minimum for each data window leads to an adaptive-degree polynomial filter (ADPF). Based on noise reduction for data that consist of pure noise and on signal reproduction for data that is purely signal, ADPF performed nearly as well as the optimally chosen fixed-degree Savitzky-Golay filter and outperformed sub-optimally chosen Savitzky-Golay filters. For synthetic data consisting of noise and signal, ADPF outperformed both optimally chosen and sub-optimally chosen fixed-degree Savitzky-Golay filters. See Barak, P. (1995) <doi:10.1021/ac00113a006> for more information.
2092 Numerical Mathematics akima Interpolation of Irregularly and Regularly Spaced Data Several cubic spline interpolation methods of H. Akima for irregular and regular gridded data are available through this package, both for the bivariate case (irregular data: ACM 761, regular data: ACM 760) and univariate case (ACM 433 and ACM 697). Linear interpolation of irregular gridded data is also covered by reusing D. J. Renka’s triangulation code, which is part of Akima’s Fortran code. A bilinear interpolator for regular grids was also added for comparison with the bicubic interpolator on regular grids.
2093 Numerical Mathematics appell Compute Appell’s F1 hypergeometric function This package wraps Fortran code by F. D. Colavecchia and G. Gasaneo for computing Appell’s F1 hypergeometric function. Their program uses Fortran code by L. F. Shampine and H. A. Watts. Moreover, the hypergeometric function with complex arguments is computed with Fortran code by N. L. J. Michel and M. V. Stoitsov or with Fortran code by R. C. Forrey. See the function documentation for the references and please cite them accordingly.
2094 Numerical Mathematics arrangements Fast Generators and Iterators for Permutations, Combinations and Partitions Fast generators and iterators for permutations, combinations and partitions. The iterators allow users to generate arrangements in a memory efficient manner and the generated arrangements are in lexicographical (dictionary) order. Permutations and combinations can be drawn with/without replacement and support multisets. It has been demonstrated that ‘arrangements’ outperforms most of the existing packages of similar kind. Some benchmarks could be found at <https://randy3k.github.io/arrangements/articles/benchmark.html>.
2095 Numerical Mathematics BB Solving and Optimizing Large-Scale Nonlinear Systems Barzilai-Borwein spectral methods for solving nonlinear system of equations, and for optimizing nonlinear objective functions subject to simple constraints. A tutorial style introduction to this package is available in a vignette on the CRAN download page or, when the package is loaded in an R session, with vignette(“BB”).
2096 Numerical Mathematics Bessel Bessel Bessel Functions Computations and Approximations Bessel Function Computations for complex and real numbers; notably interfacing TOMS 644; approximations for large arguments, experiments, etc.
2097 Numerical Mathematics bigIntegerAlgos R Tool for Factoring Big Integers Features the multiple polynomial quadratic sieve algorithm for factoring large integers and a vectorized factoring function that returns the complete factorization of an integer. Utilizes the C library GMP (GNU Multiple Precision Arithmetic) and classes created by Antoine Lucas et al. found in the ‘gmp’ package.
2098 Numerical Mathematics Brobdingnag Very Large Numbers in R Handles very large numbers in R. Real numbers are held using their natural logarithms, plus a logical flag indicating sign. The package includes a vignette that gives a step-by-step introduction to using S4 methods.
2099 Numerical Mathematics chebpol Multivariate Interpolation Contains methods for creating multivariate/multidimensional interpolations of functions on a hypercube. If available through fftw3, the DCT-II/FFT is used to compute coefficients for a Chebyshev interpolation. Other interpolation methods for arbitrary Cartesian grids are also provided: a piecewise multilinear method and the Floater-Hormann barycentric method. For scattered data, polyharmonic splines with a linear term are provided. The time-critical parts are written in C for speed. All interpolants are parallelized if used to evaluate more than one point.
2100 Numerical Mathematics combinat combinatorics utilities routines for combinatorics
2101 Numerical Mathematics conicfit Algorithms for Fitting Circles, Ellipses and Conics Based on the Work by Prof. Nikolai Chernov Geometric circle fitting with Levenberg-Marquardt (a, b, R), Levenberg-Marquardt reduced (a, b), Landau, Spath and Chernov-Lesort. Algebraic circle fitting with Taubin, Kasa, Pratt and Fitzgibbon-Pilu-Fisher. Geometric ellipse fitting with ellipse LMG (geometric parameters) and conic LMA (algebraic parameters). Algebraic ellipse fitting with Fitzgibbon-Pilu-Fisher and Taubin.
2102 Numerical Mathematics contfrac Continued Fractions Various utilities for evaluating continued fractions.
2103 Numerical Mathematics cubature Adaptive Multivariate Integration over Hypercubes R wrappers around the cubature C library of Steven G. Johnson for adaptive multivariate integration over hypercubes and the Cuba C library of Thomas Hahn for deterministic and Monte Carlo integration. Scalar and vector interfaces for cubature and Cuba routines are provided; the vector interfaces are highly recommended as demonstrated in the package vignette.
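A minimal sketch of adaptive integration over the unit square with the hcubature routine (the integrand is an arbitrary smooth example):

```r
library(cubature)

f <- function(x) exp(-sum(x^2))   # integrand f(x1, x2)
res <- hcubature(f, lowerLimit = c(0, 0), upperLimit = c(1, 1))
res$integral   # estimated integral
res$error      # estimated absolute error
```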
2104 Numerical Mathematics Deriv (core) Symbolic Differentiation R-based solution for symbolic differentiation. It admits user-defined function as well as function substitution in arguments of functions to be differentiated. Some symbolic simplification is part of the work.
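A minimal sketch of differentiating an ordinary R function symbolically; Deriv() returns the derivative as a new function:

```r
library(Deriv)

f <- function(x) sin(x^2) * exp(-x)
df <- Deriv(f)    # symbolic derivative, returned as an R function
df(1)             # evaluate f'(x) at x = 1
Deriv(expression(x^2 * y), c("x", "y"))   # partial derivatives of an expression
```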
2105 Numerical Mathematics eigeninv Generates (dense) matrices that have a given set of eigenvalues Solves the “inverse eigenvalue problem” which is to generate a real-valued matrix that has the specified real eigenvalue spectrum. It can generate infinitely many dense matrices, symmetric or asymmetric, with the given set of eigenvalues. Algorithm can also generate stochastic and doubly stochastic matrices.
2106 Numerical Mathematics elliptic Weierstrass and Jacobi Elliptic Functions A suite of elliptic and related functions including Weierstrass and Jacobi forms. Also includes various tools for manipulating and visualizing complex functions.
2107 Numerical Mathematics expint Exponential Integral and Incomplete Gamma Function The exponential integrals E_1(x), E_2(x), E_n(x) and Ei(x), and the incomplete gamma function G(a, x) defined for negative values of its first argument. The package also gives easy access to the underlying C routines through an API; see the package vignette for details. A test package included in sub-directory example_API provides an implementation. C routines derived from the GNU Scientific Library <https://www.gnu.org/software/gsl/>.
2108 Numerical Mathematics expm Matrix Exponential, Log, ‘etc’ Computation of the matrix exponential, logarithm, sqrt, and related quantities.
2109 Numerical Mathematics fastGHQuad Fast ‘Rcpp’ Implementation of Gauss-Hermite Quadrature Fast, numerically-stable Gauss-Hermite quadrature rules and utility functions for adaptive GH quadrature. See Liu, Q. and Pierce, D. A. (1994) <doi:10.2307/2337136> for a reference on these methods.
2110 Numerical Mathematics feather R Bindings to the Feather ‘API’ Read and write feather files, a lightweight binary columnar data store designed for maximum speed.
2111 Numerical Mathematics features Feature Extraction for Discretely-Sampled Functional Data A discretely-sampled function is first smoothed, and features of the smoothed function are then extracted. Some of the key features include the mean value, first and second derivatives, critical points (i.e. local maxima and minima), curvature of the function at critical points, wiggliness of the function, noise in the data, and outliers in the data.
2112 Numerical Mathematics findpython Functions to Find an Acceptable Python Binary Package designed to find an acceptable python binary.
2113 Numerical Mathematics FixedPoint Algorithms for Finding Fixed Point Vectors of Functions For functions that take and return vectors (or scalars), this package provides 8 algorithms for finding fixed point vectors (vectors for which the inputs and outputs to the function are the same vector). These algorithms include Anderson (1965) acceleration <doi:10.1145/321296.321305>, epsilon extrapolation methods (Wynn 1962 <doi:10.2307/2004051>) and minimal polynomial methods (Cabay and Jackson 1976 <doi:10.1137/0713060>).
2114 Numerical Mathematics fourierin Computes Numeric Fourier Integrals Computes Fourier integrals of functions of one and two variables using the Fast Fourier transform. The Fourier transforms must be evaluated on a regular grid for fast evaluation.
2115 Numerical Mathematics freegroup The Free Group Provides functionality for manipulating elements of the free group (juxtaposition is represented by a plus) including inversion, multiplication by a scalar, group-theoretic power operation, and Tietze forms. The package is fully vectorized.
2116 Numerical Mathematics gaussquad Collection of functions for Gaussian quadrature A collection of functions to perform Gaussian quadrature with different weight functions corresponding to the orthogonal polynomials in package orthopolynom. Examples verify the orthogonality and inner products of the polynomials.
2117 Numerical Mathematics geigen Calculate Generalized Eigenvalues, the Generalized Schur Decomposition and the Generalized Singular Value Decomposition of a Matrix Pair with Lapack Functions to compute generalized eigenvalues and eigenvectors, the generalized Schur decomposition and the generalized Singular Value Decomposition of a matrix pair, using Lapack routines.
2118 Numerical Mathematics gmp Multiple Precision Arithmetic Multiple Precision Arithmetic (big integers and rationals, prime number tests, matrix computation), “arithmetic without limitations” using the C library GMP (GNU Multiple Precision Arithmetic).
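A minimal sketch of exact big-integer arithmetic and factorization:

```r
library(gmp)

n <- as.bigz("123456789012345678901234567890")
n * n                        # exact product, no overflow or rounding
factorize(as.bigz(2047))     # prime factorization: 23 * 89
isprime(as.bigz(2^31 - 1))   # primality test for a Mersenne prime
```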
2119 Numerical Mathematics gsl Wrapper for the Gnu Scientific Library An R wrapper for some of the functionality of the Gnu Scientific Library.
2120 Numerical Mathematics hypergeo The Gauss Hypergeometric Function The Gaussian hypergeometric function for complex numbers.
2121 Numerical Mathematics interp Interpolation Methods Bivariate data interpolation on regular and irregular grids, either linear or using splines, forms the main part of this package. It is intended to provide FOSS replacement functions for the ACM-licensed akima::interp and tripack::tri.mesh functions. Currently the piecewise linear interpolation part of akima::interp (and also akima::interpp) is implemented in interp::interp; this corresponds to the call akima::interp(…, linear=TRUE), which is the default setting and covers most akima::interp use cases in depending packages. A re-implementation of Akima’s spline interpolation (akima::interp(…, linear=FALSE)) is currently under development and will complete this package in a later version. Estimators for partial derivatives are already available; these are a prerequisite for the spline interpolation. The basic part is currently a GPLed triangulation algorithm (sweep hull algorithm by David Sinclair) providing the starting point for the piecewise linear interpolator. As a side effect this algorithm is also used to provide replacements for the basic functions of the tripack package, which also suffer from the ACM restrictions. All functions are designed to be backward compatible with their akima/tripack counterparts.
2122 Numerical Mathematics irlba Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices Fast and memory efficient methods for truncated singular value decomposition and principal components analysis of large sparse and dense matrices.
2123 Numerical Mathematics JuliaCall Seamless Integration Between R and ‘Julia’ Provides an R interface to ‘Julia’, which is a high-level, high-performance dynamic programming language for numerical computing, see <https://julialang.org/> for more information. It provides a high-level interface as well as a low-level interface. Using the high level interface, you could call any ‘Julia’ function just like any R function with automatic type conversion. Using the low level interface, you could deal with C-level SEXP directly while enjoying the convenience of using a high-level programming language like ‘Julia’.
2124 Numerical Mathematics ktsolve Configurable Function for Solving Families of Nonlinear Equations This is designed for use with an arbitrary set of equations with an arbitrary set of unknowns. The user selects “fixed” values for enough unknowns to leave as many variables as there are equations, which in most cases means the system is properly defined and a unique solution exists. The function, the fixed values and initial values for the remaining unknowns are fed to a nonlinear backsolver. The original version of “TK!Solver”, now a product of Universal Technical Systems (<https://www.uts.com>), was the inspiration for this function.
2125 Numerical Mathematics lamW Lambert-W Function Implements both real-valued branches of the Lambert-W function, also known as the product logarithm, without the need for installing the entire GSL.
2126 Numerical Mathematics logOfGamma Natural Logarithms of the Gamma Function for Large Values Uses approximations to compute the natural logarithm of the Gamma function for large values.
2127 Numerical Mathematics m2r Macaulay2 in R Persistent interface to Macaulay2 (<http://www.math.uiuc.edu/Macaulay2/>) and front-end tools facilitating its use in the R ecosystem.
2128 Numerical Mathematics magic Create and Investigate Magic Squares A collection of efficient, vectorized algorithms for the creation and investigation of magic squares and hypercubes, including a variety of functions for the manipulation and analysis of arbitrarily dimensioned arrays. The package includes methods for creating normal magic squares of any order greater than 2. The ultimate intention is for the package to be a computerized embodiment of all magic square knowledge, including direct numerical verification of properties of magic squares (such as recent results on the determinant of odd-ordered semimagic squares). Some antimagic functionality is included. The package also serves as a rebuttal to the often-heard comment “I thought R was just for statistics”.
2129 Numerical Mathematics MASS Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
2130 Numerical Mathematics matlab MATLAB emulation package Emulate MATLAB code using R
2131 Numerical Mathematics Matrix (core) Sparse and Dense Matrix Classes and Methods A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using ‘LAPACK’ and ‘SuiteSparse’ libraries.
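A minimal sketch of constructing a sparse matrix in triplet form and solving a linear system with it:

```r
library(Matrix)

A <- sparseMatrix(i = c(1, 2, 3, 1),   # row indices
                  j = c(1, 2, 3, 3),   # column indices
                  x = c(4, 5, 6, 1))   # nonzero values: a 3 x 3 sparse matrix
b <- c(1, 2, 3)
solve(A, b)                            # sparse solve
```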
2132 Numerical Mathematics matrixcalc Collection of functions for matrix calculations A collection of functions to support matrix calculations for probability, econometric and numerical analysis. There are additional functions that are comparable to APL functions which are useful for actuarial models such as pension mathematics. This package is used for teaching and research purposes at the Department of Finance and Risk Engineering, New York University, Polytechnic Institute, Brooklyn, NY 11201.
2133 Numerical Mathematics MonoPoly Functions to Fit Monotone Polynomials Functions for fitting monotone polynomials to data. Detailed discussion of the methodologies used can be found in Murray, Mueller and Turlach (2013) <doi:10.1007/s00180-012-0390-5> and Murray, Mueller and Turlach (2016) <doi:10.1080/00949655.2016.1139582>.
2134 Numerical Mathematics mpoly Symbolic Computation and More with Multivariate Polynomials Symbolic computing with multivariate polynomials in R.
2135 Numerical Mathematics multipol Multivariate Polynomials Various utilities to manipulate multivariate polynomials.
2136 Numerical Mathematics mvp Fast Symbolic Multivariate Polynomials Fast manipulation of symbolic multivariate polynomials using the ‘Map’ class of the Standard Template Library. The package uses print and coercion methods from the ‘mpoly’ package (Kahle 2013, “Multivariate polynomials in R”. The R Journal, 5(1):162), but offers speed improvements. It is comparable in speed to the ‘spray’ package for sparse arrays, but retains the symbolic benefits of ‘mpoly’.
2137 Numerical Mathematics mvQuad Methods for Multivariate Quadrature Provides methods to construct multivariate grids, which can be used for multivariate quadrature. These grids can be based on different quadrature rules such as Newton-Cotes formulas (trapezoidal rule, Simpson’s rule, …) or Gauss quadrature (Gauss-Hermite, Gauss-Legendre, …). For the construction of the multidimensional grid, the product rule or the combination technique can be applied.
2138 Numerical Mathematics nleqslv Solve Systems of Nonlinear Equations Solve a system of nonlinear equations using a Broyden or a Newton method with a choice of global strategies such as line search and trust region. There are options for using a numerical or user supplied Jacobian, for specifying a banded numerical Jacobian and for allowing a singular or ill-conditioned Jacobian.
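A minimal sketch of solving a 2 x 2 nonlinear system F(x) = 0 (this example system has a root at (1, 1)):

```r
library(nleqslv)

F <- function(x) c(x[1]^2 + x[2]^2 - 2,          # equation 1
                   exp(x[1] - 1) + x[2]^3 - 2)   # equation 2
sol <- nleqslv(c(2, 0.5), F, method = "Broyden") # starting guess c(2, 0.5)
sol$x   # approximate root, close to c(1, 1)
```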
2139 Numerical Mathematics numbers Number-Theoretic Functions Provides number-theoretic functions for factorization, prime numbers, twin primes, primitive roots, modular logarithms and inverses, extended GCD, Farey series and continued fractions. Includes Legendre and Jacobi symbols, some divisor functions, Euler’s Phi function, etc.
2140 Numerical Mathematics numDeriv (core) Accurate Numerical Derivatives Methods for calculating (usually) accurate numerical first and second order derivatives. Accurate calculations are done using Richardson’s extrapolation or, when applicable, a complex step derivative is available. A simple difference method is also provided. Simple difference is (usually) less accurate but is much quicker than Richardson’s extrapolation and provides a useful cross-check. Methods are provided for real scalar and vector valued functions.
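A minimal sketch of the three main functions, applied to arbitrary example functions:

```r
library(numDeriv)

f <- function(x) sum(sin(x))
grad(f, x = c(1, 2))       # gradient (Richardson extrapolation by default)
hessian(f, x = c(1, 2))    # matrix of second derivatives
jacobian(function(x) c(x[1]^2, x[1] * x[2]), x = c(2, 3))  # Jacobian of a vector-valued function
```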
2141 Numerical Mathematics onion Octonions and Quaternions Quaternions and octonions are four- and eight-dimensional extensions of the complex numbers. They are normed division algebras over the real numbers and find applications in spatial rotations (quaternions) and string theory and relativity (octonions). The quaternions are noncommutative and the octonions nonassociative. See RKS Hankin 2006, Rnews Volume 6/2: 49-51, and the package vignette, for more details.
2142 Numerical Mathematics optR Optimization Toolbox for Solving Linear Systems Solves linear systems of form Ax=b via Gauss elimination, LU decomposition, Gauss-Seidel, Conjugate Gradient Method (CGM) and Cholesky methods.
2143 Numerical Mathematics orthopolynom Collection of functions for orthogonal and orthonormal polynomials A collection of functions to construct sets of orthogonal polynomials and their recurrence relations. Additional functions are provided to calculate the derivative, integral, value and roots of lists of polynomial objects.
2144 Numerical Mathematics Pade Pade Approximant Coefficients Given a vector of Taylor series coefficients of sufficient length as input, the function returns the numerator and denominator coefficients for the Pade approximant of appropriate order.
2145 Numerical Mathematics partitions Additive Partitions of Integers Additive partitions of integers. Enumerates the partitions, unequal partitions, and restricted partitions of an integer; the three corresponding partition functions are also given. Set partitions are now included.
2146 Numerical Mathematics permutations The Symmetric Group: Permutations of a Finite Set Manipulates invertible functions from a finite set to itself. Can transform from word form to cycle form and back.
2147 Numerical Mathematics polyCub Cubature over Polygonal Domains Numerical integration of continuously differentiable functions f(x,y) over simple closed polygonal domains. The following cubature methods are implemented: product Gauss cubature (Sommariva and Vianello, 2007, <doi:10.1007/s10543-007-0131-2>), the simple two-dimensional midpoint rule (wrapping ‘spatstat’ functions), adaptive cubature for radially symmetric functions via line integrate() along the polygon boundary (Meyer and Held, 2014, <doi:10.1214/14-AOAS743>, Supplement B), and integration of the bivariate Gaussian density based on polygon triangulation. For simple integration along the axes, the ‘cubature’ package is more appropriate.
2148 Numerical Mathematics polynom (core) A Collection of Functions to Implement a Class for Univariate Polynomial Manipulations A collection of functions to implement a class for univariate polynomial manipulations.
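A minimal sketch of the polynomial class; coefficients are given in increasing degree:

```r
library(polynom)

p <- polynomial(c(1, -3, 2))   # 1 - 3x + 2x^2
p * p                          # polynomial arithmetic
solve(p)                       # zeros: 0.5 and 1
predict(p, 2)                  # evaluate at x = 2
```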
2149 Numerical Mathematics PolynomF Polynomials in R Implements univariate polynomial operations in R, including polynomial arithmetic, finding zeros, plotting, and some operations on lists of polynomials.
2150 Numerical Mathematics pracma (core) Practical Numerical Math Functions Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses ‘MATLAB’ function names where appropriate to simplify porting.
2151 Numerical Mathematics PRIMME Eigenvalues and Singular Values and Vectors from Large Matrices R interface to PRIMME, a C library for computing a few eigenvalues and their corresponding eigenvectors of a real symmetric or complex Hermitian matrix. It can also compute singular values and vectors of a square or rectangular matrix. It can find largest, smallest, or interior singular/eigenvalues and can use preconditioning to accelerate convergence.
2152 Numerical Mathematics PythonInR Use ‘Python’ from Within ‘R’ Interact with ‘Python’ <https://www.python.org/> from within ‘R’.
2153 Numerical Mathematics QZ Generalized Eigenvalues and QZ Decomposition Generalized eigenvalues and QZ decomposition (generalized Schur form) for an N-by-N non-symmetric matrix A or paired matrices (A,B), with an eigenvalue-reordering mechanism. The package is mainly based on the complex*16 and double precision routines of the LAPACK library (version 3.4.2).
2154 Numerical Mathematics R.matlab Read and Write MAT Files and Call MATLAB from Within R Methods readMat() and writeMat() for reading and writing MAT files. For users with MATLAB v6 or newer installed (either locally or on a remote host), the package also provides methods for controlling MATLAB (trademark) via R and sending and retrieving data between R and MATLAB.
2155 Numerical Mathematics rARPACK Solvers for Large Scale Eigenvalue and SVD Problems Previously an R wrapper of the ‘ARPACK’ library <http://www.caam.rice.edu/software/ARPACK/>, and now a shell of the R package ‘RSpectra’, an R interface to the ‘Spectra’ library <http://yixuan.cos.name/spectra/> for solving large scale eigenvalue/vector problems. The current version of ‘rARPACK’ simply imports and exports the functions provided by ‘RSpectra’. New users of ‘rARPACK’ are advised to switch to the ‘RSpectra’ package.
2156 Numerical Mathematics Rcpp Seamless R and C++ Integration The ‘Rcpp’ package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of third-party libraries. Documentation about ‘Rcpp’ is provided by several vignettes included in this package, via the ‘Rcpp Gallery’ site at <http://gallery.rcpp.org>, the paper by Eddelbuettel and Francois (2011, <doi:10.18637/jss.v040.i08>), the book by Eddelbuettel (2013, <doi:10.1007/978-1-4614-6868-4>) and the paper by Eddelbuettel and Balamuta (2018, <doi:10.1080/00031305.2017.1375990>); see ‘citation(“Rcpp”)’ for details.
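A minimal sketch of compiling a C++ function inline and calling it from R:

```r
library(Rcpp)

cppFunction("
  double sumC(NumericVector x) {
    double total = 0;
    for (int i = 0; i < x.size(); ++i) total += x[i];  // accumulate elements
    return total;
  }
")
sumC(c(1, 2, 3.5))   # 6.5
```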
2157 Numerical Mathematics RcppAlgos High Performance Tools for Combinatorics and Computational Mathematics Provides optimized functions implemented in C++ with ‘Rcpp’ for solving problems in combinatorics and computational mathematics. Utilizes parallel programming via ‘RcppThread’ for maximal performance. Also makes use of the RMatrix class from the ‘RcppParallel’ library. There are combination/permutation functions with constraint parameters that allow for generation of all combinations/permutations of a vector meeting specific criteria (e.g. finding all combinations such that the sum is between two bounds). Capable of generating specific combinations/permutations (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library ‘libdivide’ by <http://ridiculousfish.com>. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomas Oliveira. Finally, there is a prime counting function that implements Legendre’s formula based on the algorithm by Kim Walisch.
2158 Numerical Mathematics RcppArmadillo ‘Rcpp’ Integration for the ‘Armadillo’ Templated Linear Algebra Library ‘Armadillo’ is a templated C++ linear algebra library (by Conrad Sanderson) that aims towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. The ‘RcppArmadillo’ package includes the header files from the templated ‘Armadillo’ library. Thus users do not need to install ‘Armadillo’ itself in order to use ‘RcppArmadillo’. From release 7.800.0 on, ‘Armadillo’ is licensed under Apache License 2; previous releases were licensed under MPL 2.0 from version 3.800.0 onwards and under LGPL-3 prior to that; ‘RcppArmadillo’ (the ‘Rcpp’ bindings/bridge to Armadillo) is licensed under the GNU GPL version 2 or later, as is the rest of ‘Rcpp’. Note that Armadillo requires a fairly recent compiler; for the g++ family at least version 4.6.* is required.
2159 Numerical Mathematics RcppEigen ‘Rcpp’ Integration for the ‘Eigen’ Templated Linear Algebra Library R and ‘Eigen’ integration using ‘Rcpp’. ‘Eigen’ is a C++ template library for linear algebra: matrices, vectors, numerical solvers and related algorithms. It supports dense and sparse matrices on integer, floating point and complex numbers, decompositions of such matrices, and solutions of linear systems. Its performance on many algorithms is comparable with some of the best implementations based on ‘Lapack’ and level-3 ‘BLAS’. The ‘RcppEigen’ package includes the header files from the ‘Eigen’ C++ template library (currently version 3.3.4). Thus users do not need to install ‘Eigen’ itself in order to use ‘RcppEigen’. Since version 3.1.1, ‘Eigen’ is licensed under the Mozilla Public License (version 2); earlier versions were licensed under the GNU LGPL version 3 or later. ‘RcppEigen’ (the ‘Rcpp’ bindings/bridge to ‘Eigen’) is licensed under the GNU GPL version 2 or later, as is the rest of ‘Rcpp’.
2160 Numerical Mathematics reticulate Interface to ‘Python’ Interface to ‘Python’ modules, classes, and functions. When calling into ‘Python’, R data types are automatically converted to their equivalent ‘Python’ types. When values are returned from ‘Python’ to R they are converted back to R types. Compatible with all versions of ‘Python’ >= 2.7.
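A minimal sketch of the conversion behaviour described above, assuming a Python installation with ‘numpy’ is available:

    library(reticulate)
    np <- import("numpy")        # import a Python module as an R object
    np$linspace(0, 1, 5)         # result comes back as an R numeric vector
    py_run_string("x = 40 + 2")  # run arbitrary Python code
    py$x                         # read Python variables from R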
2161 Numerical Mathematics Rlinsolve Iterative Solvers for (Sparse) Linear System of Equations Solving a system of linear equations is one of the most fundamental computational problems for many fields of mathematical studies, such as regression problems from statistics or numerical partial differential equations. We provide basic stationary iterative solvers such as Jacobi, Gauss-Seidel, Successive Over-Relaxation and SSOR methods. Nonstationary methods, also known as Krylov subspace methods, are provided as well. Sparse matrix computation is also supported, so that solving large sparse linear systems remains manageable using the ‘Matrix’ package along with ‘RcppArmadillo’. For a more detailed description, see the book by Saad (2003) <doi:10.1137/1.9780898718003>.
2162 Numerical Mathematics Rmpfr R MPFR - Multiple Precision Floating-Point Reliable Arithmetic (via S4 classes and methods) for arbitrary precision floating point numbers, including transcendental (“special”) functions. To this end, the package interfaces to the ‘LGPL’ licensed ‘MPFR’ (Multiple Precision Floating-Point Reliable) Library which itself is based on the ‘GMP’ (GNU Multiple Precision) Library.
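A small sketch of arbitrary-precision arithmetic with ‘Rmpfr’:

    library(Rmpfr)
    # exp(1) carried to 200 bits, far beyond double precision
    e <- exp(mpfr(1, precBits = 200))
    print(e, digits = 60)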
2163 Numerical Mathematics rmumps Wrapper for MUMPS Library Some basic features of MUMPS (Multifrontal Massively Parallel sparse direct Solver) are wrapped in a class whose methods can be used for sequentially solving a sparse linear system (symmetric or not) with one or many right hand sides (dense or sparse). There is a possibility to do separately symbolic analysis, LU (or LDL^t) factorization and system solving. Third-party ordering libraries are included and can be used: PORD, METIS, SCOTCH. The MUMPS method was first described in Amestoy et al. (2001) <doi:10.1137/S0895479899358194> and Amestoy et al. (2006) <doi:10.1016/j.parco.2005.07.004>.
2164 Numerical Mathematics RootsExtremaInflections Finds Roots, Extrema and Inflection Points of a Curve Implementation of the Taylor Regression Estimator method described in Christopoulos (2014, <https://www.researchgate.net/publication/261562841>) for finding the root, extremum or inflection point of a curve when only a set of possibly noisy xy points for it is available. The method uses a suitable polynomial regression in order to find the coefficients of the relevant Taylor polynomial for the function that has generated the data. Parallel computing can optionally be used on request.
2165 Numerical Mathematics rPython Package Allowing R to Call Python Run Python code, make function calls, assign and retrieve variables, etc. from R.
2166 Numerical Mathematics Rserve Binary R server Rserve acts as a socket server (TCP/IP or local sockets) which allows binary requests to be sent to R. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++ and Java, allowing any application to use facilities of R without the need of linking to R code. Rserve supports remote connection, user authentication and file transfer. A simple R client is included in this package as well.
2167 Numerical Mathematics RSpectra Solvers for Large-Scale Eigenvalue and SVD Problems R interface to the ‘Spectra’ library <https://spectralib.org/> for large-scale eigenvalue and SVD problems. It is typically used to compute a few eigenvalues/vectors of an n by n matrix, e.g., the k largest eigenvalues, which is usually more efficient than eigen() if k << n. This package provides the ‘eigs()’ function that does a similar job to its counterparts in ‘Matlab’, ‘Octave’, ‘Python SciPy’ and ‘Julia’. It also provides the ‘svds()’ function to calculate the largest k singular values and corresponding singular vectors of a real matrix. The matrix to be computed on can be dense, sparse, or in the form of an operator defined by the user.
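A minimal sketch of the ‘eigs()’/‘svds()’ usage described above, on a random matrix:

    library(RSpectra)
    set.seed(1)
    M <- matrix(rnorm(200 * 200), 200)
    eigs_sym(crossprod(M), k = 3)$values  # 3 largest eigenvalues only
    svds(M, k = 2)$d                      # 2 largest singular values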
2168 Numerical Mathematics rSymPy R Interface to SymPy Computer Algebra System Access SymPy computer algebra system from R via Jython.
2169 Numerical Mathematics Ryacas R Interface to the Yacas Computer Algebra System An interface to the yacas computer algebra system.
2170 Numerical Mathematics schumaker Schumaker Shape-Preserving Spline This is a shape preserving spline <doi:10.1137/0720057> which is guaranteed to be monotonic and concave or convex if the data is monotonic and concave or convex. It does not use any optimisation and is therefore quick and smoothly converges to a fixed point in economic dynamics problems including value function iteration. It also automatically gives the first two derivatives of the spline and options for determining behaviour when evaluated outside the interpolation domain.
2171 Numerical Mathematics signal Signal Processing A set of signal processing functions originally written for ‘Matlab’ and ‘Octave’. Includes filter generation utilities, filtering functions, resampling routines, and visualization of filter models. It also includes interpolation functions.
2172 Numerical Mathematics SimplicialCubature Integration of Functions Over Simplices Provides methods to integrate functions over m-dimensional simplices in n-dimensional Euclidean space. There are exact methods for polynomials and adaptive methods for integrating an arbitrary function. Dirichlet probabilities are calculated in certain cases.
2173 Numerical Mathematics SnakeCharmR R and Python Integration Run ‘Python’ code, make function calls, assign and retrieve variables, etc. from R. A fork from ‘rPython’ which uses ‘jsonlite’, ‘Rcpp’ and has several fixes and improvements.
2174 Numerical Mathematics SparseGrid Sparse grid integration in R SparseGrid is a package to create sparse grids for numerical integration, based on code from www.sparse-grids.de
2175 Numerical Mathematics SparseM Sparse Linear Algebra Some basic linear algebra functionality for sparse matrices is provided, including Cholesky decomposition and backsolving, as well as standard R subsetting and Kronecker products.
2176 Numerical Mathematics SphericalCubature Numerical Integration over Spheres and Balls in n-Dimensions; Multivariate Polar Coordinates Provides several methods to integrate functions over the unit sphere and ball in n-dimensional Euclidean space. Routines for converting to/from multivariate polar/spherical coordinates are also provided.
2177 Numerical Mathematics ssvd Sparse SVD Fast iterative thresholding sparse SVD, together with an initialization algorithm
2178 Numerical Mathematics statmod Statistical Modeling A collection of algorithms and functions to aid statistical modeling. Includes growth curve comparisons, limiting dilution analysis (aka ELDA), mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Includes advanced generalized linear model functions that implement secure convergence, dispersion modeling and Tweedie power-law families.
2179 Numerical Mathematics stinepack Stineman, a Consistently Well Behaved Method of Interpolation A consistently well behaved method of interpolation based on piecewise rational functions using Stineman’s algorithm.
2180 Numerical Mathematics svd Interfaces to Various State-of-Art SVD and Eigensolvers R bindings to SVD and eigensolvers (PROPACK, nuTRLan).
2181 Numerical Mathematics tripack Triangulation of Irregularly Spaced Data A constrained two-dimensional Delaunay triangulation package providing both triangulation and generation of Voronoi mosaics of irregularly spaced data.
2182 Numerical Mathematics VeryLargeIntegers Store and Operate with Arbitrarily Large Integers Multi-precision library that allows storing and operating on arbitrarily large integers without loss of precision. It includes a large set of tools for working with them, such as arithmetic and logic operators, modular-arithmetic operators, computational number theory utilities, probabilistic primality tests, factorization algorithms, and random generators of different types of integers.
2183 Numerical Mathematics XR A Structure for Interfaces from R Support for interfaces from R to other languages, built around a class for evaluators and a combination of functions, classes and methods for communication. Will be used through a specific language interface package. Described in the book “Extending R”.
2184 Numerical Mathematics XRJulia Structured Interface to Julia A Julia interface structured according to the general form described in package ‘XR’ and in the book “Extending R”.
2185 Numerical Mathematics XRPython Structured Interface to ‘Python’ A ‘Python’ interface structured according to the general form described in package ‘XR’ and in the book “Extending R”.
2186 Numerical Mathematics Zseq Integer Sequence Generator Generates well-known integer sequences. The ‘gmp’ package is used for computing with arbitrarily large numbers. Every function’s help page has a hyperlink to its corresponding item in the OEIS (The On-Line Encyclopedia of Integer Sequences). For interested readers, see Sloane and Plouffe (1995, ISBN:978-0125586306).
2187 Official Statistics & Survey Methodology acs Download, Manipulate, and Present American Community Survey and Decennial Data from the US Census Provides a general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census (<https://www.census.gov/data/developers/data-sets.html>), including SF1 (Decennial short-form), SF3 (Decennial long-form), and the American Community Survey (ACS). Confidence intervals provided with ACS data are converted to standard errors to be bundled with estimates in complex acs objects. Package provides new methods to conduct standard operations on acs objects and present/plot data in statistically appropriate ways.
2188 Official Statistics & Survey Methodology Amelia A Program for Missing Data A tool that “multiply imputes” missing data in a single cross-section (such as a survey), from a time series (like variables collected for each year in a country), or from a time-series-cross-sectional data set (such as collected by years for each of several countries). Amelia II implements our bootstrapping-based algorithm that gives essentially the same answers as the standard IP or EMis approaches, is usually considerably faster than existing approaches and can handle many more variables. Unlike Amelia I and other statistically rigorous imputation software, it virtually never crashes (but please let us know if you find otherwise!). The program also generalizes existing approaches by allowing for trends in time series across observations within a cross-sectional unit, as well as priors that allow experts to incorporate beliefs they have about the values of missing cells in their data. Amelia II also includes useful diagnostics of the fit of multiple imputation models. The program works from the R command line or via a graphical user interface that does not require users to know R.
2189 Official Statistics & Survey Methodology anesrake ANES Raking Implementation Provides a comprehensive system for selecting variables and weighting data to match the specifications of the American National Election Studies. The package includes methods for identifying discrepant variables, raking data, and assessing the effects of the raking algorithm. It also allows automated re-raking if target variables fall outside identified bounds and allows greater user specification than other available raking algorithms. A variety of simple weighted statistics that were previously in this package (version .55 and earlier) have been moved to the package ‘weights.’
2190 Official Statistics & Survey Methodology BalancedSampling Balanced and Spatially Balanced Sampling Select balanced and spatially balanced probability samples in multi-dimensional spaces with any prescribed inclusion probabilities. It contains fast (C++ via Rcpp) implementations of the included sampling methods. The local pivotal method and spatially correlated Poisson sampling (for spatially balanced sampling) are included. Also the cube method (for balanced sampling) and the local cube method (for doubly balanced sampling) are included.
2191 Official Statistics & Survey Methodology BayesSAE Bayesian Analysis of Small Area Estimation Provides a variety of methods from Rao (2003, ISBN:0-471-41374-7) and some other research articles to deal with several specific small area area-level models in a Bayesian framework. Models provided range from the basic Fay-Herriot model to its improvements such as You-Chapman models, unmatched models, spatial models and so on. Different types of priors for specific parameters can be chosen to obtain MCMC posterior draws. The main sampling function is written in C with the GSL library so as to facilitate the computation. Model internal checking and model comparison criteria are also included.
2192 Official Statistics & Survey Methodology BIFIEsurvey Tools for Survey Statistics in Educational Assessment Contains tools for survey statistics (especially in educational assessment) for datasets with replication designs (jackknife, bootstrap, replicate weights; see Kolenikov, 2010; Pfeffermann & Rao, 2009a, 2009b, <doi:10.1016/S0169-7161(09)70003-3>, <doi:10.1016/S0169-7161(09)70037-9>; Shao, 1996, <doi:10.1080/02331889708802523>). Descriptive statistics, linear and logistic regression, path models for manifest variables with measurement error correction and two-level hierarchical regressions for weighted samples are included. Statistical inference can be conducted for multiply imputed datasets and nested multiply imputed datasets and is particularly suited for the analysis of plausible values (for details see George, Oberwimmer & Itzlinger-Bruneforth, 2016; Bruneforth, Oberwimmer & Robitzsch, 2016; Robitzsch, Pham & Yanagida, 2016; <doi:10.17888/fdb-demo:bistE813I-16a>). The package development was supported by BIFIE (Federal Institute for Educational Research, Innovation and Development of the Austrian School System; Salzburg, Austria).
2193 Official Statistics & Survey Methodology CalibrateSSB Weighting and Estimation for Panel Data with Non-Response Functions to calculate weights, estimates of changes and corresponding variance estimates for panel data with non-response.
2194 Official Statistics & Survey Methodology cat Analysis of categorical-variable datasets with missing values Analysis of categorical-variable datasets with missing values
2195 Official Statistics & Survey Methodology cbsodataR Statistics Netherlands (CBS) Open Data API Client The data and meta data from Statistics Netherlands (www.cbs.nl) can be browsed and downloaded. The client uses the open data API of Statistics Netherlands.
2196 Official Statistics & Survey Methodology censusapi Retrieve Data from the Census APIs A wrapper for the U.S. Census Bureau APIs that returns data frames of Census data and metadata. Available datasets include the Decennial Census, American Community Survey, Small Area Health Insurance Estimates, Small Area Income and Poverty Estimates, Population Estimates and Projections, and more.
2197 Official Statistics & Survey Methodology censusGeography Changes United States Census Geographic Code into Name of Location Converts the United States Census geographic code for city, state (FIP and ICP), region, and birthplace into the name of the location, e.g. maps Census city code 5330 to its actual city, Philadelphia. Returns NA for codes that do not correspond to a real location.
2198 Official Statistics & Survey Methodology CoImp Copula Based Imputation Method Copula based imputation method. A semiparametric imputation procedure for missing multivariate data based on conditional copula specifications.
2199 Official Statistics & Survey Methodology convey Income Concentration Analysis with Complex Survey Samples Variance estimation on indicators of income concentration and poverty using complex sample survey designs. Wrapper around the survey package.
2200 Official Statistics & Survey Methodology deducorrect Deductive Correction, Deductive Imputation, and Deterministic Correction A collection of methods for automated data cleaning where all actions are logged.
2201 Official Statistics & Survey Methodology DHS.rates Calculates Demographic Indicators Calculates key indicators such as fertility rates (Total Fertility Rate (TFR), General Fertility Rate (GFR), and Age Specific Fertility Rate (ASFR)) using Demographic and Health Survey (DHS) women/individual data, and childhood mortality probabilities and rates such as Neonatal Mortality Rate (NNMR), Post-neonatal Mortality Rate (PNNMR), Infant Mortality Rate (IMR), Child Mortality Rate (CMR), and Under-five Mortality Rate (U5MR). In addition to the indicators, the ‘DHS.rates’ package estimates sampling errors indicators such as Standard Error (SE), Design Effect (DEFT), Relative Standard Error (RSE) and Confidence Interval (CI). The package is developed according to the DHS methodology of calculating the fertility indicators and the childhood mortality rates outlined in the “Guide to DHS Statistics” (Croft, Trevor N., Aileen M. J. Marshall, Courtney K. Allen, et al. 2018, <https://dhsprogram.com/Data/Guide-to-DHS-Statistics/index.cfm>) and the DHS methodology of estimating the sampling errors indicators outlined in the “DHS Sampling and Household Listing Manual” (ICF International 2012, <https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf>).
2202 Official Statistics & Survey Methodology easySdcTable Easy Interface to the Statistical Disclosure Control Package ‘sdcTable’ The main function, ProtectTable(), performs table suppression according to a frequency rule with a data set as the only required input. Within this function, protectTable(), protectLinkedTables() or runArgusBatchFile() in package ‘sdcTable’ is called. Lists of level-hierarchy (parameter ‘dimList’) and other required input to these functions are created automatically. The function, PTgui(), starts a graphical user interface based on the shiny package.
2203 Official Statistics & Survey Methodology editrules Parsing, Applying, and Manipulating Data Cleaning Rules Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt’s generalized principle. Rule dependencies can be visualized using the ‘igraph’ package.
2204 Official Statistics & Survey Methodology emdi Estimating and Mapping Disaggregated Indicators Functions that support estimating, assessing and mapping regional disaggregated indicators. So far, estimation methods comprise direct estimation and the model-based approach Empirical Best Prediction (see “Small area estimation of poverty indicators” by Molina and Rao (2010) <doi:10.1002/cjs.10051>), as well as their precision estimates. Assessment of the fitted model is supported by a summary and diagnostic plots. For a suitable presentation of estimates, map plots can be easily created. Furthermore, results can easily be exported to Excel.
2205 Official Statistics & Survey Methodology errorlocate Locate Errors with Validation Rules Errors in data can be located and removed using validation rules from package ‘validate’.
2206 Official Statistics & Survey Methodology eurostat Tools for Eurostat Open Data Tools to download data from the Eurostat database <http://ec.europa.eu/eurostat> together with search and manipulation utilities.
2207 Official Statistics & Survey Methodology extremevalues Univariate Outlier Detection Detect outliers in one-dimensional data.
2208 Official Statistics & Survey Methodology FFD Freedom from Disease Functions, S4 classes/methods and a graphical user interface (GUI) to design surveys to substantiate freedom from disease using a modified hypergeometric function (see Cameron and Baldock, 1997). Herd sensitivities are computed according to sampling strategies “individual sampling” or “limited sampling” (see M. Ziller, T. Selhorst, J. Teuffert, M. Kramer and H. Schlueter, 2002). Methods to compute the a-posteriori alpha-error are implemented. Risk-based targeted sampling is supported.
2209 Official Statistics & Survey Methodology foreign Read Data Stored by ‘Minitab’, ‘S’, ‘SAS’, ‘SPSS’, ‘Stata’, ‘Systat’, ‘Weka’, ‘dBase’, … Reading and writing data stored by some versions of ‘Epi Info’, ‘Minitab’, ‘S’, ‘SAS’, ‘SPSS’, ‘Stata’, ‘Systat’, ‘Weka’, and for reading and writing some ‘dBase’ files.
2210 Official Statistics & Survey Methodology Frames2 Estimation in Dual Frame Surveys Point and interval estimation in dual frame surveys. In contrast to classic sampling theory, where only one sampling frame is considered, dual frame methodology assumes that there are two frames available for sampling and that, overall, they cover the entire target population. Then, two probability samples (one from each frame) are drawn and information collected is suitably combined to get estimators of the parameter of interest.
2211 Official Statistics & Survey Methodology GeomComb (Geometric) Forecast Combination Methods Provides eigenvector-based (geometric) forecast combination methods; also includes simple approaches (simple average, median, trimmed and winsorized mean, inverse rank method) and regression-based combination. Tools for data pre-processing are available in order to deal with common problems in forecast combination (missingness, collinearity).
2212 Official Statistics & Survey Methodology gridsample Tools for Grid-Based Survey Sampling Design Multi-stage cluster surveys of households are commonly performed by governments and programmes to monitor population-level demographic, social, economic, and health outcomes. Generally, communities are sampled from subpopulations (strata) in a first stage, and then households are listed and sampled in a second stage. In this typical two-stage design, sampled communities are the Primary Sampling Units (PSUs) and households are the Secondary Sampling Units (SSUs). Census data typically serve as the sample frame from which PSUs are selected. However, if census data are outdated, inaccurate, or too geographically coarse, gridded population data (such as <http://www.worldpop.org.uk>) can be used as a sample frame instead. GridSample (<doi:10.1186/s12942-017-0098-4>) generates PSUs from gridded population data according to user-specified complex survey design characteristics and household sample size. In gridded population sampling, like census sampling, PSUs are selected within each stratum using a serpentine sampling method, and can be oversampled in urban or rural areas to ensure a minimum sample size in each of these important sub-domains. Furthermore, because grid cells are uniform in size and shape, gridded population sampling allows for samples to be representative of both the population and of space, which is not possible with a census sample frame.
2213 Official Statistics & Survey Methodology haven Import and Export ‘SPSS’, ‘Stata’ and ‘SAS’ Files Import foreign statistical formats into R via the embedded ‘ReadStat’ C library, <https://github.com/WizardMac/ReadStat>.
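A minimal sketch of the import functions; the file names are hypothetical:

    library(haven)
    spss_df  <- read_sav("survey.sav")       # SPSS; value labels kept as labelled vectors
    stata_df <- read_dta("panel.dta")        # Stata
    sas_df   <- read_sas("visits.sas7bdat")  # SAS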
2214 Official Statistics & Survey Methodology hbsae Hierarchical Bayesian Small Area Estimation Functions to compute small area estimates based on a basic area or unit-level model. The model is fit using restricted maximum likelihood, or in a hierarchical Bayesian way. In the latter case numerical integration is used to average over the posterior density for the between-area variance. The output includes the model fit, small area estimates and corresponding MSEs, as well as some model selection measures. Additional functions provide means to compute aggregate estimates and MSEs, to minimally adjust the small area estimates to benchmarks at a higher aggregation level, and to graphically compare different sets of small area estimates.
2215 Official Statistics & Survey Methodology Hmisc Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
2216 Official Statistics & Survey Methodology IC2 Inequality and Concentration Indices and Curves Lorenz and concentration curves; Atkinson, Generalized entropy and SGini indices (with decomposition)
2217 Official Statistics & Survey Methodology icarus Calibrates and Reweights Units in Samples Provides user-friendly tools for calibration in survey sampling. The package is production-oriented, and its interface is inspired by the popular SAS macro ‘Calmar’, so that ‘Calmar’ users can quickly get used to ‘icarus’. In addition to calibration (with linear, raking and logit methods), ‘icarus’ features functions for calibration on tight bounds and penalized calibration.
2218 Official Statistics & Survey Methodology idbr R Interface to the US Census Bureau International Data Base API Use R to make requests to the US Census Bureau’s International Data Base API. Results are returned as R data frames. For more information about the IDB API, visit <http://www.census.gov/data/developers/data-sets/international-database.html>.
2219 Official Statistics & Survey Methodology inegiR Integrate INEGI’s (Mexican Stats Office) API with R Provides functions to download and parse information from INEGI (Official Mexican statistics agency). To learn more about the API, see <http://www.inegi.org.mx/desarrolladores/default.aspx>.
2220 Official Statistics & Survey Methodology ineq Measuring Inequality, Concentration, and Poverty Inequality, concentration, and poverty measures. Lorenz curves (empirical and theoretical).
2221 Official Statistics & Survey Methodology ipumsr Read ‘IPUMS’ Extract Files An easy way to import census, survey and geographic data provided by ‘IPUMS’ into R plus tools to help use the associated metadata to make analysis easier. ‘IPUMS’ data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from our website <https://ipums.org>.
2222 Official Statistics & Survey Methodology JoSAE Unit-Level and Area-Level Small Area Estimation Implementation of some unit and area level EBLUP estimators as well as the estimators of their MSE also under heteroscedasticity. The package further documents the publications Breidenbach and Astrup (2012) <doi:10.1007/s10342-012-0596-7>, Breidenbach et al. (2016) <doi:10.1016/j.rse.2015.07.026> and Breidenbach et al. (2018 in press). The vignette further explains the use of the implemented functions.
2223 Official Statistics & Survey Methodology laeken Estimation of Indicators on Social Exclusion and Poverty Estimation of indicators on social exclusion and poverty, as well as Pareto tail modeling for empirical income distributions.
2224 Official Statistics & Survey Methodology lavaan Latent Variable Analysis Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
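A minimal confirmatory factor analysis sketch using the HolzingerSwineford1939 data shipped with ‘lavaan’:

    library(lavaan)
    model <- ' visual  =~ x1 + x2 + x3
               textual =~ x4 + x5 + x6
               speed   =~ x7 + x8 + x9 '
    fit <- cfa(model, data = HolzingerSwineford1939)
    summary(fit, fit.measures = TRUE)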
2225 Official Statistics & Survey Methodology lavaan.survey Complex Survey Structural Equation Modeling (SEM) Fit structural equation models (SEM) including factor analysis, multivariate regression models with latent variables and many other latent variable models while correcting estimates, standard errors, and chi-square-derived fit measures for a complex sampling design. Incorporate clustering, stratification, sampling weights, and finite population corrections into a SEM analysis. Wrapper around packages lavaan and survey.
2226 Official Statistics & Survey Methodology lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
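A minimal sketch on the sleepstudy data shipped with ‘lme4’, fitting a random intercept and slope per subject:

    library(lme4)
    fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(fm)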
2227 Official Statistics & Survey Methodology mapStats Geographic Display of Survey Data Statistics Automated calculation and visualization of survey data statistics on a color-coded map.
2228 Official Statistics & Survey Methodology MatchIt Nonparametric Preprocessing for Parametric Causal Inference Selects matched samples of the original treated and control groups with similar covariate distributions. Can be used to match exactly on covariates, to match on propensity scores, or to perform a variety of other matching procedures. The package also implements a series of recommendations offered in Ho, Imai, King, and Stuart (2007) <doi:10.1093/pan/mpl013>.
2229 Official Statistics & Survey Methodology MBHdesign Spatial Designs for Ecological and Environmental Surveys Provides spatially balanced designs from a set of (contiguous) potential sampling locations in a study region. Accommodates, without detrimental effects on spatial balance, sites that the researcher wishes to include in the survey for reasons other than the current randomisation (legacy sites).
2230 Official Statistics & Survey Methodology memisc Management of Survey Data and Presentation of Analysis Results An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) ‘SPSS’ and ‘Stata’ files is provided. Further, the package allows the user to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to ‘LaTeX’ and HTML.
2231 Official Statistics & Survey Methodology mi Missing Data Imputation and Model Checking The mi package provides functions for data manipulation, imputing missing values in an approximate Bayesian framework, diagnostics of the models used to generate the imputations, confidence-building mechanisms to validate some of the assumptions of the imputation algorithm, and functions to analyze multiply imputed data sets with the appropriate degree of sampling uncertainty.
2232 Official Statistics & Survey Methodology mice Multivariate Imputation by Chained Equations Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
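A minimal sketch of the impute-analyse-pool workflow on the ‘nhanes’ data shipped with ‘mice’:

    library(mice)
    imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)  # 5 imputed datasets
    fit <- with(imp, lm(bmi ~ age + hyp))  # fit the model in each imputed set
    summary(pool(fit))                     # pool results across imputations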
2233 Official Statistics & Survey Methodology micEconIndex Price and Quantity Indices Tools for calculating Laspeyres, Paasche, and Fisher price and quantity indices.
2234 Official Statistics & Survey Methodology MicSim Performing Continuous-Time Microsimulation This entry-level toolkit allows performing continuous-time microsimulation for a wide range of demographic applications. Individual life-courses are specified by a continuous-time multi-state model.
2235 Official Statistics & Survey Methodology MImix Mixture summary method for multiple imputation Tools to combine results for multiply-imputed data using mixture approximations
2236 Official Statistics & Survey Methodology mipfp Multidimensional Iterative Proportional Fitting and Alternative Models An implementation of the iterative proportional fitting (IPFP), maximum likelihood, minimum chi-square and weighted least squares procedures for updating an N-dimensional array with respect to given target marginal distributions (which, in turn, can be multidimensional). The package also provides an application of the IPFP to simulate multivariate Bernoulli distributions.
2237 Official Statistics & Survey Methodology missForest Nonparametric Missing Value Imputation using Random Forest The function ‘missForest’ in this package is used to impute missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.
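A minimal sketch; prodNA() (also from ‘missForest’) is used here only to create artificial missingness:

    library(missForest)
    set.seed(1)
    iris_mis <- prodNA(iris, noNA = 0.1)  # knock out 10% of values at random
    imp <- missForest(iris_mis)
    imp$OOBerror                          # out-of-bag imputation error estimate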
2238 Official Statistics & Survey Methodology missMDA Handling Missing Values with Multivariate Data Analysis Imputation of incomplete continuous or categorical datasets; Missing values are imputed with a principal component analysis (PCA), a multiple correspondence analysis (MCA) model or a multiple factor analysis (MFA) model; Perform multiple imputation with and in PCA or MCA.
2239 Official Statistics & Survey Methodology mitools Tools for Multiple Imputation of Missing Data Tools to perform analyses and combine results from multiple-imputation datasets.
2240 Official Statistics & Survey Methodology mix Estimation/Multiple Imputation for Mixed Categorical and Continuous Data Estimation/multiple imputation programs for mixed categorical and continuous data.
2241 Official Statistics & Survey Methodology nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
2242 Official Statistics & Survey Methodology noncensus U.S. Census Regional and Demographic Data A collection of various regional information determined by the U.S. Census Bureau along with demographic data.
2243 Official Statistics & Survey Methodology norm Analysis of multivariate normal datasets with missing values Analysis of multivariate normal datasets with missing values
2244 Official Statistics & Survey Methodology OECD Search and Extract Data from the OECD Search and extract data from the OECD.
2245 Official Statistics & Survey Methodology pan Multiple Imputation for Multivariate Panel or Clustered Data It provides functions and examples for maximum likelihood estimation for generalized linear mixed models and Gibbs sampler for multivariate linear mixed models with incomplete data, as described in Schafer JL (1997) “Imputation of missing covariates under a multivariate linear mixed model”. Technical report 97-04, Dept. of Statistics, The Pennsylvania State University.
2246 Official Statistics & Survey Methodology panelaggregation Aggregate Longitudinal Survey Data Aggregate Business Tendency Survey Data (and other qualitative surveys) to time series at various aggregation levels. Run aggregation of survey data in a speedy, re-traceable and easily deployable way. Aggregation is substantially accelerated by use of data.table. This package intends to provide an interface that is less general and abstract than data.table but rather geared towards survey researchers.
2247 Official Statistics & Survey Methodology pps Functions for PPS sampling The pps package contains functions to select samples using PPS (probability proportional to size) sampling. It also includes a function for stratified simple random sampling, a function to compute joint inclusion probabilities for Sampford’s method of PPS sampling, and a few utility functions. The user’s guide pps-ug.pdf is included.
2248 Official Statistics & Survey Methodology PracTools Tools for Designing and Weighting Survey Samples Functions and datasets to support Valliant, Dever, and Kreuter, “Practical Tools for Designing and Weighting Survey Samples” (2nd edition, 2018). Contains functions for sample size calculation for survey samples using stratified or clustered one-, two-, and three-stage sample designs. Other functions compute variance components for multistage designs and sample sizes in two-phase designs. A number of example data sets are included.
2249 Official Statistics & Survey Methodology prevR Estimating Regional Trends of a Prevalence from a DHS Spatial estimation of a prevalence surface or a relative risks surface, using data from a Demographic and Health Survey (DHS) or an analog survey.
2250 Official Statistics & Survey Methodology pxR PC-Axis with R Provides a set of functions for reading and writing PC-Axis files, used by different statistical organizations around the globe for data dissemination.
2251 Official Statistics & Survey Methodology quantification Quantification of Qualitative Survey Data Provides different functions for quantifying qualitative survey data. It supports the Carlson-Parkin method, the regression approach, the balance approach and the conditional expectations method.
2252 Official Statistics & Survey Methodology questionr Functions to Make Surveys Processing Easier Set of functions to make the processing and analysis of surveys easier : interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
2253 Official Statistics & Survey Methodology RcmdrPlugin.sampling Tools for sampling in Official Statistical Surveys This package includes tools for calculating sample sizes and selecting samples using various sampling designs. This package is an extension of RcmdrPlugin.EHESsampling which was developed as part of the EHES pilot project. The EHES Pilot project has received funding from the European Commission and DG Sanco. The views expressed here are those of the authors and they do not represent the Commission’s official position.
2254 Official Statistics & Survey Methodology RecordLinkage Record Linkage in R Provides functions for linking and de-duplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain.
2255 Official Statistics & Survey Methodology reweight Adjustment of Survey Respondent Weights Adjusts the weights of survey respondents so that the marginal distributions of certain variables fit more closely to those from a more precise source (e.g. Census Bureau’s data).
2256 Official Statistics & Survey Methodology Rilostat ILO Open Data via Ilostat Bulk Download Facility or SDMX Web Service Tools to download data from the ilostat database <http://www.ilo.org/ilostat> together with search and manipulation utilities.
2257 Official Statistics & Survey Methodology robCompositions Compositional Data Analysis Methods for analysis of compositional data including robust methods, imputation, methods to replace rounded zeros, (robust) outlier detection for compositional data, (robust) principal component analysis for compositional data, (robust) factor analysis for compositional data, (robust) discriminant analysis for compositional data (Fisher rule), robust regression with compositional predictors and (robust) Anderson-Darling normality tests for compositional data as well as popular log-ratio transformations (addLR, cenLR, isomLR, and their inverse transformations). In addition, visualisation and diagnostic tools are implemented as well as high and low-level plot functions for the ternary diagram.
2258 Official Statistics & Survey Methodology rpms Recursive Partitioning for Modeling Survey Data Fits a linear model to survey data in each node obtained by recursively partitioning the data. The splitting variables and splits selected are obtained using a procedure which adjusts for complex sample design features used to obtain the data. Likewise the model fitting algorithm produces design-consistent coefficients to the least squares linear model between the dependent and independent variables. The first stage of the design is accounted for in the provided variance estimates. The main function returns the resulting binary tree with the linear model fit at every end-node. The package provides a number of functions and methods for these trees.
2259 Official Statistics & Survey Methodology rrcov3way Robust Methods for Multiway Data Analysis, Applicable also for Compositional Data Provides methods for multiway data analysis by means of Parafac and Tucker 3 models. Robust versions (Engelen and Hubert (2011) <doi:10.1016/j.aca.2011.04.043>) and versions for compositional data are also provided (Gallo (2015) <doi:10.1080/03610926.2013.798664>, Di Palma et al. (in press)).
2260 Official Statistics & Survey Methodology rrcovNA Scalable Robust Estimators with High Breakdown Point for Incomplete Data Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point for Incomplete Data.
2261 Official Statistics & Survey Methodology RRreg Correlation and Regression Analyses for Randomized Response Data Univariate and multivariate methods to analyze randomized response (RR) survey designs (e.g., Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63-69, <doi:10.2307/2283137>). Besides univariate estimates of true proportions, RR variables can be used for correlations, as dependent variable in a logistic regression (with or without random effects), or as predictors in a linear regression (Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression analyses of randomized response data. Journal of Statistical Software, 85(2), 1-29, <doi:10.18637/jss.v085.i02>). For simulations and the estimation of statistical power, RR data can be generated according to several models. The implemented methods also allow testing of the link between continuous covariates and dishonesty in cheating paradigms such as the coin-toss or dice-roll task (Moshagen, M., & Hilbig, B. E. (2017). The statistical analysis of cheating paradigms. Behavior Research Methods, 49, 724-732, <doi:10.3758/s13428-016-0729-x>).
2262 Official Statistics & Survey Methodology RRTCS Randomized Response Techniques for Complex Surveys Point and interval estimation of linear parameters with data obtained from complex surveys (including stratified and clustered samples) when randomization techniques are used. The randomized response technique was developed to obtain estimates that are more valid when studying sensitive topics. Estimators and variances for 14 randomized response methods for qualitative variables and 7 randomized response methods for quantitative variables are also implemented. In addition, some data sets from surveys with these randomization methods are included in the package.
2263 Official Statistics & Survey Methodology rsae Robust Small Area Estimation Robust Small Area Estimation. Robust Basic Unit- and Area-Level Models
2264 Official Statistics & Survey Methodology rspa Adapt Numerical Records to Fit (in)Equality Restrictions Minimally adjust the values of numerical records in a data.frame, such that each record satisfies a predefined set of equality and/or inequality constraints. The constraints can be defined using the ‘validate’ package. The core algorithms have recently been moved to the ‘lintools’ package, refer to ‘lintools’ for a more basic interface and access to a version of the algorithm that works with sparse matrices.
2265 Official Statistics & Survey Methodology rworldmap Mapping Global Data Enables mapping of country level and gridded user datasets.
2266 Official Statistics & Survey Methodology sae Small Area Estimation Functions for small area estimation.
2267 Official Statistics & Survey Methodology saeSim Simulation Tools for Small Area Estimation Tools for the simulation of data in the context of small area estimation. Combine all steps of your simulation - from data generation over drawing samples to model fitting - in one object. This enables easy modification and combination of different scenarios. You can store your results in a folder or start the simulation in parallel.
2268 Official Statistics & Survey Methodology sampling Survey Sampling Functions for drawing and calibrating samples.
2269 Official Statistics & Survey Methodology samplingbook Survey Sampling Procedures Sampling procedures from the book ‘Stichproben - Methoden und praktische Umsetzung mit R’ by Goeran Kauermann and Helmut Kuechenhoff (2010).
2270 Official Statistics & Survey Methodology SamplingStrata Optimal Stratification of Sampling Frames for Multipurpose Sampling Surveys In the field of stratified sampling design, this package offers an approach for the determination of the best stratification of a sampling frame, the one that ensures the minimum sample cost under the condition to satisfy precision constraints in a multivariate and multidomain case. This approach is based on the use of the genetic algorithm: each solution (i.e. a particular partition in strata of the sampling frame) is considered as an individual in a population; the fitness of all individuals is evaluated applying the Bethel-Chromy algorithm to calculate the sampling size satisfying precision constraints on the target estimates. Functions in the package allow the user to: (a) analyse the results obtained in the optimisation step; (b) assign the new strata labels to the sampling frame; (c) select a sample from the new frame according to the best allocation. Functions for the execution of the genetic algorithm are a modified version of the functions in the ‘genalg’ package.
2271 Official Statistics & Survey Methodology samplingVarEst Sampling Variance Estimation Functions to calculate some point estimators and to estimate their variance under unequal probability sampling without replacement. Single and two stage sampling designs are considered. Some approximations for the second order inclusion probabilities are also available (sample and population based). A variety of Jackknife variance estimators are implemented. Almost every function is written in C (compiled) code for faster results. The functions incorporate some performance improvements for faster results with large datasets.
2272 Official Statistics & Survey Methodology SAScii Import ASCII files directly into R using only a SAS input script Using any importation code designed for SAS users to read ASCII files into sas7bdat files, the SAScii package parses through the INPUT block of a (.sas) syntax file to design the parameters needed for a read.fwf function call. This allows the user to specify the location of the ASCII (often a .dat) file and the location of the .sas syntax file, and then load the data frame directly into R in just one step.
2273 Official Statistics & Survey Methodology SDaA Sampling: Design and Analysis Functions and Datasets from Lohr, S. (1999), Sampling: Design and Analysis, Duxbury.
2274 Official Statistics & Survey Methodology sdcHierarchies Create and (Interactively) Modify Nested Hierarchies Provides functionality to generate, (interactively) modify (by adding, removing and renaming nodes) and convert nested hierarchies between different formats. These tree like structures can be used to define for example complex hierarchical tables used for statistical disclosure control.
2275 Official Statistics & Survey Methodology sdcMicro Statistical Disclosure Control Methods for Anonymization of Microdata and Risk Estimation Data from statistical agencies and other institutions are mostly confidential. This package can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. In addition, various risk estimation methods are included. Note that the package includes a graphical user interface that allows the use of various methods of this package.
2276 Official Statistics & Survey Methodology sdcTable Methods for Statistical Disclosure Control in Tabular Data Methods for statistical disclosure control in tabular data such as primary and secondary cell suppression as described for example in Hundepol et al. (2012) <doi:10.1002/9781118348239> are covered in this package.
2277 Official Statistics & Survey Methodology seasonal R Interface to X-13-ARIMA-SEATS Easy-to-use interface to X-13-ARIMA-SEATS, the seasonal adjustment software by the US Census Bureau. It offers full access to almost all options and outputs of X-13, including X-11 and SEATS, automatic ARIMA model search, outlier detection and support for user defined holiday variables, such as Chinese New Year or Indian Diwali. A graphical user interface can be used through the ‘seasonalview’ package. Uses the X-13-binaries from the ‘x13binary’ package.
2278 Official Statistics & Survey Methodology SeleMix Selective Editing via Mixture Models Detection of outliers and influential errors using a latent variable model.
2279 Official Statistics & Survey Methodology simFrame Simulation framework A general framework for statistical simulation.
2280 Official Statistics & Survey Methodology simPop Simulation of Synthetic Populations for Survey Data Considering Auxiliary Information Tools and methods to simulate populations for surveys based on auxiliary data. The tools include model-based methods, calibration and combinatorial optimization algorithms. The package was developed with support of the International Household Survey Network, DFID Trust Fund TF011722 and funds from the World Bank.
2281 Official Statistics & Survey Methodology simputation Simple Imputation Easy to use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the ‘magrittr’ package.
2282 Official Statistics & Survey Methodology SmallCountRounding Small Count Rounding of Tabular Data A statistical disclosure control tool to protect frequency tables in cases where small values are sensitive. The function PLSrounding() performs small count rounding of necessary inner cells so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. This is equivalent to changing micro data since frequencies of unique combinations are changed. Thus, additivity and consistency are guaranteed. The methodology is described in Langsrud and Heldal (2018) <https://www.researchgate.net/publication/327768398>.
2283 Official Statistics & Survey Methodology sms Spatial Microsimulation Produce small area population estimates by fitting census data to survey data.
2284 Official Statistics & Survey Methodology sorvi Finnish Open Government Data Toolkit Algorithms for Finnish open government data.
2285 Official Statistics & Survey Methodology spsurvey Spatial Survey Design and Analysis These functions provide procedures for selecting sites for spatial surveys using spatially balanced algorithms applied to discrete points, linear networks, or polygons. The probability survey designs available include independent random samples, stratified random samples, and unequal probability random samples (categorical or probability proportional to size). Design-based estimation based on the results from surveys is available for estimating totals, means, quantiles, CDFs, and linear models. The analyses rely on package survey for most results. Variance estimation options include a local neighborhood variance estimator that is appropriate for spatially-balanced survey designs. A reference for the survey design portion of the package is: D. L. Stevens, Jr. and A. R. Olsen (2004), “Spatially-balanced sampling of natural resources.”, Journal of the American Statistical Association 99(465): 262-278, <doi:10.1198/016214504000000250>. Additional helpful references for this package are A. R. Olsen, T. M. Kincaid, and Q. Payton (2012) and T. M. Kincaid and A. R. Olsen (2012), both of which are chapters in the book “Design and Analysis of Long-Term Ecological Monitoring Studies” (R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.), Cambridge University Press, New York, <Online ISBN:9781139022422>).
2286 Official Statistics & Survey Methodology srvyr ‘dplyr’-Like Syntax for Summary Statistics of Survey Data Use piping, verbs like ‘group_by’ and ‘summarize’, and other ‘dplyr’ inspired syntactic style when calculating summary statistics on survey data using functions from the ‘survey’ package.
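A minimal sketch using the ‘api’ example data from the ‘survey’ package:

    library(survey)  # provides the api example data
    library(srvyr)
    data(api)
    strat_design <- apistrat %>%
      as_survey_design(strata = stype, weights = pw)
    strat_design %>%
      group_by(stype) %>%
      summarize(mean_api00 = survey_mean(api00))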
2287 Official Statistics & Survey Methodology StatMatch Statistical Matching or Data Fusion Integration of two data sources referred to the same target population which share a number of variables. Some functions can also be used to impute missing values in data sets through hot deck imputation methods. Methods to perform statistical matching when dealing with data from complex sample surveys are available too.
2288 Official Statistics & Survey Methodology stratification Univariate Stratification of Survey Populations Univariate stratification of survey populations with a generalization of the Lavallee-Hidiroglou method of stratum construction. The generalized method takes into account a discrepancy between the stratification variable and the survey variable. The determination of the optimal boundaries also incorporate, if desired, an anticipated non-response, a take-all stratum for large units, a take-none stratum for small units, and a certainty stratum to ensure that some specific units are in the sample. The well known cumulative root frequency rule of Dalenius and Hodges and the geometric rule of Gunning and Horgan are also implemented.
2289 Official Statistics & Survey Methodology stringdist Approximate String Matching and String Distance Functions Implements an approximate string matching version of R’s native ‘match’ function. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), qgrams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using ‘OpenMP’. An API for C or C++ is exposed as well.
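A few one-line examples of the distances described above:

    library(stringdist)
    stringdist("kitten", "sitting", method = "dl")    # Damerau-Levenshtein: 3
    stringdist("MARTHA", "MARHTA", method = "jw")     # Jaro-Winkler distance
    amatch("leia", c("leela", "leia!"), maxDist = 2)  # approximate match()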
2290 Official Statistics & Survey Methodology survey (core) Analysis of Complex Survey Samples Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase subsampling designs. Graphics. PPS sampling without replacement. Principal components, factor analysis.
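A minimal sketch declaring a one-stage cluster design on the bundled ‘api’ data, then estimating a mean and a total:

    library(survey)
    data(api)
    dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)
    svymean(~api00, dclus1)    # design-based mean with correct SE
    svytotal(~enroll, dclus1)  # design-based total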
2291 Official Statistics & Survey Methodology surveybootstrap Tools for the Bootstrap with Survey Data Tools for using different kinds of bootstrap for estimating sampling variation using complex survey data.
2292 Official Statistics & Survey Methodology surveydata Tools to Work with Survey Data Data obtained from surveys contains information not only about the survey responses, but also the survey metadata, e.g. the original survey questions and the answer options. The ‘surveydata’ package makes it easy to keep track of this metadata, and to easily extract columns with specific questions.
2293 Official Statistics & Survey Methodology surveyoutliers Helps Manage Outliers in Sample Surveys At present, the only functionality is the calculation of optimal one-sided winsorizing cutoffs. The main function is optimal.onesided.cutoff.bygroup. It calculates the optimal tuning parameter for one-sided winsorisation, and so calculates winsorised values for a variable of interest. See the help file for this function for more details and an example.
2294 Official Statistics & Survey Methodology surveyplanning Survey Planning Tools Tools for sample survey planning, including sample size calculation, estimation of expected precision for the estimates of totals, and calculation of optimal sample size allocation.
2295 Official Statistics & Survey Methodology svyPVpack A package for complex surveys including plausible values This package deals with data that stem from survey designs including plausible values. It has been created to handle data from Large Scale Assessments like PISA, PIAAC etc., which use complex survey designs to draw the sample and plausible values to report person-related estimates. Various functions/statistics (mean, quantile, GLM etc.) are provided to handle this kind of data.
2296 Official Statistics & Survey Methodology synthpop Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric or classification and regression trees models. Data are synthesised via the function syn() which can be largely automated, if default settings are used, or with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>.
2297 Official Statistics & Survey Methodology tidycensus Load US Census Boundary and Attribute Data as ‘tidyverse’ and ‘sf’-Ready Data Frames An integrated R interface to the decennial US Census and American Community Survey APIs and the US Census Bureau’s geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for many geographies.
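A minimal sketch, assuming a Census API key has been registered with the package; the variable code and state below are illustrative:

    library(tidycensus)
    # census_api_key("YOUR_KEY", install = TRUE)   # one-time setup
    tx_income <- get_acs(geography = "county",
                         variables = "B19013_001", # median household income
                         state = "TX",
                         geometry = TRUE)          # attach 'sf' boundary geometry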
2298 Official Statistics & Survey Methodology tmap Thematic Maps Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.
2299 Official Statistics & Survey Methodology treemap Treemap Visualization A treemap is a space-filling visualization of hierarchical structures. This package offers great flexibility to draw treemaps.
2300 Official Statistics & Survey Methodology univOutl Detection of Univariate Outliers Well-known outlier detection techniques in the univariate case. Methods to deal with skewed distributions are included too. The Hidiroglou-Berthelot (1986) method to search for outliers in ratios of historical data is implemented as well. When available, survey weights can be used in outlier detection.
2301 Official Statistics & Survey Methodology validate Data Validation Infrastructure Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity.
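A minimal sketch on the built-in ‘cars’ data; the rules themselves are illustrative:

    library(validate)
    rules <- validator(speed >= 0, dist >= 0, dist / speed <= 10)
    cf <- confront(cars, rules)  # evaluate the rules against the data
    summary(cf)                  # per-rule pass/fail/NA counts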
2302 Official Statistics & Survey Methodology validatetools Checking and Simplifying Validation Rule Sets Rule sets with validation rules may contain redundancies or contradictions. Functions for finding redundancies and problematic rules are provided, given a set of rules formulated with ‘validate’.
2303 Official Statistics & Survey Methodology vardpoor Variance Estimation for Sample Surveys by the Ultimate Cluster Method Generation of domain variables, linearization of several nonlinear population statistics (the ratio of two totals, weighted income percentile, relative median income ratio, at-risk-of-poverty rate, at-risk-of-poverty threshold, Gini coefficient, gender pay gap, the aggregate replacement ratio, median income below the at-risk-of-poverty gap, income quintile share ratio, relative median at-risk-of-poverty gap), computation of regression residuals in case of weight calibration, variance estimation of sample surveys by the ultimate cluster method (Hansen, Hurwitz and Madow, Sample Survey Methods and Theory, vol. I: Methods and Applications; vol. II: Theory. 1953, New York: John Wiley and Sons), variance estimation for longitudinal, cross-sectional measures and measures of change for single and multistage cluster sampling designs (Berger, Y. G., 2015, <doi:10.1111/rssa.12116>). Several other precision measures are derived - standard error, the coefficient of variation, the margin of error, confidence interval, design effect.
2304 Official Statistics & Survey Methodology VIM Visualization and Imputation of Missing Values New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.
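A minimal sketch on the ‘sleep’ data shipped with ‘VIM’:

    library(VIM)
    data(sleep, package = "VIM")
    aggr(sleep)            # plot the amount and pattern of missing values
    imputed <- kNN(sleep)  # k-nearest-neighbour imputation; adds *_imp indicator columns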
2305 Official Statistics & Survey Methodology weights Weighting and Weighted Statistics Provides a variety of functions for producing simple weighted statistics, such as weighted Pearson’s correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. Also now includes some software for quickly recoding survey data and plotting point estimates from interaction terms in regressions (and multiply imputed regressions). NOTE: Weighted partial correlation calculations were removed to address a bug.
2306 Official Statistics & Survey Methodology x12 Interface to ‘X12-ARIMA’/‘X13-ARIMA-SEATS’ and Structure for Batch Processing of Seasonal Adjustment ‘X13-ARIMA-SEATS’ <https://www.census.gov/srd/www/x13as/> is a widely used seasonal adjustment methodology and software developed by the US Census Bureau. It can be accessed from ‘R’ with this package, and ‘X13-ARIMA-SEATS’ binaries are provided by the ‘R’ package ‘x13binary’.
2307 Official Statistics & Survey Methodology x12GUI X12 - Graphical User Interface A graphical user interface for the x12 package
2308 Official Statistics & Survey Methodology XBRL Extraction of Business Financial Information from ‘XBRL’ Documents Functions to extract business financial information from an Extensible Business Reporting Language (‘XBRL’) instance file and the associated collection of files that defines its ‘Discoverable’ Taxonomy Set (‘DTS’).
2309 Official Statistics & Survey Methodology yaImpute Nearest Neighbor Observation Imputation and Evaluation Tools Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.
2310 Optimization and Mathematical Programming ABCoptim Implementation of Artificial Bee Colony (ABC) Optimization An implementation of Karaboga (2005) Artificial Bee Colony Optimization algorithm <http://mf.erciyes.edu.tr/abc/pub/tr06_2005.pdf>. This version is a work in progress, which is why it has been implemented in pure R code. It was developed from the basic version programmed in C and distributed at the algorithm’s official website.
2311 Optimization and Mathematical Programming adagio Discrete and Global Optimization Routines The R package ‘adagio’ provides methods and algorithms for discrete optimization, e.g. knapsack and subset sum procedures, derivative-free Nelder-Mead and Hooke-Jeeves minimization, and some (evolutionary) global optimization functions.
2312 Optimization and Mathematical Programming alabama (core) Constrained Nonlinear Optimization Augmented Lagrangian Adaptive Barrier Minimization Algorithm for optimizing smooth nonlinear objective functions with constraints. Linear or nonlinear equality and inequality constraints are allowed.
2313 Optimization and Mathematical Programming BB Solving and Optimizing Large-Scale Nonlinear Systems Barzilai-Borwein spectral methods for solving nonlinear systems of equations, and for optimizing nonlinear objective functions subject to simple constraints. A tutorial-style introduction to this package is available in a vignette on the CRAN download page or, when the package is loaded in an R session, with vignette(“BB”).
2314 Optimization and Mathematical Programming boot Bootstrap Functions (Originally by Angelo Canty for S) Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
2315 Optimization and Mathematical Programming bvls The Stark-Parker algorithm for bounded-variable least squares An R interface to the Stark-Parker implementation of an algorithm for bounded-variable least squares
2316 Optimization and Mathematical Programming caRamel Automatic Calibration by Evolutionary Multi Objective Algorithm Multi-objective optimizer initially developed for the calibration of hydrological models. The algorithm is a hybrid of the MEAS algorithm (Efstratiadis and Koutsoyiannis (2005) <doi:10.13140/RG.2.2.32963.81446>), which uses a directional search method based on the simplexes of the objective space, and the epsilon-NSGA-II algorithm, which classifies the parameter vectors and manages the archive by epsilon-dominance (Reed and Devireddy <doi:10.1142/9789812567796_0004>).
2317 Optimization and Mathematical Programming cccp Cone Constrained Convex Problems Routines for solving convex optimization problems with cone constraints by means of interior-point methods. The implemented algorithms are partially ported from CVXOPT, a Python module for convex optimization (see <http://cvxopt.org> for more information).
2318 Optimization and Mathematical Programming cec2005benchmark Benchmark for the CEC 2005 Special Session on Real-Parameter Optimization This package is a wrapper for the C implementation of the 25 benchmark functions for the CEC 2005 Special Session on Real-Parameter Optimization. The original C code by Santosh Tiwari and related documentation are available at http://www.ntu.edu.sg/home/EPNSugan/index_files/CEC-05/CEC05.htm.
2319 Optimization and Mathematical Programming cec2013 Benchmark functions for the Special Session and Competition on Real-Parameter Single Objective Optimization at CEC-2013 This package provides R wrappers for the C implementation of 28 benchmark functions defined for the Special Session and Competition on Real-Parameter Single Objective Optimization at CEC-2013. The focus of this package is to provide an open-source and multi-platform implementation of the CEC2013 benchmark functions, in order to make it easier for researchers to test the performance of new optimization algorithms in a reproducible way. The original C code (Windows only) was provided by Jane Jing Liang, while GNU/Linux comments were made by Janez Brest. This package was kindly authorised for publication on CRAN by Ponnuthurai Nagaratnam Suganthan. The official documentation is available at http://www.ntu.edu.sg/home/EPNSugan/index_files/CEC2013/CEC2013.htm. Bug reports/comments/questions are very welcome (in English, Spanish or Italian).
2320 Optimization and Mathematical Programming CEoptim Cross-Entropy R Package for Optimization Optimization solver based on the Cross-Entropy method.
2321 Optimization and Mathematical Programming clpAPI R Interface to C API of COIN-OR Clp R Interface to C API of COIN-OR Clp, depends on COIN-OR Clp Version >= 1.12.0.
2322 Optimization and Mathematical Programming CLSOCP A smoothing Newton method SOCP solver This package provides an implementation of a one-step smoothing Newton method for the solution of second order cone programming problems, originally described by Xiaoni Chi and Sanyang Liu.
2323 Optimization and Mathematical Programming clue Cluster Ensembles CLUster Ensembles.
2324 Optimization and Mathematical Programming cmaes Covariance Matrix Adapting Evolutionary Strategy Single objective optimization using a CMA-ES.
2325 Optimization and Mathematical Programming cmaesr Covariance Matrix Adaptation Evolution Strategy Pure R implementation of the Covariance Matrix Adaptation - Evolution Strategy (CMA-ES) with optional restarts (IPOP-CMA-ES).
2326 Optimization and Mathematical Programming colf Constrained Optimization on Linear Function Performs least squares constrained optimization on a linear objective function. It contains a number of algorithms to choose from and offers a formula syntax similar to lm().
2327 Optimization and Mathematical Programming coneproj Primal or Dual Cone Projections with Routines for Constrained Regression Routines doing cone projection and quadratic programming, as well as doing estimation and inference for constrained parametric regression and shape-restricted regression problems. See Mary C. Meyer (2013)<doi:10.1080/03610918.2012.659820> for more details.
2328 Optimization and Mathematical Programming copulaedas Estimation of Distribution Algorithms Based on Copulas Provides a platform where EDAs (estimation of distribution algorithms) based on copulas can be implemented and studied. The package offers complete implementations of various EDAs based on copulas and vines, a group of well-known optimization problems, and utility functions to study the performance of the algorithms. Newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components.
2329 Optimization and Mathematical Programming cplexAPI R Interface to C API of IBM ILOG CPLEX This is the R Interface to the C API of IBM ILOG CPLEX. It necessarily depends on IBM ILOG CPLEX (>= 12.1).
2330 Optimization and Mathematical Programming crs Categorical Regression Splines Regression splines that handle a mix of continuous and categorical (discrete) data often encountered in applied settings. I would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <http://www.nserc-crsng.gc.ca>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <http://www.sshrc-crsh.gc.ca>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <https://www.sharcnet.ca>).
2331 Optimization and Mathematical Programming CVXR Disciplined Convex Optimization An object-oriented modeling language for disciplined convex programming (DCP). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution.
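A minimal sketch of the DCP workflow the description outlines, here a non-negative least-squares problem on simulated data:

    library(CVXR)
    set.seed(1)
    A <- matrix(rnorm(20), nrow = 10); b <- rnorm(10)
    x <- Variable(2)                       # the decision variable
    prob <- Problem(Minimize(sum_squares(A %*% x - b)), list(x >= 0))
    result <- solve(prob)                  # convexity is verified, then a solver is called
    result$getValue(x)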
2332 Optimization and Mathematical Programming dclone Data Cloning and MCMC Tools for Maximum Likelihood Methods Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.
2333 Optimization and Mathematical Programming DEoptim (core) Global Optimization by Differential Evolution Implements the differential evolution algorithm for global optimization of a real-valued function of a real-valued parameter vector.
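A minimal sketch minimizing the two-dimensional Rosenbrock function; the control settings are illustrative:

    library(DEoptim)
    rosenbrock <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    out <- DEoptim(rosenbrock, lower = c(-5, -5), upper = c(5, 5),
                   control = DEoptim.control(NP = 40, itermax = 200, trace = FALSE))
    out$optim$bestmem  # should approach c(1, 1)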
2334 Optimization and Mathematical Programming DEoptimR Differential Evolution Optimization in Pure R Differential Evolution (DE) stochastic algorithms for global optimization of problems with and without constraints. The aim is to curate a collection of its state-of-the-art variants that (1) do not sacrifice simplicity of design, (2) are essentially tuning-free, and (3) can be efficiently implemented directly in the R language. Currently, it only provides an implementation of the ‘jDE’ algorithm by Brest et al. (2006) <doi:10.1109/TEVC.2006.872133>.
2335 Optimization and Mathematical Programming desirability Function Optimization and Ranking via Desirability Functions S3 classes for multivariate optimization using the desirability function by Derringer and Suich (1980).
2336 Optimization and Mathematical Programming dfoptim (core) Derivative-Free Optimization Derivative-Free optimization algorithms. These algorithms do not require gradient information. More importantly, they can be used to solve non-smooth optimization problems.
2337 Optimization and Mathematical Programming Dykstra Quadratic Programming using Cyclic Projections Solves quadratic programming problems using Richard L. Dykstra’s cyclic projection algorithm. Routine allows for a combination of equality and inequality constraints. See Dykstra (1983) <doi:10.1080/01621459.1983.10477029> for details.
2338 Optimization and Mathematical Programming ECOSolveR Embedded Conic Solver in R R interface to the Embedded COnic Solver (ECOS), an efficient and robust C library for convex problems. Conic and equality constraints can be specified in addition to integer and boolean variable constraints for mixed-integer problems. This R interface is inspired by the python interface and has similar calling conventions.
2339 Optimization and Mathematical Programming ecr Evolutionary Computation in R Framework for building evolutionary algorithms for both single- and multi-objective continuous or discrete optimization problems. A set of predefined evolutionary building blocks and operators is included. Moreover, the user can easily set up custom objective functions, operators, building blocks and representations by sticking to a few conventions. The package allows both a black-box approach for standard tasks (plug-and-play style) and a much more flexible white-box approach where the evolutionary cycle is written by hand.
2340 Optimization and Mathematical Programming flacco Feature-Based Landscape Analysis of Continuous and Constrained Optimization Problems Contains tools and features, which can be used for an Exploratory Landscape Analysis (ELA) of single-objective continuous optimization problems. Those features are able to quantify rather complex properties, such as the global structure, separability, etc., of the optimization problems.
2341 Optimization and Mathematical Programming FLSSS Mining Rigs for Specialized Subset Sum, Multi-Subset Sum, Multidimensional Subset Sum, Multidimensional Knapsack, Generalized Assignment Problems Specialized solvers for combinatorial optimization problems in the Subset Sum family. These solvers differ from the mainstream in the options of (i) restricting subset size, (ii) bounding subset elements, (iii) mining real-value sets with predefined subset sum errors, and (iv) finding one or more subsets in limited time. A novel algorithm for mining the one-dimensional Subset Sum induced algorithms for the multi-Subset Sum and the multidimensional Subset Sum. The latter decomposes the problem in a novel approach, and the multi-threaded framework offers exact algorithms for the multidimensional Knapsack and the Generalized Assignment problems. Package updates include (a) renewed implementation of the multi-Subset Sum, multidimensional Knapsack and Generalized Assignment solvers; (b) availability of bounding solution space in the multidimensional Subset Sum; (c) fundamental data structure and architectural changes for enhanced cache locality and better chance of SIMD vectorization; (d) an option of mapping real-domain problems to the integer domain with user-controlled precision loss, and those integers are further zipped non-uniformly in 64-bit buffers. Arithmetic on compressed integers is done by bit-manipulation and the design has virtually zero speed lag relative to normal integer arithmetic. The consequent reduction in dimensionality may yield substantial acceleration. Compilation with g++ ‘-Ofast’ is recommended. See package vignette (<arXiv:1612.04484v3>) for details. Functions prefixed with ‘aux’ (auxiliary) are or will be implementations of existing foundational or cutting-edge algorithms for solving optimization problems of interest.
2342 Optimization and Mathematical Programming GA Genetic Algorithms Flexible general-purpose toolbox implementing genetic algorithms (GAs) for stochastic optimisation. Binary, real-valued, and permutation representations are available to optimize a fitness function, i.e. a function provided by users depending on their objective function. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performances. Local search using general-purpose optimisation algorithms can be applied stochastically to exploit interesting regions. GAs can be run sequentially or in parallel, using an explicit master-slave parallelisation or a coarse-grain islands approach.
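A minimal sketch; since ga() maximizes the fitness, a function to be minimized is negated:

    library(GA)
    f <- function(x) -(x[1]^2 + x[2]^2)   # maximize the negative of a bowl-shaped function
    res <- ga(type = "real-valued", fitness = f,
              lower = c(-5, -5), upper = c(5, 5), maxiter = 100)
    summary(res)  # best solution should approach c(0, 0)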
2343 Optimization and Mathematical Programming genalg R Based Genetic Algorithm R based genetic algorithm for binary and floating point chromosomes.
2344 Optimization and Mathematical Programming GenSA Generalized Simulated Annealing Performs a search for the global minimum of a very complex non-linear objective function with a very large number of optima.
2345 Optimization and Mathematical Programming globalOptTests Objective functions for benchmarking the performance of global optimization algorithms This package makes available 50 objective functions for benchmarking the performance of global optimization algorithms
2346 Optimization and Mathematical Programming glpkAPI R Interface to C API of GLPK R Interface to C API of GLPK, depends on GLPK Version >= 4.42.
2347 Optimization and Mathematical Programming GrassmannOptim Grassmann Manifold Optimization Optimizing a function F(U), where U is a semi-orthogonal matrix and F is invariant under an orthogonal transformation of U
2348 Optimization and Mathematical Programming gsl Wrapper for the Gnu Scientific Library An R wrapper for some of the functionality of the Gnu Scientific Library.
2349 Optimization and Mathematical Programming hydroPSO Particle Swarm Optimisation, with Focus on Environmental Models State-of-the-art version of the Particle Swarm Optimisation (PSO) algorithm (SPSO-2011 and SPSO-2007 capable). hydroPSO can be used as a replacement for the ‘optim’ R function for (global) optimization of non-smooth and non-linear functions. However, the main focus of hydroPSO is the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the calibration engine (PSO). hydroPSO communicates with the model through the model’s own input and output files, without requiring access to the model’s source code. Several PSO variants and controlling options are included to fine-tune the performance of the calibration engine for different calibration problems. An advanced sensitivity analysis function together with user-friendly plotting summaries facilitate the interpretation and assessment of the calibration results. hydroPSO is parallel-capable, to alleviate the computational burden of complex models with “long” execution time. Bug reports/comments/questions are very welcome (in English, Spanish or Italian). See Zambrano-Bigiarini and Rojas (2013) <doi:10.1016/j.envsoft.2013.01.004> for more details.
2350 Optimization and Mathematical Programming igraph Network Analysis and Visualization Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
2351 Optimization and Mathematical Programming irace Iterated Racing for Automatic Algorithm Configuration Iterated race is an extension of the Iterated F-race method for the automatic configuration of optimization algorithms, that is, (offline) tuning their parameters by finding the most appropriate settings given a set of instances of an optimization problem.
2352 Optimization and Mathematical Programming isotone Active Set and Generalized PAVA for Isotone Optimization Contains two main functions: one for solving general isotone regression problems using the pool-adjacent-violators algorithm (PAVA); another one provides a framework for active set methods for isotone optimization problems with arbitrary order restrictions. Various types of loss functions are prespecified.
2353 Optimization and Mathematical Programming kernlab Kernel-Based Machine Learning Lab Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods ‘kernlab’ includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
2354 Optimization and Mathematical Programming kofnGA A Genetic Algorithm for Fixed-Size Subset Selection Provides a function that uses a genetic algorithm to search for a subset of size k from the integers 1:n, such that a user-supplied objective function is minimized at that subset. The selection step is done by tournament selection based on ranks, and elitism may be used to retain a portion of the best solutions from one generation to the next. Population objective function values may optionally be evaluated in parallel.
2355 Optimization and Mathematical Programming lbfgs Limited-memory BFGS Optimization A wrapper built around the libLBFGS optimization library by Naoaki Okazaki. The lbfgs package implements both the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and the Orthant-Wise Quasi-Newton Limited-Memory (OWL-QN) optimization algorithms. The L-BFGS algorithm solves the problem of minimizing an objective, given its gradient, by iteratively computing approximations of the inverse Hessian matrix. The OWL-QN algorithm finds the optimum of an objective plus the L1-norm of the problem’s parameters. The package offers a fast and memory-efficient implementation of these optimization routines, which is particularly suited for high-dimensional problems.
2356 Optimization and Mathematical Programming lbfgsb3 Limited Memory BFGS Minimizer with Bounds on Parameters Interfacing to Nocedal et al. L-BFGS-B.3.0 (2011) limited memory BFGS minimizer with bounds on parameters.
2357 Optimization and Mathematical Programming lbfgsb3c Limited Memory BFGS Minimizer with Bounds on Parameters with optim() ‘C’ Interface Interfacing to Nocedal et al. L-BFGS-B.3.0 (2011 <doi:10.1145/2049662.2049669>) limited memory BFGS minimizer with bounds on parameters. This is a fork of ‘lbfgsb3’. It registers an ‘R’-compatible ‘C’ interface to L-BFGS-B.3.0 that uses the same function types and optimization as the optim() function (see Writing ‘R’ Extensions and the source for details). This package also adds more stopping criteria and allows adjusting more tolerances.
2358 Optimization and Mathematical Programming limSolve Solving Linear Inverse Models Functions that (1) find the minimum/maximum of a linear or quadratic function: min or max (f(x)), where f(x) = ||Ax-b||^2 or f(x) = sum(a_i*x_i) subject to equality constraints Ex=f and/or inequality constraints Gx>=h, (2) sample an underdetermined- or overdetermined system Ex=f subject to Gx>=h, and if applicable Ax~=b, (3) solve a linear system Ax=B for the unknown x. It includes banded and tridiagonal linear systems. The package calls Fortran functions from ‘LINPACK’.
2359 Optimization and Mathematical Programming linprog Linear Programming / Optimization This package can be used to solve Linear Programming / Linear Optimization problems by using the simplex algorithm.
2360 Optimization and Mathematical Programming localsolver R API to LocalSolver The package converts R data into input and data for LocalSolver, executes optimization and exposes optimization results as R data. LocalSolver (http://www.localsolver.com/) is an optimization engine developed by Innovation24 (http://www.innovation24.fr/). It is designed to solve large-scale mixed-variable non-convex optimization problems. The localsolver package is developed and maintained by WLOG Solutions (http://www.wlogsolutions.com/en/) in collaboration with the Decision Support and Analysis Division at the Warsaw School of Economics (http://www.sgh.waw.pl/en/).
2361 Optimization and Mathematical Programming LowRankQP Low Rank Quadratic Programming Solves quadratic programming problems where the Hessian is represented as the product of two matrices.
2362 Optimization and Mathematical Programming lpSolve Interface to ‘Lp_solve’ v. 5.5 to Solve Linear/Integer Programs Lp_solve is freely available (under LGPL 2) software for solving linear, integer and mixed integer programs. In this implementation we supply a “wrapper” function in C and some R functions that solve general linear/integer problems, assignment problems, and transportation problems. This version calls lp_solve version 5.5.
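A minimal sketch of a small LP in the matrix form that lp() expects; the objective and constraints are illustrative:

    library(lpSolve)
    # maximize 3x + 2y  subject to  x + y <= 4,  x + 3y <= 6,  x, y >= 0
    sol <- lp(direction = "max", objective.in = c(3, 2),
              const.mat = rbind(c(1, 1), c(1, 3)),
              const.dir = c("<=", "<="), const.rhs = c(4, 6))
    sol$solution  # optimum at x = 4, y = 0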
2363 Optimization and Mathematical Programming lpSolveAPI R Interface to ‘lp_solve’ Version 5.5.2.0 The lpSolveAPI package provides an R interface to ‘lp_solve’, a Mixed Integer Linear Programming (MILP) solver with support for pure linear, (mixed) integer/binary, semi-continuous and special ordered sets (SOS) models.
2364 Optimization and Mathematical Programming lsei Solving Least Squares or Quadratic Programming Problems under Equality/Inequality Constraints It contains functions that solve least squares linear regression problems under linear equality/inequality constraints. Functions for solving quadratic programming problems are also available, which transform such problems into least squares ones first. It is developed based on the ‘Fortran’ program of Lawson and Hanson (1974, 1995), which is public domain and available at <http://www.netlib.org/lawson-hanson>.
2365 Optimization and Mathematical Programming ManifoldOptim An R Interface to the ‘ROPTLIB’ Library for Riemannian Manifold Optimization An R interface to version 0.3 of the ‘ROPTLIB’ optimization library (see <http://www.math.fsu.edu/~whuang2> for more information). Optimize real- valued functions over manifolds such as Stiefel, Grassmann, and Symmetric Positive Definite matrices.
2366 Optimization and Mathematical Programming matchingMarkets Analysis of Stable Matchings Implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups as well as two-sided matching of students to schools. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem, the college admissions problem, and the house allocation problem.
2367 Optimization and Mathematical Programming matchingR Matching Algorithms in R and C++ Computes matching algorithms quickly using Rcpp. Implements the Gale-Shapley Algorithm to compute the stable matching for two-sided markets, such as the stable marriage problem and the college-admissions problem. Implements Irving’s Algorithm for the stable roommate problem. Implements the top trading cycle algorithm for the indivisible goods trading problem.
2368 Optimization and Mathematical Programming maxLik Maximum Likelihood Estimation and Related Tools Functions for Maximum Likelihood (ML) estimation and non-linear optimization, and related tools. It includes a unified way to call different optimizers, and classes and methods to handle the results from the ML viewpoint. It also includes a number of convenience tools for testing and developing your own models.
2369 Optimization and Mathematical Programming mcga Machine Coded Genetic Algorithms for Real-Valued Optimization Problems Machine coded genetic algorithm (MCGA) is a fast tool for real-valued optimization problems. It uses the byte representation of variables rather than real values. It performs the classical crossover operations (uniform) on these byte representations. The mutation operator is also similar to the classical mutation operator, which is to say, it changes a randomly selected byte value of a chromosome by +1 or -1 with probability 1/2. In MCGAs there is no need for an encoding-decoding process and the classical operators are directly applicable to real values. It is fast and can handle a wide search space with high precision. Using a 256-unary alphabet is the main disadvantage of this algorithm but a moderate size population is convenient for many problems. The package also includes the multi_mcga function for multi-objective optimization problems. This function sorts the chromosomes using their ranks calculated from the non-dominated sorting algorithm.
2370 Optimization and Mathematical Programming mco Multiple Criteria Optimization Algorithms and Related Functions Functions for multiple criteria optimization using genetic algorithms and related test problems
2371 Optimization and Mathematical Programming metaheuristicOpt Metaheuristic for Optimization An implementation of metaheuristic algorithms for continuous optimization. Currently, the package contains the implementations of the following algorithms: particle swarm optimization (Kennedy and Eberhart, 1995), ant lion optimizer (Mirjalili, 2015 <doi:10.1016/j.advengsoft.2015.01.010>), grey wolf optimizer (Mirjalili et al., 2014 <doi:10.1016/j.advengsoft.2013.12.007>), dragonfly algorithm (Mirjalili, 2015 <doi:10.1007/s00521-015-1920-1>), firefly algorithm (Yang, 2009 <doi:10.1007/978-3-642-04944-6_14>), genetic algorithm (Holland, 1992, ISBN:978-0262581110), grasshopper optimisation algorithm (Saremi et al., 2017 <doi:10.1016/j.advengsoft.2017.01.004>), harmony search algorithm (Mahdavi et al., 2007 <doi:10.1016/j.amc.2006.11.033>), moth flame optimizer (Mirjalili, 2015 <doi:10.1016/j.knosys.2015.07.006>), sine cosine algorithm (Mirjalili, 2016 <doi:10.1016/j.knosys.2015.12.022>) and whale optimization algorithm (Mirjalili and Lewis, 2016 <doi:10.1016/j.advengsoft.2016.01.008>).
2372 Optimization and Mathematical Programming minpack.lm R Interface to the Levenberg-Marquardt Nonlinear Least-Squares Algorithm Found in MINPACK, Plus Support for Bounds The nls.lm function provides an R interface to lmder and lmdif from the MINPACK library, for solving nonlinear least-squares problems by a modification of the Levenberg-Marquardt algorithm, with support for lower and upper parameter bounds. The implementation can be used via nls-like calls using the nlsLM function.
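A minimal sketch fitting an exponential decay to simulated data with the nls-like nlsLM() call; the starting values are rough guesses:

    library(minpack.lm)
    set.seed(1)
    x <- seq(0, 5, length.out = 50)
    y <- 2 * exp(-1.3 * x) + rnorm(50, sd = 0.05)
    fit <- nlsLM(y ~ a * exp(b * x), start = list(a = 1, b = -1))
    coef(fit)  # should approach a = 2, b = -1.3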
2373 Optimization and Mathematical Programming minqa Derivative-free optimization algorithms by quadratic approximation Derivative-free optimization by quadratic approximation based on an interface to Fortran implementations by M. J. D. Powell.
2374 Optimization and Mathematical Programming mize Unconstrained Numerical Optimization Algorithms Optimization algorithms implemented in R, including conjugate gradient (CG), Broyden-Fletcher-Goldfarb-Shanno (BFGS) and the limited memory BFGS (L-BFGS) methods. Most internal parameters can be set through the call interface. The solvers hold up quite well for higher-dimensional problems.
2375 Optimization and Mathematical Programming mknapsack Multiple Knapsack Problem Solver Solves the multiple knapsack optimisation problem. Given a set of items, each with volume and value, it will allocate them to knapsacks of a given size in a way that the value of the top N knapsacks is as large as possible.
2376 Optimization and Mathematical Programming mlrMBO Bayesian Optimization and Model-Based Optimization of Expensive Black-Box Functions Flexible and comprehensive R toolbox for model-based optimization (‘MBO’), also known as Bayesian optimization. It implements the Efficient Global Optimization Algorithm and is designed for both single- and multi-objective optimization with mixed continuous, categorical and conditional parameters. The machine learning toolbox ‘mlr’ provides dozens of regression learners to model the performance of the target algorithm with respect to the parameter settings. It provides many different infill criteria to guide the search process. Additional features include multi-point batch proposal, parallel execution as well as visualization and sophisticated logging mechanisms, which is especially useful for teaching and understanding of algorithm behavior. ‘mlrMBO’ is implemented in a modular fashion, such that single components can be easily replaced or adapted by the user for specific use cases.
2377 Optimization and Mathematical Programming n1qn1 Port of the ‘Scilab’ ‘n1qn1’ and ‘qnbd’ Modules for (Un)constrained BFGS Optimization Provides ‘Scilab’ ‘n1qn1’, or Quasi-Newton BFGS “qn” without constraints and ‘qnbd’ or Quasi-Newton BFGS with constraints. This takes more memory than traditional L-BFGS. The n1qn1 routine is useful since it allows prespecification of a Hessian. If the Hessian is near enough the truth in optimization it can speed up the optimization problem. Both algorithms are described in the ‘Scilab’ optimization documentation located at <http://www.scilab.org/content/download/250/1714/file/optimization_in_scilab.pdf>.
2378 Optimization and Mathematical Programming neldermead R Port of the ‘Scilab’ Neldermead Module Provides several direct search optimization algorithms based on the simplex method. The provided algorithms are direct search algorithms, i.e. algorithms which do not use the derivative of the cost function. They are based on the update of a simplex. The following algorithms are available: the fixed shape simplex method of Spendley, Hext and Himsworth (unconstrained optimization with a fixed shape simplex), the variable shape simplex method of Nelder and Mead (unconstrained optimization with a variable shape simplex), and Box’s complex method (constrained optimization with a variable shape simplex).
2379 Optimization and Mathematical Programming nilde Nonnegative Integer Solutions of Linear Diophantine Equations with Applications Routines for enumerating all existing nonnegative integer solutions of a linear Diophantine equation. The package provides routines for solving 0-1, bounded and unbounded knapsack problems; 0-1, bounded and unbounded subset sum problems; and a problem of additive partitioning of natural numbers.
2380 Optimization and Mathematical Programming NlcOptim Solve Nonlinear Optimization with Nonlinear Constraints Optimization for nonlinear objective and constraint functions. Linear or nonlinear equality and inequality constraints are allowed. It accepts the input parameters as a constrained matrix.
2381 Optimization and Mathematical Programming nlmrt Functions for Nonlinear Least Squares Solutions Replacement for nls() tools for working with nonlinear least squares problems. The calling structure is similar to, but much simpler than, that of the nls() function. Moreover, where nls() specifically does NOT deal with small or zero residual problems, nlmrt is quite happy to solve them. It also attempts to be more robust in finding solutions, thereby avoiding ‘singular gradient’ messages that arise in the Gauss-Newton method within nls(). The Marquardt-Nash approach in nlmrt generally works more reliably to get a solution, though this may be one of a set of possibilities, and may also be statistically unsatisfactory. Added print and summary as of August 28, 2012.
2382 Optimization and Mathematical Programming nloptr R Interface to NLopt Solve optimization problems using an R interface to NLopt. NLopt is a free/open-source library for nonlinear optimization, providing a common interface for a number of different free optimization routines available online as well as original implementations of various other algorithms. See <http://ab-initio.mit.edu/wiki/index.php/NLopt_Introduction> for more information on the available algorithms. During installation of nloptr on Unix-based systems, the installer checks whether the NLopt library is installed on the system. If the NLopt library cannot be found, the code is compiled using the NLopt source included in the nloptr package.
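A minimal sketch using a derivative-free NLopt algorithm on the Rosenbrock function; the algorithm choice and tolerances are illustrative:

    library(nloptr)
    rosenbrock <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    res <- nloptr(x0 = c(-1.2, 1), eval_f = rosenbrock,
                  opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
                              xtol_rel = 1e-8, maxeval = 2000))
    res$solution  # should approach c(1, 1)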
2383 Optimization and Mathematical Programming nls2 Non-linear regression with brute force Adds brute force and multiple starting values to nls.
2384 Optimization and Mathematical Programming nlsr Functions for Nonlinear Least Squares Solutions Provides tools for working with nonlinear least squares problems. It is intended to eventually supersede the ‘nls()’ function in the R distribution. For example, ‘nls()’ specifically does NOT deal with small or zero residual problems as its Gauss-Newton method frequently stops with ‘singular gradient’ messages. ‘nlsr’ is based on the now-deprecated package ‘nlmrt’, and has refactored functions and R-language symbolic derivative features.
2385 Optimization and Mathematical Programming NMOF Numerical Methods and Optimization in Finance Functions, examples and data from the book “Numerical Methods and Optimization in Finance” by M. Gilli, D. Maringer and E. Schumann (2011), ISBN 978-0123756626. The package provides implementations of several optimisation heuristics, such as Differential Evolution, Genetic Algorithms and Threshold Accepting. There are also functions for the valuation of financial instruments, such as bonds and options, and functions that help with stochastic simulations.
2386 Optimization and Mathematical Programming nnls The Lawson-Hanson algorithm for non-negative least squares (NNLS) An R interface to the Lawson-Hanson implementation of an algorithm for non-negative least squares (NNLS). Also allows the combination of non-negative and non-positive constraints.
2387 Optimization and Mathematical Programming ompr Model and Solve Mixed Integer Linear Programs Model mixed integer linear programs in an algebraic way directly in R. The model is solver-independent and thus offers the possibility to solve a model with different solvers. It currently only supports linear constraints and objective functions. See the ‘ompr’ website <https://dirkschumacher.github.io/ompr> for more information, documentation and examples.
2388 Optimization and Mathematical Programming onls Orthogonal Nonlinear Least-Squares Regression Orthogonal Nonlinear Least-Squares Regression using Levenberg-Marquardt Minimization.
2389 Optimization and Mathematical Programming optimsimplex R Port of the ‘Scilab’ Optimsimplex Module Provides a building block for optimization algorithms based on a simplex. The ‘optimsimplex’ package may be used in the following optimization methods: the simplex method of Spendley et al. (1962) <doi:10.1080/00401706.1962.10490033>, the method of Nelder and Mead (1965) <doi:10.1093/comjnl/7.4.308>, Box’s algorithm for constrained optimization (1965) <doi:10.1093/comjnl/8.1.42>, the multi-dimensional search by Torczon (1989) <http://www.cs.wm.edu/~va/research/thesis.pdf>, etc…
2390 Optimization and Mathematical Programming optimx Expanded Replacement and Extension of the ‘optim’ Function Provides a replacement and extension of the optim() function to call to several function minimization codes in R in a single statement. These methods handle smooth, possibly box constrained functions of several or many parameters. Note that function ‘optimr()’ was prepared to simplify the incorporation of minimization codes going forward. Also implements some utility codes and some extra solvers, including safeguarded Newton methods. Many methods previously separate are now included here.
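A minimal sketch of the single-statement comparison of several minimizers that the description refers to:

    library(optimx)
    rosenbrock <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    optimx(par = c(-1.2, 1), fn = rosenbrock,
           method = c("Nelder-Mead", "BFGS", "L-BFGS-B"))
    # returns one row per method with the solution, objective value and evaluation counts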
2391 Optimization and Mathematical Programming optmatch Functions for Optimal Matching Distance based bipartite matching using the RELAX-IV minimum cost flow solver, oriented to matching of treatment and control groups in observational studies. Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination.
2392 Optimization and Mathematical Programming osqp Quadratic Programming Solver using the ‘OSQP’ Library Provides bindings to the ‘OSQP’ solver. The ‘OSQP’ solver is a numerical optimization package for solving convex quadratic programs, written in ‘C’ and based on the alternating direction method of multipliers, ‘ADMM’. B. Stellato, G. Banjac, P. Goulart, A. Bemporad, S. Boyd (2018) <arXiv:1711.08013>.
2393 Optimization and Mathematical Programming parma Portfolio Allocation and Risk Management Applications Provision of a set of models and methods for use in the allocation and management of capital in financial portfolios.
2394 Optimization and Mathematical Programming pso Particle Swarm Optimization The package provides an implementation of PSO consistent with the standard PSO 2007/2011 by Maurice Clerc et al. Additionally a number of ancillary routines are provided for easy testing and graphics.
2395 Optimization and Mathematical Programming psoptim Particle Swarm Optimization Particle swarm optimization - a basic variant.
2396 Optimization and Mathematical Programming qap Heuristics for the Quadratic Assignment Problem (QAP) Implements heuristics for the Quadratic Assignment Problem (QAP). Currently only a simulated annealing heuristic is available.
2397 Optimization and Mathematical Programming quadprog (core) Functions to Solve Quadratic Programming Problems This package contains routines and documentation for solving quadratic programming problems.
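A minimal sketch of solve.QP(), which minimizes (1/2) x'Dx - d'x subject to A'x >= b, with constraints given as columns of Amat; the numbers are illustrative:

    library(quadprog)
    Dmat <- diag(2)                    # identity quadratic term
    dvec <- c(2, 1)
    Amat <- matrix(c(1, 1), nrow = 2)  # one constraint: x1 + x2 >= 4
    bvec <- 4
    solve.QP(Dmat, dvec, Amat, bvec)$solution  # c(2.5, 1.5)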
2398 Optimization and Mathematical Programming quadprogXT Quadratic Programming with Absolute Value Constraints Extends the quadprog package to solve quadratic programs with absolute value constraints and absolute values in the objective function.
2399 Optimization and Mathematical Programming quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
2400 Optimization and Mathematical Programming rBayesianOptimization Bayesian Optimization of Hyperparameters A Pure R implementation of Bayesian Global Optimization with Gaussian Processes.
2401 Optimization and Mathematical Programming rcdd Computational Geometry R interface to (some of) cddlib (<http://www.ifor.math.ethz.ch/~fukuda/cdd_home/cdd.html>). Converts back and forth between two representations of a convex polytope: as solution of a set of linear equalities and inequalities and as convex hull of set of points and rays. Also does linear programming and redundant generator elimination (for example, convex hull in n dimensions). All functions can use exact infinite-precision rational arithmetic.
2402 Optimization and Mathematical Programming RCEIM R Cross Entropy Inspired Method for Optimization An implementation of a stochastic heuristic method for performing multidimensional function optimization. The method is inspired by the Cross-Entropy Method. It does not rely on derivatives, nor does it impose particularly strong requirements on the function to be optimized. Additionally, it takes advantage of multi-core processing to enable optimization of time-consuming functions.
2403 Optimization and Mathematical Programming Rcgmin Conjugate Gradient Minimization of Nonlinear Functions Conjugate gradient minimization of nonlinear functions with box constraints incorporating the Dai/Yuan update. This implementation should be used in place of the “CG” algorithm of the optim() function.
2404 Optimization and Mathematical Programming rCMA R-to-Java Interface for ‘CMA-ES’ Tool for providing access to the Java version ‘CMAEvolutionStrategy’ of Nikolaus Hansen. ‘CMA-ES’ is the Covariance Matrix Adaptation Evolution Strategy, see https://www.lri.fr/~hansen/cmaes_inmatlab.html#java.
2405 Optimization and Mathematical Programming Rcplex R Interface to CPLEX R interface to CPLEX solvers for linear, quadratic, and (linear and quadratic) mixed integer programs. Support for quadratically constrained programming is available. See the file “INSTALL” for details on how to install the Rcplex package in Linux/Unix-like and Windows systems. Support for sparse matrices is provided by an S3-style class “simple_triplet_matrix” from package slam and by objects from the Matrix package class hierarchy.
2406 Optimization and Mathematical Programming RcppDE Global Optimization by Differential Evolution in C++ An efficient C++ based implementation of the ‘DEoptim’ function which performs global optimization by differential evolution. Its creation was motivated by trying to see if the old approximation “easier, shorter, faster: pick any two” could in fact be extended to achieving all three goals while moving the code from plain old C to modern C++. The initial version did in fact do so, but a good part of the gain was due to an implicit code review which removed a few inefficiencies that have since also been eliminated in ‘DEoptim’.
2407 Optimization and Mathematical Programming RcppNumerical ‘Rcpp’ Integration for Numerical Computing Libraries A collection of open source libraries for numerical computing (numerical integration, optimization, etc.) and their integration with ‘Rcpp’.
2408 Optimization and Mathematical Programming Rcsdp R Interface to the CSDP Semidefinite Programming Library R interface to the CSDP semidefinite programming library. Installs version 6.1.1 of CSDP from the COIN-OR website if required. An existing installation of CSDP may be used by passing the proper configure arguments to the installation command. See the INSTALL file for further details.
2409 Optimization and Mathematical Programming Rdsdp R Interface to DSDP Semidefinite Programming Library R interface to DSDP semidefinite programming library. The DSDP software is a free open source implementation of an interior-point method for semidefinite programming. It provides primal and dual solutions, exploits low-rank structure and sparsity in the data, and has relatively low memory requirements for an interior-point method.
2410 Optimization and Mathematical Programming rgenoud R Version of GENetic Optimization Using Derivatives A genetic algorithm plus derivative optimizer.
2411 Optimization and Mathematical Programming Rglpk R/GNU Linear Programming Kit Interface R interface to the GNU Linear Programming Kit. ‘GLPK’ is open source software for solving large-scale linear programming (LP), mixed integer linear programming (‘MILP’) and other related problems.
2412 Optimization and Mathematical Programming rLindo R Interface to LINDO API An interface to LINDO API. Supports Linear, Integer, Quadratic, Conic, General Nonlinear, Global, and Stochastic Programming models. To download the trial version of LINDO API, please visit www.lindo.com/rlindo.
2413 Optimization and Mathematical Programming Rmalschains Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R An implementation of an algorithm family for continuous optimization called memetic algorithms with local search chains (MA-LS-Chains). Memetic algorithms are hybridizations of genetic algorithms with local search methods. They are especially suited for continuous optimization.
2414 Optimization and Mathematical Programming Rmosek The R to MOSEK Optimization Interface This is a generic meta-package designed to make the optimization facilities of MOSEK available from the R-language. The interface supports large-scale optimization of many kinds: Mixed-integer and continuous linear, second-order cone, exponential cone and power cone optimization, as well as continuous semidefinite optimization. Rmosek and the R-language are open-source projects. MOSEK is a proprietary product, but unrestricted trial and academic licenses are available.
2415 Optimization and Mathematical Programming rneos XML-RPC Interface to NEOS Within this package the XML-RPC API to NEOS <https://neos-server.org/neos/> is implemented. This enables the user to pass optimization problems to NEOS and retrieve results within R.
2416 Optimization and Mathematical Programming ROI R Optimization Infrastructure The R Optimization Infrastructure (‘ROI’) is a sophisticated framework for handling optimization problems in R. More information can be found on the ‘ROI’ homepage <http://roi.r-forge.r-project.org/>.
2417 Optimization and Mathematical Programming ROI.plugin.clp ‘Clp (Coin-or linear programming)’ Plugin for the ‘R’ Optimization Interface Enhances the R Optimization Infrastructure (ROI) package by registering the COIN-OR Clp open-source solver from the COIN-OR suite <https://projects.coin-or.org/>. It allows solving linear programs with continuous objective variables while keeping a sparse constraint definition.
2418 Optimization and Mathematical Programming ROI.plugin.neos ‘NEOS’ Plug-in for the ‘R’ Optimization Interface Enhances the ‘R’ Optimization Infrastructure (‘ROI’) package with a connection to the ‘neos’ server. ‘ROI’ optimization problems can be sent directly to the ‘neos’ server and solutions obtained in the typical ‘ROI’ style.
2419 Optimization and Mathematical Programming ROI.plugin.qpoases ‘qpOASES’ Plugin for the ‘R’ Optimization Infrastructure Enhances the ‘R’ Optimization Infrastructure (‘ROI’) package with the quadratic solver ‘qpOASES’. More information about ‘qpOASES’ can be found at <https://projects.coin-or.org/qpOASES/>.
2420 Optimization and Mathematical Programming Rsolnp General Non-Linear Optimization General Non-linear Optimization Using Augmented Lagrange Multiplier Method.
2421 Optimization and Mathematical Programming Rsymphony SYMPHONY in R An R interface to the SYMPHONY solver for mixed-integer linear programs.
2422 Optimization and Mathematical Programming Rtnmin Truncated Newton Function Minimization with Bounds Constraints Truncated Newton function minimization with bounds constraints based on the ‘Matlab’/‘Octave’ codes of Stephen Nash.
2423 Optimization and Mathematical Programming Rvmmin Variable Metric Nonlinear Function Minimization Variable metric nonlinear function minimization with bounds constraints.
2424 Optimization and Mathematical Programming SACOBRA Self-Adjusting COBRA Performs constrained optimization for expensive black-box problems.
2425 Optimization and Mathematical Programming scs Splitting Conic Solver Solves convex cone programs via operator splitting. Can solve: linear programs (LPs), second-order cone programs (SOCPs), semidefinite programs (SDPs), exponential cone programs (ECPs), and power cone programs (PCPs), or problems with any combination of those cones. SCS uses AMD (a set of routines for permuting sparse matrices prior to factorization) and LDL (a sparse LDL’ factorization and solve package) from ‘SuiteSparse’ (<http://www.suitesparse.com>).
2426 Optimization and Mathematical Programming sdpt3r Semi-Definite Quadratic Linear Programming Solver Solves the general Semi-Definite Linear Programming formulation using an R implementation of SDPT3 (K.C. Toh, M.J. Todd, and R.H. Tutuncu (1999) <doi:10.1080/10556789908805762>). This includes problems such as the nearest correlation matrix problem (Higham (2002) <doi:10.1093/imanum/22.3.329>), D-optimal experimental design (Smith (1918) <doi:10.2307/2331929>), Distance Weighted Discrimination (Marron and Todd (2012) <doi:10.1198/016214507000001120>), as well as graph theory problems including the maximum cut problem. Technical details surrounding SDPT3 can be found in R.H Tutuncu, K.C. Toh, and M.J. Todd (2003) <doi:10.1007/s10107-002-0347-5>.
2427 Optimization and Mathematical Programming smoof Single and Multi-Objective Optimization Test Functions Provides generators for a high number of both single- and multi- objective test functions which are frequently used for the benchmarking of (numerical) optimization algorithms. Moreover, it offers a set of convenient functions to generate, plot and work with objective functions.
2428 Optimization and Mathematical Programming sna Tools for Social Network Analysis A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.
2429 Optimization and Mathematical Programming soma General-Purpose Optimisation With the Self-Organising Migrating Algorithm This package provides an R implementation of the Self-Organising Migrating Algorithm, a general-purpose, stochastic optimisation algorithm. The approach is similar to that of genetic algorithms, although it is based on the idea of a series of “migrations” by a fixed set of individuals, rather than the development of successive generations. It can be applied to any cost-minimisation problem with a bounded parameter space, and is robust to local minima.
2430 Optimization and Mathematical Programming subplex Unconstrained Optimization using the Subplex Algorithm The subplex algorithm for unconstrained optimization, developed by Tom Rowan <http://www.netlib.org/opt/subplex.tgz>.
2431 Optimization and Mathematical Programming tabuSearch Tabu Search Algorithm for Binary Configurations Tabu search algorithm for binary configurations. A basic version of the algorithm as described by Fouskakis and Draper (2007) <doi:10.1111/j.1751-5823.2002.tb00174.x>.
2432 Optimization and Mathematical Programming trust Trust Region Optimization Does local optimization using two derivatives and trust regions. Guaranteed to converge to local minimum of objective function.
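trust() takes a single objective function that returns the value, gradient and Hessian together in one list. A minimal sketch on the Rosenbrock function (a standard test problem, not from the package docs):

```r
library(trust)

rosen <- function(x) {
  f <- 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
  g <- c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
          200 * (x[2] - x[1]^2))
  H <- rbind(c(1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1]),
             c(-400 * x[1],                     200))
  list(value = f, gradient = g, hessian = H)
}

# rinit/rmax are the initial and maximal trust-region radii
trust(rosen, parinit = c(-1.2, 1), rinit = 1, rmax = 5)
```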
2433 Optimization and Mathematical Programming trustOptim Trust Region Optimization for Nonlinear Functions with Sparse Hessians Trust region algorithm for nonlinear optimization. Efficient when the Hessian of the objective function is sparse (i.e., relatively few nonzero cross-partial derivatives). See Braun, M. (2014) <doi:10.18637/jss.v060.i04>.
2434 Optimization and Mathematical Programming TSP Traveling Salesperson Problem (TSP) Basic infrastructure and some algorithms for the traveling salesperson problem (also traveling salesman problem; TSP). The package provides some simple algorithms and an interface to the Concorde TSP solver and its implementation of the Chained-Lin-Kernighan heuristic. The code for Concorde itself is not included in the package and has to be obtained separately.
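A short sketch of the basic workflow on a random planar instance; the cities are made up, and nearest insertion is one of several built-in heuristics:

```r
library(TSP)

set.seed(1)
pts <- matrix(runif(40), ncol = 2)   # 20 random cities in the unit square
tsp <- TSP(dist(pts))                # TSP object from a distance matrix

tour <- solve_TSP(tsp, method = "nearest_insertion")
tour_length(tour)
```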
2435 Optimization and Mathematical Programming ucminf (core) General-Purpose Unconstrained Non-Linear Optimization An algorithm for general-purpose unconstrained non-linear optimization. The algorithm is of quasi-Newton type with BFGS updating of the inverse Hessian and soft line search with a trust region type monitoring of the input to the line search algorithm. The interface of ‘ucminf’ is designed for easy interchange with ‘optim’.
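Because the interface mirrors optim(), switching between the two is a one-line change; Rosenbrock again, purely for illustration:

```r
library(ucminf)

fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

ucminf(par = c(-1.2, 1), fn = fr)                    # BFGS with trust-type monitoring
optim (par = c(-1.2, 1), fn = fr, method = "BFGS")   # same call shape in base R
```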
2436 Analysis of Pharmacokinetic Data cpk Clinical Pharmacokinetics The package cpk provides simplified clinical pharmacokinetic functions for dose regimen design and modification at the point-of-care. Currently, the following functions are available: (1) ttc.fn for target therapeutic concentration, (2) dr.fn for dose rate, (3) di.fn for dosing interval, (4) dm.fn for maintenance dose, (5) bc.ttc.fn for back calculation, (6) ar.fn for accumulation ratio, (7) dpo.fn for orally administered dose, (8) cmax.fn for peak concentration, (9) css.fn for steady-state concentration, (10) cmin.fn for trough, (11) ct.fn for concentration-time predictions, (12) dlcmax.fn for calculating loading dose based on drug’s maximum concentration, (13) dlar.fn for calculating loading dose based on drug’s accumulation ratio, and (14) R0.fn for calculating drug infusion rate. Reference: Linares O, Linares A. Computational opioid prescribing: A novel application of clinical pharmacokinetics. J Pain Palliat Care Pharmacother 2011;25:125-135.
2437 Analysis of Pharmacokinetic Data dfpk Bayesian Dose-Finding Designs using Pharmacokinetics (PK) for Phase I Clinical Trials Statistical methods involving PK measures are provided for the dose allocation process during Phase I clinical trials. These methods, proposed by Ursino et al. (2017) <doi:10.1002/bimj.201600084>, incorporate pharmacokinetics (PK) into dose-finding designs in different ways, including covariate models, dependent-variable models and hierarchical models. This package provides functions to generate data from several scenarios and functions to run simulations whose objective is to determine the maximum tolerated dose (MTD).
2438 Analysis of Pharmacokinetic Data mrgsolve Simulate from ODE-Based Models Fast simulation from ordinary differential equation (ODE) based models typically employed in quantitative pharmacology and systems biology.
2439 Analysis of Pharmacokinetic Data ncar Noncompartmental Analysis for Pharmacokinetic Report Conduct a noncompartmental analysis as closely as possible to the most widely used commercial software for pharmacokinetic analysis, i.e. ‘Phoenix(R) WinNonlin(R)’ <https://www.certara.com/software/pkpd-modeling-and-simulation/phoenix-winnonlin/>. Some features are 1) CDISC SDTM terms 2) Automatic slope selection with the same criterion as WinNonlin(R) 3) Support for both the ‘linear-up linear-down’ and ‘linear-up log-down’ methods 4) Interval (partial) AUCs with ‘linear’ or ‘log’ interpolation method 5) Produce pdf, rtf, text report files. * Reference: Gabrielsson J, Weiner D. Pharmacokinetic and Pharmacodynamic Data Analysis - Concepts and Applications. 5th ed. 2016. (ISBN:9198299107).
2440 Analysis of Pharmacokinetic Data nmw Understanding Nonlinear Mixed Effects Modeling for Population Pharmacokinetics This shows how NONMEM(R) <http://www.iconplc.com/innovation/nonmem/> software works. NONMEM’s classical estimation methods like ‘First Order(FO) approximation’, ‘First Order Conditional Estimation(FOCE)’, and ‘Laplacian approximation’ are explained.
2441 Analysis of Pharmacokinetic Data NonCompart Noncompartmental Analysis for Pharmacokinetic Data Conduct a noncompartmental analysis as closely as possible to the most widely used commercial software for pharmacokinetic analysis, i.e. ‘Phoenix(R) WinNonlin(R)’ <https://www.certara.com/software/pkpd-modeling-and-simulation/phoenix-winnonlin/>. Some features are 1) Use of CDISC SDTM terms 2) Automatic slope selection with the same criterion as WinNonlin(R) 3) Support for both the ‘linear-up linear-down’ and ‘linear-up log-down’ methods 4) Interval (partial) AUCs with ‘linear’ or ‘log’ interpolation method * Reference: Gabrielsson J, Weiner D. Pharmacokinetic and Pharmacodynamic Data Analysis - Concepts and Applications. 5th ed. 2016. (ISBN:9198299107).
2442 Analysis of Pharmacokinetic Data PK Basic Non-Compartmental Pharmacokinetics Estimation of pharmacokinetic parameters using non-compartmental theory.
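To make the shared idea behind these NCA packages concrete without assuming any of their exact signatures, here is a plain base-R sketch of the linear trapezoidal AUC they all compute, using R's built-in Theoph data:

```r
# Linear trapezoidal AUC from first to last observation (AUClast)
# for one subject of the built-in theophylline data set.
d  <- subset(Theoph, Subject == 1)
tt <- d$Time
cp <- d$conc
auc <- sum(diff(tt) * (head(cp, -1) + tail(cp, -1)) / 2)
auc
```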
2443 Analysis of Pharmacokinetic Data PKgraph Model diagnostics for population pharmacokinetic models PKgraph provides a graphical user interface for population pharmacokinetic model diagnosis. It also provides an integrated and comprehensive platform for the analysis of pharmacokinetic data including exploratory data analysis, goodness of model fit, model validation and model comparison. Results from a variety of modeling fitting software, including NONMEM, Monolix, SAS and R, can be used. PKgraph is programmed in R, and uses the R packages lattice, ggplot2 for static graphics, and rggobi for interactive graphics.
2444 Analysis of Pharmacokinetic Data PKPDmodels Pharmacokinetic/pharmacodynamic models Provides functions to evaluate common pharmacokinetic/pharmacodynamic models and their gradients.
2445 Analysis of Pharmacokinetic Data pkr Pharmacokinetics in R Conduct a noncompartmental analysis as closely as possible to the most widely used commercial software for pharmacokinetic analysis, i.e. ‘Phoenix(R) WinNonlin(R)’ <https://www.certara.com/software/pkpd-modeling-and-simulation/phoenix-winnonlin/>. Some features are 1) CDISC SDTM terms 2) Automatic slope selection with the same criterion as WinNonlin(R) 3) Support for both the ‘linear-up linear-down’ and ‘linear-up log-down’ methods 4) Interval (partial) AUCs with ‘linear’ or ‘log’ interpolation method * Reference: Gabrielsson J, Weiner D. Pharmacokinetic and Pharmacodynamic Data Analysis - Concepts and Applications. 5th ed. 2016. (ISBN:9198299107).
2446 Analysis of Pharmacokinetic Data PKreport A reporting pipeline for checking population pharmacokinetic model assumptions PKreport aims to 1) provide an automatic pipeline for users to visualize data and models, creating a flexible R framework with automatically generated R scripts to save time and cost for later usage; 2) implement an archive-oriented management tool for users to store, retrieve and modify figures; and 3) offer powerful and convenient services to generate high-quality graphs based on two R packages: lattice and ggplot2.
2447 Analysis of Pharmacokinetic Data scaRabee Optimization Toolkit for Pharmacokinetic-Pharmacodynamic Models scaRabee is a port of the Scarabee toolkit originally written as a Matlab-based application. It provides a framework for simulation and optimization of pharmacokinetic-pharmacodynamic models at the individual and population level. It is built on top of the neldermead package, which provides the direct search algorithm proposed by Nelder and Mead for model optimization.
2448 Phylogenetics, Especially Comparative Methods adephylo Exploratory Analyses for the Phylogenetic Comparative Method Multivariate tools to analyze comparative data, i.e. a phylogeny and some traits measured for each taxon.
2449 Phylogenetics, Especially Comparative Methods adhoc Calculate Ad Hoc Distance Thresholds for DNA Barcoding Identification Two functions to calculate intra- and interspecific pairwise distances, evaluate DNA barcoding identification error and calculate an ad hoc distance threshold for each particular reference library of DNA barcodes. Specimen identification at this ad hoc distance threshold (using the best close match method) will produce identifications with an estimated relative error probability that can be fixed by the user (e.g. 5%).
2450 Phylogenetics, Especially Comparative Methods adiv Analysis of Diversity Includes functions, data sets and examples for the calculation of various indices of biodiversity including species, functional and phylogenetic diversity. Part of the indices are expressed in terms of equivalent numbers of species. It also provides ways to partition biodiversity across spatial or temporal scales (alpha, beta, gamma diversities). In addition to the quantification of biodiversity, ordination approaches are available which rely on diversity indices and allow the detailed identification of species, functional or phylogenetic differences between communities.
2451 Phylogenetics, Especially Comparative Methods ape (core) Analyses of Phylogenetics and Evolution Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel’s test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
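Two of ape's most common entry points, reading a Newick string and neighbour-joining from a distance matrix, sketched on toy data:

```r
library(ape)

# Parse a Newick string into a "phylo" object and plot it
tr <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:0.5):1);")
plot(tr); axisPhylo()

# Neighbour-joining estimation from an arbitrary distance matrix
set.seed(1)
m <- matrix(rnorm(20), nrow = 4, dimnames = list(LETTERS[1:4], NULL))
plot(nj(dist(m)))
```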
2452 Phylogenetics, Especially Comparative Methods apex Phylogenetic Methods for Multiple Gene Data Toolkit for the analysis of multiple gene data. Apex implements the new S4 classes ‘multidna’, ‘multiphyDat’ and associated methods to handle aligned DNA sequences from multiple genes.
2453 Phylogenetics, Especially Comparative Methods aphid Analysis with Profile Hidden Markov Models Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis. Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene-finding and annotation, phylogenetic analysis and sequence classification. Based on the models and algorithms described in Durbin et al (1998, ISBN: 9780521629713).
2454 Phylogenetics, Especially Comparative Methods apTreeshape Analyses of Phylogenetic Treeshape Simulation and analysis of phylogenetic tree topologies using statistical indices. It is a companion library of the ‘ape’ package. It provides additional functions for reading, plotting, manipulating phylogenetic trees. It also offers convenient web-access to public databases, and enables testing null models of macroevolution using corrected test statistics. Trees of class “phylo” (from ‘ape’ package) can be converted easily. Implements methods described in Bortolussi et al. (2005) <doi:10.1093/bioinformatics/bti798> and Maliet et al. (2017) <doi:10.1101/224295>.
2455 Phylogenetics, Especially Comparative Methods BAMMtools Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees Provides functions for analyzing and visualizing complex macroevolutionary dynamics on phylogenetic trees. It is a companion package to the command line program BAMM (Bayesian Analysis of Macroevolutionary Mixtures) and is entirely oriented towards the analysis, interpretation, and visualization of evolutionary rates. Functionality includes visualization of rate shifts on phylogenies, estimating evolutionary rates through time, comparing posterior distributions of evolutionary rates across clades, comparing diversification models using Bayes factors, and more.
2456 Phylogenetics, Especially Comparative Methods bayou Bayesian Fitting of Ornstein-Uhlenbeck Models to Phylogenies Tools for fitting and simulating multi-optima Ornstein-Uhlenbeck models to phylogenetic comparative data using Bayesian reversible-jump methods.
2457 Phylogenetics, Especially Comparative Methods betapart Partitioning Beta Diversity into Turnover and Nestedness Components Functions to compute pair-wise dissimilarities (distance matrices) and multiple-site dissimilarities, separating the turnover and nestedness-resultant components of taxonomic (incidence and abundance based), functional and phylogenetic beta diversity.
2458 Phylogenetics, Especially Comparative Methods BoSSA A Bunch of Structure and Sequence Analysis Reads and plots phylogenetic placements.
2459 Phylogenetics, Especially Comparative Methods brms Bayesian Regression Models using ‘Stan’ Fit Bayesian generalized (non-)linear multivariate multilevel models using ‘Stan’ for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit among others linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation. References: Burkner (2017) <doi:10.18637/jss.v080.i01>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.
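A sketch of the formula interface using the epilepsy data shipped with brms (fitting compiles a Stan program, so the first run is slow):

```r
library(brms)

# Poisson multilevel model: seizure counts with a patient-level intercept
fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
           data = epilepsy, family = poisson())
summary(fit)
pp_check(fit)   # posterior predictive check
```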
2460 Phylogenetics, Especially Comparative Methods brranching Fetch ‘Phylogenies’ from Many Sources Includes methods for fetching ‘phylogenies’ from a variety of sources, including the ‘Phylomatic’ web service (<http://phylodiversity.net/phylomatic>), and ‘Phylocom’ (<https://github.com/phylocom/phylocom/>).
2461 Phylogenetics, Especially Comparative Methods caper Comparative Analyses of Phylogenetics and Evolution in R Functions for performing phylogenetic comparative analyses.
2462 Phylogenetics, Especially Comparative Methods convevol Analysis of Convergent Evolution Quantifies and assesses the significance of convergent evolution using two different methods (and 5 different measures) as described in Stayton (2015) <doi:10.1111/evo.12729>. Also displays results in a phylomorphospace framework.
2463 Phylogenetics, Especially Comparative Methods corHMM Analysis of Binary Character Evolution Fits a hidden rates model that allows different transition rate classes on different portions of a phylogeny by treating rate classes as hidden states in a Markov process, and provides various other functions for evaluating models of binary character evolution.
2464 Phylogenetics, Especially Comparative Methods DAMOCLES Dynamic Assembly Model of Colonization, Local Extinction and Speciation Simulates and computes (maximum) likelihood of a dynamical model of community assembly that takes into account phylogenetic history.
2465 Phylogenetics, Especially Comparative Methods DDD Diversity-Dependent Diversification Implements maximum likelihood and bootstrap methods based on the diversity-dependent birth-death process to test whether speciation or extinction are diversity-dependent, under various models including various types of key innovations. See Etienne et al. 2012, Proc. Roy. Soc. B 279: 1300-1309, <doi:10.1098/rspb.2011.1439>, Etienne & Haegeman 2012, Am. Nat. 180: E75-E89, <doi:10.1086/667574> and Etienne et al. 2016. Meth. Ecol. Evol. 7: 1092-1099, <doi:10.1111/2041-210X.12565>. Also contains functions to simulate the diversity-dependent process.
2466 Phylogenetics, Especially Comparative Methods dendextend Extending ‘dendrogram’ Functionality in R Offers a set of functions for extending ‘dendrogram’ objects in R, letting you visualize and compare trees of ‘hierarchical clusterings’. You can (1) Adjust a tree’s graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different ‘dendrograms’ to one another.
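A minimal sketch on the built-in USArrests data: coerce an hclust result to a dendrogram, colour the k = 3 cluster branches, and plot:

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests)))
dend <- color_branches(dend, k = 3)  # colour branches by a 3-cluster cut
plot(dend)
```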
2467 Phylogenetics, Especially Comparative Methods dispRity Measuring Disparity A modular package for measuring disparity from multidimensional matrices. Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several basic statistical tests for disparity analysis.
2468 Phylogenetics, Especially Comparative Methods distory Distance Between Phylogenetic Histories Geodesic distance between phylogenetic trees and associated functions.
2469 Phylogenetics, Especially Comparative Methods diversitree Comparative ‘Phylogenetic’ Analyses of Diversification Contains a number of comparative ‘phylogenetic’ methods, mostly focusing on analysing diversification and character evolution. Contains implementations of ‘BiSSE’ (Binary State ‘Speciation’ and Extinction) and its unresolved tree extensions, ‘MuSSE’ (Multiple State ‘Speciation’ and Extinction), ‘QuaSSE’, ‘GeoSSE’, and ‘BiSSE-ness’. Other included methods are Markov models of discrete and continuous trait evolution and constant-rate ‘speciation’ and extinction.
2470 Phylogenetics, Especially Comparative Methods ecospat Spatial Ecology Miscellaneous Methods Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre, core and post modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.
2471 Phylogenetics, Especially Comparative Methods entropart Entropy Partitioning to Measure Diversity Measurement and partitioning of diversity, based on Tsallis entropy, following Marcon and Herault (2015) <doi:10.18637/jss.v067.i08>. ‘entropart’ provides functions to calculate alpha, beta and gamma diversity of communities, including phylogenetic and functional diversity. Estimation-bias corrections are available.
2472 Phylogenetics, Especially Comparative Methods enveomics.R Various Utilities for Microbial Genomics and Metagenomics A collection of functions for microbial ecology and other applications of genomics and metagenomics. Companion package for the Enveomics Collection (Rodriguez-R, L.M. and Konstantinidis, K.T., 2016 <doi:10.7287/peerj.preprints.1900v1>).
2473 Phylogenetics, Especially Comparative Methods evobiR Comparative and Population Genetic Analyses Comparative analysis of continuous traits influencing discrete states, and utility tools to facilitate comparative analyses. Implementations of ABBA/BABA type statistics to test for introgression in genomic data. Wright-Fisher, phylogenetic tree, and statistical distribution Shiny interactive simulations for use in teaching.
2474 Phylogenetics, Especially Comparative Methods geiger Analysis of Evolutionary Diversification Methods for fitting macroevolutionary models to phylogenetic trees.
2475 Phylogenetics, Especially Comparative Methods geomorph Geometric Morphometric Analyses of 2D/3D Landmark Data Read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.
2476 Phylogenetics, Especially Comparative Methods ggmuller Create Muller Plots of Evolutionary Dynamics Create plots that combine a phylogeny and frequency dynamics. Phylogenetic input can be a generic adjacency matrix or a tree of class “phylo”. Inspired by similar plots in publications of the labs of RE Lenski and JE Barrick. Named for HJ Muller (who popularised such plots) and H Wickham (whose code this package exploits).
2477 Phylogenetics, Especially Comparative Methods ggplot2 Create Elegant Data Visualisations Using the Grammar of Graphics A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
2478 Phylogenetics, Especially Comparative Methods GUniFrac Generalized UniFrac Distances Generalized UniFrac distances for comparing microbial communities. Permutational multivariate analysis of variance using multiple distance matrices.
2479 Phylogenetics, Especially Comparative Methods HMPTrees Statistical Object Oriented Data Analysis of RDP-Based Taxonomic Trees from Human Microbiome Data Tools to model, compare, and visualize populations of taxonomic tree objects.
2480 Phylogenetics, Especially Comparative Methods HyPhy Macroevolutionary phylogenetic analysis of species trees and gene trees A Bay Area high-level phylogenetic analysis package mostly using the birth-death process. Analysis of species tree branching times and simulation of species trees under a number of different time-variable birth-death processes. Analysis of gene tree/species tree reconciliations and simulation of gene trees in species trees.
2481 Phylogenetics, Especially Comparative Methods idendr0 Interactive Dendrograms Interactive dendrogram that enables the user to select and color clusters, to zoom and pan the dendrogram, and to visualize the clustered data not only in a built-in heat map, but also in ‘GGobi’ interactive plots and user-supplied plots. This is a backport of Qt-based ‘idendro’ (<https://github.com/tsieger/idendro>) to base R graphics and Tcl/Tk GUI.
2482 Phylogenetics, Especially Comparative Methods ips Interfaces to Phylogenetic Software in R This package provides functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.
2483 Phylogenetics, Especially Comparative Methods iteRates Parametric rate comparison Iterates through a phylogenetic tree to identify regions of rate variation using the parametric rate comparison test.
2484 Phylogenetics, Especially Comparative Methods jaatha Simulation-Based Maximum Likelihood Parameter Estimation An estimation method that can use computer simulations to approximate maximum-likelihood estimates even when the likelihood function cannot be evaluated directly. It can be applied whenever it is feasible to conduct many simulations, but works best when the data are approximately Poisson distributed. It was originally designed for demographic inference in evolutionary biology. It has optional support for conducting coalescent simulation using the ‘coala’ package.
2485 Phylogenetics, Especially Comparative Methods kdetrees Nonparametric method for identifying discordant phylogenetic trees A non-parametric method for identifying potential outlying observations in a collection of phylogenetic trees based on the methods of Owen and Provan (2011). Such discordant trees may indicate problems with sequence annotation or tree reconstruction, or they may represent interesting biological phenomena, such as horizontal gene transfers.
2486 Phylogenetics, Especially Comparative Methods markophylo Markov Chain Models for Phylogenetic Trees Allows for fitting of maximum likelihood models using Markov chains on phylogenetic trees for analysis of discrete character data. Examples of such discrete character data include restriction sites, gene family presence/absence, intron presence/absence, and gene family size data. Hypothesis-driven user-specified substitution rate matrices can be estimated. Allows for biologically realistic models combining constrained substitution rate matrices, site rate variation, site partitioning, branch-specific rates, allowing for non-stationary prior root probabilities, correcting for sampling bias, etc. See Dang and Golding (2016) <doi:10.1093/bioinformatics/btv541> for more details.
2487 Phylogenetics, Especially Comparative Methods MCMCglmm MCMC Generalised Linear Mixed Models MCMC Generalised Linear Mixed Models.
2488 Phylogenetics, Especially Comparative Methods metacoder Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data A set of tools for parsing, manipulating, and graphing data classified by a hierarchy (e.g. a taxonomy).
2489 Phylogenetics, Especially Comparative Methods metafor Meta-Analysis Package for R A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L’Abbe, Baujat, GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto’s method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted.
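A sketch of the canonical workflow using the BCG vaccine data bundled with metafor: escalc() computes effect sizes, rma() fits a random-effects model:

```r
library(metafor)

# Log relative risks and sampling variances from 2x2 tables
dat <- escalc(measure = "RR", ai = tpos, bi = tneg,
              ci = cpos, di = cneg, data = dat.bcg)
res <- rma(yi, vi, data = dat)  # random-effects model
forest(res)                     # forest plot of the meta-analysis
```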
2490 Phylogenetics, Especially Comparative Methods MPSEM Modeling Phylogenetic Signals using Eigenvector Maps Computational tools to represent phylogenetic signals using adapted eigenvector maps.
2491 Phylogenetics, Especially Comparative Methods mvMORPH Multivariate Comparative Tools for Fitting Evolutionary Models to Morphometric Data Fits multivariate (Brownian Motion, Early Burst, ACDC, Ornstein-Uhlenbeck and Shifts) models of continuous traits evolution on trees and time series. ‘mvMORPH’ also proposes high-dimensional multivariate comparative tools (linear models using Generalized Least Squares) based on penalized likelihood. See Clavel et al. (2015) <doi:10.1111/2041-210X.12420> and Clavel et al. (2018) <doi:10.1093/sysbio/syy045>.
2492 Phylogenetics, Especially Comparative Methods nodiv Compares the Distribution of Sister Clades Through a Phylogeny An implementation of the nodiv algorithm, see Borregaard, M.K., Rahbek, C., Fjeldsaa, J., Parra, J.L., Whittaker, R.J. & Graham, C.H. 2014. Node-based analysis of species distributions. Methods in Ecology and Evolution 5(11): 1225-1235. <doi:10.1111/2041-210X.12283>. Package for phylogenetic analysis of species distributions. The main function goes through each node in the phylogeny, compares the distributions of the two descendant nodes, and compares the result to a null model. This highlights nodes where major distributional divergence has occurred. The distributional divergence for these nodes is mapped using the SOS statistic.
2493 Phylogenetics, Especially Comparative Methods ouch Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.
2494 Phylogenetics, Especially Comparative Methods outbreaker Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data Bayesian reconstruction of disease outbreaks using epidemiological and genetic information.
2495 Phylogenetics, Especially Comparative Methods OUwie Analysis of Evolutionary Rates in an OU Framework Calculates and compares rate differences of continuous character evolution under Brownian motion and a new set of Ornstein-Uhlenbeck-based Hansen models that allow the strength of selection and stochastic motion to vary across selective regimes.
2496 Phylogenetics, Especially Comparative Methods paleotree Paleontological and Phylogenetic Analyses of Evolution Provides tools for transforming, a posteriori time-scaling, and modifying phylogenies containing extinct (i.e. fossil) lineages. In particular, most users are interested in the functions timePaleoPhy(), bin_timePaleoPhy(), cal3TimePaleoPhy() and bin_cal3TimePaleoPhy(), which date cladograms of fossil taxa using stratigraphic data. This package also contains a large number of likelihood functions for estimating sampling and diversification rates from different types of data available from the fossil record (e.g. range data, occurrence data, etc). paleotree users can also simulate diversification and sampling in the fossil record using the function simFossilRecord(), which is a detailed simulator for branching birth-death-sampling processes composed of discrete taxonomic units arranged in ancestor-descendant relationships. Users can use simFossilRecord() to simulate diversification in incompletely sampled fossil records, under various models of morphological differentiation (i.e. the various patterns by which morphotaxa originate from one another), and with time-dependent, longevity-dependent and/or diversity-dependent rates of diversification, extinction and sampling. Additional functions allow users to translate simulated ancestor-descendant data from simFossilRecord() into standard time-scaled phylogenies or unscaled cladograms that reflect the relationships among taxon units.
2497 Phylogenetics, Especially Comparative Methods paleoTS Analyze Paleontological Time-Series Facilitates analysis of paleontological sequences of trait values. Functions are provided to fit, using maximum likelihood, simple evolutionary models (including unbiased random walks, directional evolution, stasis, Ornstein-Uhlenbeck, covariate-tracking) and complex models (punctuation, mode shifts).
2498 Phylogenetics, Especially Comparative Methods pastis Phylogenetic Assembly with Soft Taxonomic Inferences A pre-processor for mrBayes that assimilates sequences, taxonomic information and tree constraints as per xxx. The main functions of interest for most users will be pastis_simple, pastis_main and conch. The main analysis is conducted with pastis_simple or pastis_main followed by a manual execution of mrBayes (>3.2). The placement of taxa not contained in the tree constraint can be investigated using conch.
2499 Phylogenetics, Especially Comparative Methods PBD Protracted Birth-Death Model of Diversification Conducts maximum likelihood analysis and simulation of the protracted birth-death model of diversification. See Etienne, R.S. & J. Rosindell 2012 <doi:10.1093/sysbio/syr091>; Lambert, A., H. Morlon & R.S. Etienne 2014, <doi:10.1007/s00285-014-0767-x>; Etienne, R.S., H. Morlon & A. Lambert 2014, <doi:10.1111/evo.12433>.
2500 Phylogenetics, Especially Comparative Methods PCPS Principal Coordinates of Phylogenetic Structure Set of functions for analysis of Principal Coordinates of Phylogenetic Structure (PCPS).
2501 Phylogenetics, Especially Comparative Methods pegas Population and Evolutionary Genetics Analysis System Functions for reading, writing, plotting, analysing, and manipulating allelic and haplotypic data, including from VCF files, and for the analysis of population nucleotide sequences and micro-satellites including coalescent analyses, linkage disequilibrium, population structure (Fst, Amova) and equilibrium (HWE), haplotype networks, minimum spanning tree and network, and median-joining networks.
2502 Phylogenetics, Especially Comparative Methods phangorn Phylogenetic Reconstruction and Analysis Contains methods for estimating phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation. Allows comparing trees and performing model selection, and offers visualizations for trees and split networks.
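A sketch of the usual distance-then-likelihood workflow on the Laurasiatherian alignment shipped with phangorn; argument names follow the package vignette:

```r
library(phangorn)

data(Laurasiatherian)
dm   <- dist.ml(Laurasiatherian)   # maximum-likelihood distances
tree <- NJ(dm)                     # neighbour-joining starting tree
fit  <- pml(tree, data = Laurasiatherian)
fit  <- optim.pml(fit, model = "GTR", optGamma = TRUE)
logLik(fit)
```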
2503 Phylogenetics, Especially Comparative Methods phyclust Phylogenetic Clustering (Phyloclustering) Phylogenetic clustering (phyloclustering) is an evolutionary Continuous Time Markov Chain model-based approach to identify population structure from molecular data without assuming linkage equilibrium. The package phyclust (Chen 2011) provides a convenient implementation of phyloclustering for DNA and SNP data, capable of clustering individuals into subpopulations and identifying molecular sequences representative of those subpopulations. It is designed in C for performance, interfaced with R for visualization, and incorporates other popular open source programs including ms (Hudson 2002) <doi:10.1093/bioinformatics/18.2.337>, seq-gen (Rambaut and Grassly 1997) <doi:10.1093/bioinformatics/13.3.235>, Hap-Clustering (Tzeng 2005) <doi:10.1002/gepi.20063> and PAML baseml (Yang 1997, 2007) <doi:10.1093/bioinformatics/13.5.555>, <doi:10.1093/molbev/msm088>, for simulating data, additional analyses, and searching the best tree. See the phyclust website for more information, documentations and examples.
2504 Phylogenetics, Especially Comparative Methods phyext2 An Extension (for Package ‘SigTree’) of Some of the Classes in Package ‘phylobase’ Based on (but not identical to) the no-longer-maintained package ‘phyext’, provides enhancements to ‘phylobase’ classes, specifically for use by package ‘SigTree’; provides classes and methods which help users manipulate branch-annotated trees (as in ‘SigTree’); also provides support for a few other extra features.
2505 Phylogenetics, Especially Comparative Methods phylobase Base Package for Phylogenetic Structures and Comparative Data Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.
2506 Phylogenetics, Especially Comparative Methods phylocanvas Interactive Phylogenetic Trees Using the ‘Phylocanvas’ JavaScript Library Create and customize interactive phylogenetic trees using the ‘phylocanvas’ JavaScript library and the ‘htmlwidgets’ package. These trees can be used directly from the R console, from ‘RStudio’, in Shiny apps, and in R Markdown documents. See <http://phylocanvas.org/> for more information on the ‘phylocanvas’ library.
2507 Phylogenetics, Especially Comparative Methods phyloclim Integrating Phylogenetics and Climatic Niche Modeling Implements some methods in phyloclimatic modeling: estimation of ancestral climatic niches, age-range-correlation, niche equivalency test and background-similarity test.
2508 Phylogenetics, Especially Comparative Methods PHYLOGR Functions for Phylogenetically Based Statistical Analyses Manipulation and analysis of phylogenetically simulated data sets and phylogenetically based analyses using GLS.
2509 Phylogenetics, Especially Comparative Methods phylogram Dendrograms for Evolutionary Analysis Contains functions for developing phylogenetic trees as deeply-nested lists (“dendrogram” objects). Enables bi-directional conversion between dendrogram and “phylo” objects (see Paradis et al (2004) <doi:10.1093/bioinformatics/btg412>), and features several tools for command-line tree manipulation and import/export via Newick parenthetic text.
2510 Phylogenetics, Especially Comparative Methods phyloland Modelling Competitive Exclusion and Limited Dispersal in a Statistical Phylogeographic Framework The phyloland package models a space colonization process mapped onto a phylogeny; it aims to estimate limited dispersal and ecological competitive exclusion in a Bayesian MCMC statistical phylogeographic framework (please refer to the phyloland-package help for details).
2511 Phylogenetics, Especially Comparative Methods phylolm Phylogenetic Linear Regression Provides functions for fitting phylogenetic linear models and phylogenetic generalized linear models. The computation uses an algorithm that is linear in the number of tips in the tree. The package also provides functions for simulating continuous or binary traits along the tree. Other tools include functions to test the adequacy of a population tree.
2512 Phylogenetics, Especially Comparative Methods phylotools Phylogenetic Tools for Eco-Phylogenetics A collection of tools for building a RAxML supermatrix using PHYLIP or aligned FASTA files. These functions will be useful for building large phylogenies using multiple markers.
2513 Phylogenetics, Especially Comparative Methods phyloTop Calculating Topological Properties of Phylogenies Tools for calculating and viewing topological properties of phylogenetic trees.
2514 Phylogenetics, Especially Comparative Methods phyreg The Phylogenetic Regression of Grafen (1989) Provides general linear model facilities (single y-variable, multiple x-variables with arbitrary mixture of continuous and categorical and arbitrary interactions) for cross-species data. The method is, however, based on the nowadays rather uncommon situation in which uncertainty about a phylogeny is well represented by adopting a single polytomous tree. The theory is in A. Grafen (1989, Proc. R. Soc. B 326, 119-157) and aims to cope with both recognised phylogeny (closely related species tend to be similar) and unrecognised phylogeny (a polytomy usually indicates ignorance about the true sequence of binary splits).
2515 Phylogenetics, Especially Comparative Methods phytools Phylogenetic Tools for Comparative Biology (and Other Things) A wide range of functions for phylogenetic analysis. Functionality is concentrated in phylogenetic comparative biology, but also includes a diverse array of methods for visualizing, manipulating, reading or writing, and even inferring phylogenetic trees and data. Included among the functions in phylogenetic comparative biology are various functions for ancestral state reconstruction, model-fitting, simulation of phylogenies and data, and multivariate analysis. There are a broad range of plotting methods for phylogenies and comparative data which include, but are not restricted to, methods for mapping trait evolution on trees, for projecting trees into phenotypic space or a geographic map, and for visualizing correlated speciation between trees. Finally, there are a number of functions for reading, writing, analyzing, inferring, simulating, and manipulating phylogenetic trees and comparative data not covered by other packages. For instance, there are functions for randomly or non-randomly attaching species or clades to a phylogeny, for estimating supertrees or consensus phylogenies from a set of trees, for simulating trees and phylogenetic data under a range of models, and for a wide variety of other manipulations and analyses that phylogenetic biologists might find useful in their research.
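Three phytools staples in one sketch: simulate a pure-birth tree, evolve a Brownian trait on it, and reconstruct ancestral states:

```r
library(phytools)

tree <- pbtree(n = 50)   # simulate a 50-tip pure-birth tree
x    <- fastBM(tree)     # Brownian-motion trait on that tree
fastAnc(tree, x)         # ML ancestral state estimates at internal nodes
```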
2516 Phylogenetics, Especially Comparative Methods picante Integrating Phylogenies and Ecology Functions for phylocom integration, community analyses, null-models, traits and evolution. Implements numerous ecophylogenetic approaches including measures of community phylogenetic and trait diversity, phylogenetic signal, estimation of trait values for unobserved taxa, null models for community and phylogeny randomizations, and utility functions for data input/output and phylogeny plotting. A full description of package functionality and methods are provided by Kembel et al. (2010) <doi:10.1093/bioinformatics/btq166>.
2517 Phylogenetics, Especially Comparative Methods pmc Phylogenetic Monte Carlo Monte Carlo based model choice for applied phylogenetics of continuous traits. Method described in Carl Boettiger, Graham Coop, Peter Ralph (2012) Is your phylogeny informative? Measuring the power of comparative methods, Evolution 66 (7) 2240-51. doi:10.1111/j.1558-5646.2011.01574.x.
2518 Phylogenetics, Especially Comparative Methods ratematrix Bayesian Estimation of the Evolutionary Rate Matrix Estimates the evolutionary rate matrix (R) using Markov chain Monte Carlo (MCMC) as described in Caetano and Harmon (2017) <doi:10.1111/2041-210X.12826>. The package has functions to run MCMC chains, plot results, evaluate convergence, and summarize posterior distributions.
2519 Phylogenetics, Especially Comparative Methods rdryad Access for Dryad Web Services Interface to the Dryad “Solr” API and “OAI-PMH” service, with tools to fetch datasets. Dryad (<http://datadryad.org/>) is a curated host of data underlying scientific publications.
2520 Phylogenetics, Especially Comparative Methods rmetasim An Individual-Based Population Genetic Simulation Environment An interface between R and the metasim simulation engine. The simulation environment is documented in: Strand, A. (2002) <doi:10.1046/j.1471-8286.2002.00208.x> “Metasim 1.0: an individual-based environment for simulating population genetics of complex population dynamics.” Mol. Ecol. Notes. Please see the vignettes CreatingLandscapes and Simulating to get some ideas on how to use the package. See the rmetasim vignette for an overview and for important changes to the code in the most recent version.
2521 Phylogenetics, Especially Comparative Methods rncl An Interface to the Nexus Class Library An interface to the Nexus Class Library which allows parsing of NEXUS, Newick and other phylogenetic tree file formats. It provides elements of the file that can be used to build phylogenetic objects such as ape’s ‘phylo’ or phylobase’s ‘phylo4(d)’. This functionality is demonstrated with ‘read_newick_phylo()’ and ‘read_nexus_phylo()’.
2522 Phylogenetics, Especially Comparative Methods RNeXML Semantically Rich I/O for the ‘NeXML’ Format Provides access to phyloinformatic data in ‘NeXML’ format. The package adds new functionality to R, such as the ability to manipulate ‘NeXML’ objects in more varied and refined ways, along with compatibility with ‘ape’ objects.
2523 Phylogenetics, Especially Comparative Methods rotl Interface to the ‘Open Tree of Life’ API An interface to the ‘Open Tree of Life’ API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to ‘Open Tree identifiers’. The ‘Open Tree of Life’ aims at assembling a comprehensive phylogenetic tree for all named species.
2524 Phylogenetics, Especially Comparative Methods rphast Interface to ‘PHAST’ Software for Comparative Genomics Provides an R interface to the ‘PHAST’ (<http://compgen.cshl.edu/phast/>) software (Phylogenetic Analysis with Space/Time Models). It can be used for many types of analysis in comparative and evolutionary genomics, such as estimating models of evolution from sequence data, scoring alignments for conservation or acceleration, and predicting elements based on conservation or custom phylogenetic hidden Markov models. It can also perform many basic operations on multiple sequence alignments and phylogenetic trees.
2525 Phylogenetics, Especially Comparative Methods Rphylip An R interface for PHYLIP Rphylip provides an R interface for the PHYLIP package. All users of Rphylip will thus first have to install the PHYLIP phylogeny methods program package (Felsenstein 2013). See http://www.phylip.com for more information about installing PHYLIP.
2526 Phylogenetics, Especially Comparative Methods SigTree Identify and Visualize Significantly Responsive Branches in a Phylogenetic Tree Provides tools to identify and visualize branches in a phylogenetic tree that are significantly responsive to some intervention, taking as primary inputs a phylogenetic tree (of class phylo) and a data frame (or matrix) of corresponding tip (OTU) labels and p-values.
2527 Phylogenetics, Especially Comparative Methods strap Stratigraphic Tree Analysis for Palaeontology Functions for the stratigraphic analysis of phylogenetic trees.
2528 Phylogenetics, Especially Comparative Methods surface Fitting Hansen Models to Investigate Convergent Evolution SURFACE is a data-driven phylogenetic comparative method for fitting stabilizing selection models to continuous trait data, building on the ouch package. The main functions fit a series of Hansen models using stepwise AIC, then identify cases of convergent evolution where multiple lineages have shifted to the same adaptive peak.
2529 Phylogenetics, Especially Comparative Methods SYNCSA Analysis of Functional and Phylogenetic Patterns in Metacommunities Analysis of metacommunities based on functional traits and phylogeny of the community components. The functions that are offered here implement for the R environment methods that have been available in the SYNCSA application written in C++ (by Valerio Pillar, available at <http://ecoqua.ecologia.ufrgs.br/ecoqua/SYNCSA.html>).
2530 Phylogenetics, Especially Comparative Methods taxize Taxonomic Information from Around the Web Interacts with a suite of web ‘APIs’ for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more.
2531 Phylogenetics, Especially Comparative Methods TESS Diversification Rate Estimation and Fast Simulation of Reconstructed Phylogenetic Trees under Tree-Wide Time-Heterogeneous Birth-Death Processes Including Mass-Extinction Events Simulation of reconstructed phylogenetic trees under tree-wide time-heterogeneous birth-death processes and estimation of diversification parameters under the same model. Speciation and extinction rates can be any function of time and mass-extinction events at specific times can be provided. Trees can be simulated either conditioned on the number of species, the time of the process, or both. Additionally, the likelihood equations are implemented for convenience and can be used for Maximum Likelihood (ML) estimation and Bayesian inference.
2532 Phylogenetics, Especially Comparative Methods tidytree A Tidy Tool for Phylogenetic Tree Data Manipulation A phylogenetic tree generally contains multiple components, including nodes, edges, branches and associated data. ‘tidytree’ provides an approach to convert a tree object to a tidy data frame, as well as tidy interfaces to manipulate tree data.
2533 Phylogenetics, Especially Comparative Methods treebase Discovery, Access and Manipulation of ‘TreeBASE’ Phylogenies Interface to the API for ‘TreeBASE’ <http://treebase.org> from ‘R.’ ‘TreeBASE’ is a repository of user-submitted phylogenetic trees (of species, population, or genes) and the data used to create them.
2534 Phylogenetics, Especially Comparative Methods treedater Fast Molecular Clock Dating of Phylogenetic Trees with Rate Variation Functions for estimating times of common ancestry and molecular clock rates of evolution using a variety of evolutionary models, parametric and nonparametric bootstrap confidence intervals, methods for detecting outlier lineages, root-to-tip regression, and a statistical test for selecting molecular clock models. The methods are described in Volz, E.M. and S.D.W. Frost (2017) <doi:10.1093/ve/vex025>.
2535 Phylogenetics, Especially Comparative Methods TreePar Estimating birth and death rates based on phylogenies (i) For a given species phylogeny on present-day data which is calibrated to calendar time, a method for estimating maximum likelihood speciation and extinction processes is provided. The method allows for non-constant rates. Rates may change (1) as a function of time, i.e. rate shifts at specified times or mass extinction events (likelihood implemented as LikShifts, optimization as bd.shifts.optim and visualized as bd.shifts.plot), or (2) as a function of the number of species, i.e. density-dependence (likelihood implemented as LikDD and optimization as bd.densdep.optim), or (3) extinction rate may be a function of species age (likelihood implemented as LikAge and optimization as bd.age.optim.matlab). Note that the methods take into account the whole phylogeny; in particular they account for the “pull of the present” effect. (1-3) can take into account incomplete species sampling, as long as each species has the same probability of being sampled. For a given phylogeny on higher taxa (i.e. all but one species per taxon are missing), where the number of species is known within each higher taxon, speciation and extinction rates can be estimated under model (1) (implemented within LikShifts and bd.shifts.optim with groups != 0). (ii) For a given phylogeny with sequentially sampled tips, e.g. a virus phylogeny, rates can be estimated under a model where rates vary across time using bdsky.stt.optim based on likelihood LikShiftsSTT (extending LikShifts and bd.shifts.optim). Furthermore, rates may vary as a function of host types using LikTypesSTT (multitype branching process extending functions in R package diversitree). This function can furthermore calculate the likelihood under an epidemiological model where infected individuals are first exposed and then infectious.
2536 Phylogenetics, Especially Comparative Methods TreeSim Simulating Phylogenetic Trees Simulation methods for phylogenetic trees where (i) all tips are sampled at one time point or (ii) tips are sampled sequentially through time. (i) For sampling at one time point, simulations are performed under a constant rate birth-death process, conditioned on having a fixed number of final tips (sim.bd.taxa()), or a fixed age (sim.bd.age()), or a fixed age and number of tips (sim.bd.taxa.age()). When conditioning on the number of final tips, the method allows for shifts in rates and mass extinction events during the birth-death process (sim.rateshift.taxa()). The function sim.bd.age() (and sim.rateshift.taxa() without extinction) allow the speciation rate to change in a density-dependent way. The LTT plots of the simulations can be displayed using LTT.plot(), LTT.plot.gen() and LTT.average.root(). TreeSim further samples trees with n final tips from a set of trees generated by the common sampling algorithm stopping when a fixed number m>>n of tips is first reached (sim.gsa.taxa()). This latter method is appropriate for m-tip trees generated under a big class of models (details in the sim.gsa.taxa() man page). For incomplete phylogeny, the missing speciation events can be added through simulations (corsim()). (ii) sim.rateshifts.taxa() is generalized to sim.bdsky.stt() for serially sampled trees, where the trees are conditioned on either the number of sampled tips or the age. Furthermore, for a multitype-branching process with sequential sampling, trees on a fixed number of tips can be simulated using sim.bdtypes.stt.taxa(). This function further allows to simulate under epidemiological models with an exposed class. The function sim.genespeciestree() simulates coalescent gene trees within birth-death species trees, and sim.genetree() simulates coalescent gene trees.
2537 Phylogenetics, Especially Comparative Methods vegan Community Ecology Package Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
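A minimal diversity sketch on the BCI forest census data shipped with vegan:

```r
library(vegan)

data(BCI)
H <- diversity(BCI, index = "shannon")  # Shannon diversity per plot
S <- specnumber(BCI)                    # species richness per plot
head(cbind(H, S))
```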
2538 Phylogenetics, Especially Comparative Methods warbleR Streamline Bioacoustic Analysis A tool to streamline the analysis of animal acoustic signal structure. The package offers functions for downloading avian vocalizations from the open-access online repository Xeno-Canto <https://www.xeno-canto.org>, manipulating sound files, detecting acoustic signals, assessing performance of methods that measure acoustic similarity, conducting cross-correlations, dynamic time warping, measuring acoustic parameters and analysing interactive vocal signals, among others. Most functions working iteratively allow parallelization to improve computational efficiency.
2539 Phylogenetics, Especially Comparative Methods windex windex: Analysing convergent evolution using the Wheatsheaf index Analysing convergent evolution using the Wheatsheaf index.
2540 Psychometric Models and Methods ade4 (core) Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
2541 Psychometric Models and Methods anacor (core) Simple and Canonical Correspondence Analysis Performs simple and canonical CA (covariates on rows/columns) on a two-way frequency table (with missings) by means of SVD. Different scaling methods (standard, centroid, Benzecri, Goodman) as well as various plots including confidence ellipsoids are provided.
2542 Psychometric Models and Methods AnalyzeFMRI Functions for Analysis of fMRI Datasets Stored in the ANALYZE or NIFTI Format Functions for I/O, visualisation and analysis of functional Magnetic Resonance Imaging (fMRI) datasets stored in the ANALYZE or NIFTI format. Note that the latest version of XQuartz seems to be necessary under MacOS.
2543 Psychometric Models and Methods aspect A General Framework for Multivariate Analysis with Optimal Scaling Contains various functions for optimal scaling. One function performs optimal scaling by maximizing an aspect (i.e. a target function such as the sum of eigenvalues, sum of squared correlations, squared multiple correlations, etc.) of the corresponding correlation matrix. Another function implements the LINEALS approach for optimal scaling by minimization of an aspect based on pairwise correlations and correlation ratios. The resulting correlation matrix and category scores can be used for further multivariate methods such as structural equation models.
2544 Psychometric Models and Methods asymmetry Multidimensional Scaling of Asymmetric Data Multidimensional scaling models and methods for the visualization for asymmetric data <doi:10.1111/j.2044-8317.1996.tb01078.x>. An asymmetric matrix has the same number of rows and columns, and these rows and columns refer to the same set of objects. At least some elements in the upper-triangle are different from the corresponding elements in the lower triangle. An example is a student migration table, where the rows correspond to the countries of origin of the students and the columns to the destination countries. This package provides the slide-vector model <doi:10.1007/BF02294474>, a scaling model with unique dimensions and the asymscal model for asymmetric multidimensional scaling. Furthermore, a heat map for skew-symmetric data, and the decomposition of asymmetry are provided for the analysis of asymmetric tables.
2545 Psychometric Models and Methods BayesFM Bayesian Inference for Factor Modeling Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: Bayesian Exploratory Factor Analysis (befa), an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling. More approaches will be included in future releases of this package.
2546 Psychometric Models and Methods BayesLCA Bayesian Latent Class Analysis Bayesian Latent Class Analysis using several different methods.
2547 Psychometric Models and Methods betareg Beta Regression Beta regression for modeling beta-distributed dependent variables, e.g., rates and proportions. In addition to maximum likelihood regression (for both mean and precision of a beta-distributed response), bias-corrected and bias-reduced estimation as well as finite mixture models and recursive partitioning for beta regressions are provided.
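As a quick illustration of the betareg interface, here is a minimal sketch fitting a beta regression to the GasolineYield data bundled with the package (assuming betareg is installed):

```r
## Minimal sketch: beta regression of gasoline yield on batch and temperature,
## using the GasolineYield data bundled with betareg.
library(betareg)
data("GasolineYield", package = "betareg")
m <- betareg(yield ~ batch + temp, data = GasolineYield)
summary(m)  # coefficients for the mean (logit link) and the precision (phi)
```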
2548 Psychometric Models and Methods BigSEM Constructing Large Systems of Structural Equations Construct large systems of structural equations using the two-stage penalized least squares (2SPLS) method proposed by Chen, Zhang and Zhang (2016).
2549 Psychometric Models and Methods birtr The R Package for “The Basics of Item Response Theory Using R” R functions for “The Basics of Item Response Theory Using R” by Frank B. Baker and Seock-Ho Kim (Springer, 2017, ISBN-13: 978-3-319-54204-1) including iccplot(), icccal(), icc(), iccfit(), groupinv(), tcc(), ability(), tif(), and rasch(). For example, iccplot() plots an item characteristic curve under the two-parameter logistic model.
2550 Psychometric Models and Methods blavaan (core) Bayesian Latent Variable Analysis Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models.
2551 Psychometric Models and Methods bpca Biplot of Multivariate Data Based on Principal Components Analysis Implements biplot (2d and 3d) of multivariate data based on principal components analysis and diagnostic tools of the quality of the reduction.
2552 Psychometric Models and Methods BradleyTerry2 Bradley-Terry Models Specify and fit the Bradley-Terry model, including structured versions in which the parameters are related to explanatory variables through a linear predictor and versions with contest-specific effects, such as a home advantage.
2553 Psychometric Models and Methods BTLLasso Modelling Heterogeneity in Paired Comparison Data Performs ‘BTLLasso’ as described by Schauberger and Tutz (2019) <doi:10.18637/jss.v088.i09> and Schauberger and Tutz (2017) <doi:10.1177/1471082X17693086>. BTLLasso is a method to include different types of variables in paired comparison models and, therefore, to allow for heterogeneity between subjects. Variables can be subject-specific, object-specific and subject-object-specific and can have an influence on the attractiveness/strength of the objects. Suitable L1 penalty terms are used to cluster certain effects and to reduce the complexity of the models.
2554 Psychometric Models and Methods ca (core) Simple, Multiple and Joint Correspondence Analysis Computation and visualization of simple, multiple and joint correspondence analysis.
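A minimal sketch of the ca interface, using the 'smoke' example table bundled with the package:

```r
## Minimal sketch: simple correspondence analysis of the 'smoke' table
## (staff groups x smoking categories) bundled with ca.
library(ca)
data("smoke", package = "ca")
fit <- ca(smoke)
summary(fit)  # principal inertias and row/column contributions
plot(fit)     # symmetric map of row and column points
```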
2555 Psychometric Models and Methods cabootcrs Bootstrap Confidence Regions for Correspondence Analysis Performs correspondence analysis on a two-way contingency table and produces bootstrap-based elliptical confidence regions around the projected coordinates for the category points. Includes routines to plot the results in a variety of styles. Also reports the standard numerical output for correspondence analysis.
2556 Psychometric Models and Methods cacIRT Classification Accuracy and Consistency under Item Response Theory Computes classification accuracy and consistency indices under Item Response Theory. Implements the total score IRT-based methods in Lee, Hanson & Brennan (2002) and Lee (2010), the IRT-based methods in Rudner (2001, 2005), and the total score nonparametric methods in Lathrop & Cheng (2014). Methods are available for both dichotomous and polytomous tests.
2557 Psychometric Models and Methods catR Generation of IRT Response Patterns under Computerized Adaptive Testing Provides routines for the generation of response patterns under a unidimensional dichotomous and polytomous computerized adaptive testing (CAT) framework. It holds many standard functions to estimate ability, select the first item(s) to administer and optimally select the next item, as well as several stopping rules. Options to control for item exposure and content balancing are also available (Magis and Barrada (2017) <doi:10.18637/jss.v076.c01>).
2558 Psychometric Models and Methods CAvariants Correspondence Analysis Variants Provides six variants of two-way correspondence analysis (ca): simple ca, singly ordered ca, doubly ordered ca, non symmetrical ca, singly ordered non symmetrical ca, and doubly ordered non symmetrical ca.
2559 Psychometric Models and Methods CDM Cognitive Diagnosis Modeling Functions for cognitive diagnosis modeling and multidimensional item response modeling for dichotomous and polytomous item responses. This package enables the estimation of the DINA and DINO model (Junker & Sijtsma, 2001, <doi:10.1177/01466210122032064>), the multiple group (polytomous) GDINA model (de la Torre, 2011, <doi:10.1007/s11336-011-9207-7>), the multiple choice DINA model (de la Torre, 2009, <doi:10.1177/0146621608320523>), the general diagnostic model (GDM; von Davier, 2008, <doi:10.1348/000711007X193957>), the structured latent class model (SLCA; Formann, 1992, <doi:10.1080/01621459.1992.10475229>) and regularized latent class analysis (Chen, Li, Liu, & Ying, 2017, <doi:10.1007/s11336-016-9545-6>). See George, Robitzsch, Kiefer, Gross, and Uenlue (2017) <doi:10.18637/jss.v074.i02> for further details on estimation and the package structure. For tutorials on how to use the CDM package see George and Robitzsch (2015, <doi:10.20982/tqmp.11.3.p189>) as well as Ravand and Robitzsch (2015).
2560 Psychometric Models and Methods cds Constrained Dual Scaling for Detecting Response Styles This is an implementation of constrained dual scaling for detecting response styles in categorical data, including utility functions. The procedure involves adding additional columns to the data matrix representing the boundaries between the rating categories. The resulting matrix is then doubled and analyzed by dual scaling. One-dimensional solutions are sought which provide optimal scores for the rating categories. These optimal scores are constrained to follow monotone quadratic splines. Clusters are introduced within which the response styles can vary. The type of response style present in a cluster can be diagnosed from the optimal scores for said cluster, and this can be used to construct an imputed version of the data set which adjusts for response styles.
2561 Psychometric Models and Methods ClustVarLV Clustering of Variables Around Latent Variables Functions for the clustering of variables around Latent Variables. Each cluster of variables, which may be defined as a local or directional cluster, is associated with a latent variable. External variables measured on the same observations and/or additional information on the variables can be taken into account. A “noise” cluster or sparse latent variables can also be defined.
2562 Psychometric Models and Methods CMC Cronbach-Mesbah Curve Calculation and plot of the stepwise Cronbach-Mesbah Curve.
2563 Psychometric Models and Methods cncaGUI Canonical Non-Symmetrical Correspondence Analysis in R A GUI with which users can construct and interact with Canonical Correspondence Analysis and Canonical Non-Symmetrical Correspondence Analysis; it also provides inferential results by using bootstrap methods.
2564 Psychometric Models and Methods cNORM Continuous Norming Conventional methods for producing standard scores in psychometrics or biometrics are often plagued with “jumps” or “gaps” (i.e., discontinuities) in norm tables and low confidence for assessing extreme scores. The continuous norming method introduced by A. Lenhard et al. (2016), <doi:10.1177/1073191116656437>, generates continuous test norm scores on the basis of the raw data from standardization samples, without requiring assumptions about the distribution of the raw data: Norm scores are directly established from raw data by modeling the latter ones as a function of both percentile scores and an explanatory variable (e.g., age). The method minimizes bias arising from sampling and measurement error, while handling marked deviations from normality, addressing bottom or ceiling effects and capturing almost all of the variance in the original norm data sample.
2565 Psychometric Models and Methods cocor Comparing Correlations Statistical tests for the comparison between two correlations based on either independent or dependent groups. Dependent correlations can either be overlapping or nonoverlapping. A web interface is available on the website http://comparingcorrelations.org. A plugin for the R GUI and IDE RKWard is included. Please install RKWard from https://rkward.kde.org to use this feature. The respective R package ‘rkward’ cannot be installed directly from a repository, as it is a part of RKWard.
2566 Psychometric Models and Methods cocorresp Co-Correspondence Analysis Methods Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.
2567 Psychometric Models and Methods cocron Statistical Comparisons of Two or more Alpha Coefficients Statistical tests for the comparison between two or more alpha coefficients based on either dependent or independent groups of individuals. A web interface is available at http://comparingcronbachalphas.org. A plugin for the R GUI and IDE RKWard is included. Please install RKWard from https://rkward.kde.org to use this feature. The respective R package ‘rkward’ cannot be installed directly from a repository, as it is a part of RKWard.
2568 Psychometric Models and Methods CopyDetect Computing Response Similarity Indices for Multiple-Choice Tests Contains several IRT and non-IRT based response similarity indices proposed in the literature for multiple-choice examinations such as the Omega index, Wollack (1997) <doi:10.1177/01466216970214002>; Generalized Binomial Test, van der Linden & Sotaridona (2006) <doi:10.3102/10769986031003283>; K index, K1 and K2 indices, Sotaridona & Meijer (2002) <doi:10.1111/j.1745-3984.2002.tb01138.x>; and S1 and S2 indices, Sotaridona & Meijer (2003) <doi:10.1111/j.1745-3984.2003.tb01096.x>.
2569 Psychometric Models and Methods covLCA Latent Class Models with Covariate Effects on Underlying and Measured Variables Estimation of latent class models with covariate effects on underlying and measured variables. The measured variables are dichotomous or polytomous, all with the same number of categories.
2570 Psychometric Models and Methods ctsem Continuous Time Structural Equation Modelling A hierarchical, multivariate, continuous (and discrete) time dynamic modelling package for panel and time series data, using stochastic differential equations. Contains a faster frequentist set of functions using OpenMx for single subject and mixed-effects (random intercepts only) structural equation models, or a hierarchical Bayesian implementation using Stan that allows for random effects and non-linearity over all model parameters. Allows for modelling of multiple noisy measurements of multiple stochastic processes, time varying input / event covariates, and time invariant covariates used to predict the parameters. Bayesian formulation not available on 32 bit Windows systems.
2571 Psychometric Models and Methods CTT (core) Classical Test Theory Functions A collection of common test and item analyses from a classical test theory (CTT) framework. Analyses can be applied to both dichotomous and polytomous data. Functions provide reliability analyses (alpha), item statistics, distractor analyses, disattenuated correlations, scoring routines, and empirical ICCs.
2572 Psychometric Models and Methods CTTShiny Classical Test Theory via Shiny Interactive shiny application for running classical test theory (item analysis).
2573 Psychometric Models and Methods DAKS Data Analysis and Knowledge Spaces Functions and an example dataset for the psychometric theory of knowledge spaces. This package implements data analysis methods and procedures for simulating data and quasi orders and transforming different formulations in knowledge space theory. See package?DAKS for an overview.
2574 Psychometric Models and Methods dexter Data Management and Analysis of Tests A system for the management, assessment, and psychometric analysis of data from educational and psychological tests. Developed at Cito, The Netherlands, with subsidy from the Dutch Ministry of Education, Culture, and Science.
2575 Psychometric Models and Methods dexterMST CML Calibration of Multi Stage Tests Conditional Maximum Likelihood Calibration and data management of multistage tests. Functions for calibration of the Extended Nominal Response and the Interaction models, DIF and profile analysis. See Robert J. Zwitser and Gunter Maris (2015) <doi:10.1007/s11336-013-9369-6>.
2576 Psychometric Models and Methods DFIT Differential Functioning of Items and Tests A set of functions to perform Raju, van der Linden and Fleer’s (1995, doi:10.1177/014662169501900405) Differential Functioning of Items and Tests (DFIT) analyses. It includes functions to use the Monte Carlo Item Parameter Replication approach (Oshima, Raju, & Nanda, 2006, doi:10.1111/j.1745-3984.2006.00001.x) for obtaining the associated statistical significance tests cut-off points. They may also be used for a priori and post-hoc power calculations (Cervantes, 2017, doi:10.18637/jss.v076.i05).
2577 Psychometric Models and Methods DIFboost Detection of Differential Item Functioning (DIF) in Rasch Models by Boosting Techniques Performs detection of Differential Item Functioning using the method DIFboost as proposed in Schauberger and Tutz (2015): Detection of Differential item functioning in Rasch models by boosting techniques, British Journal of Mathematical and Statistical Psychology.
2578 Psychometric Models and Methods DIFlasso A Penalty Approach to Differential Item Functioning in Rasch Models Performs DIFlasso, a method to detect DIF (Differential Item Functioning) in Rasch Models. It can handle settings with many variables and also metric variables.
2579 Psychometric Models and Methods difNLR DIF and DDF Detection by Non-Linear Regression Models Detection of differential item functioning (DIF) among dichotomously scored items and differential distractor functioning (DDF) among unscored items with non-linear regression procedures based on generalized logistic regression models (Drabinova and Martinkova, 2017, <doi:10.1111/jedm.12158>).
2580 Psychometric Models and Methods difR Collection of Methods to Detect Dichotomous Differential Item Functioning (DIF) Provides a collection of standard methods to detect differential item functioning among dichotomously scored items. Methods for uniform and non-uniform DIF, based on test-score or IRT methods, for comparing two or more than two groups of respondents, are available (Magis, Beland, Tuerlinckx and De Boeck, A General Framework and an R Package for the Detection of Dichotomous Differential Item Functioning, Behavior Research Methods, 42, 2010, 847-862 <doi:10.3758/BRM.42.3.847>).
2581 Psychometric Models and Methods DIFtree Item Focussed Trees for the Identification of Items in Differential Item Functioning Item focussed recursive partitioning for simultaneous selection of items and variables that induce Differential Item Functioning (DIF) in dichotomous or polytomous items.
2582 Psychometric Models and Methods dina Bayesian Estimation of DINA Model Estimate the Deterministic Input, Noisy “And” Gate (DINA) cognitive diagnostic model parameters using the Gibbs sampler described by Culpepper (2015) <doi:10.3102/1076998615595403>.
2583 Psychometric Models and Methods DistatisR DiSTATIS Three Way Metric Multidimensional Scaling Implements DiSTATIS and CovSTATIS (three-way multidimensional scaling) for the analysis of multiple distance/covariance matrices collected on the same set of observations.
2584 Psychometric Models and Methods e1071 Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, …
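Among its many tools, e1071 is widely used for its svm() interface; a minimal sketch on R's built-in iris data:

```r
## Minimal sketch: an SVM classifier on R's built-in iris data.
library(e1071)
model <- svm(Species ~ ., data = iris)
table(predicted = predict(model, iris), actual = iris$Species)
```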
2585 Psychometric Models and Methods eba Elimination-by-Aspects Models Fitting and testing multi-attribute probabilistic choice models, especially the Bradley-Terry-Luce (BTL) model (Bradley & Terry, 1952 <doi:10.1093/biomet/39.3-4.324>; Luce, 1959), elimination-by-aspects (EBA) models (Tversky, 1972 <doi:10.1037/h0032955>), and preference tree (Pretree) models (Tversky & Sattath, 1979 <doi:10.1037/0033-295X.86.6.542>).
2586 Psychometric Models and Methods ecodist Dissimilarity-Based Functions for Ecological Analysis Dissimilarity-based analysis functions including ordination and Mantel test functions, intended for use with spatial and community data.
2587 Psychometric Models and Methods edstan Stan Models for Item Response Theory Provides convenience functions and pre-programmed Stan models related to item response theory. Its purpose is to make fitting common item response theory models using Stan easy.
2588 Psychometric Models and Methods eegkit Toolkit for Electroencephalography Data Analysis and visualization tools for electroencephalography (EEG) data. Includes functions for (i) plotting EEG data, (ii) filtering EEG data, (iii) smoothing EEG data, (iv) frequency domain (Fourier) analysis of EEG data, (v) Independent Component Analysis of EEG data, and (vi) simulating event-related potential EEG data.
2589 Psychometric Models and Methods EFAutilities Utility Functions for Exploratory Factor Analysis A number of utility functions for exploratory factor analysis are included in this package. In particular, it computes standard errors for parameter estimates and factor correlations under a variety of conditions.
2590 Psychometric Models and Methods elasticnet Elastic-Net for Sparse Estimation and Sparse PCA Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for estimating sparse Principal Components. The Lasso solution paths can be computed by the same function. First version: 2005-10.
2591 Psychometric Models and Methods emIRT EM Algorithms for Estimating Item Response Theory Models Various Expectation-Maximization (EM) algorithms are implemented for item response theory (IRT) models. The current implementation includes IRT models for binary and ordinal responses, along with dynamic and hierarchical IRT models with binary responses. The latter two models are derived and implemented using variational EM. Subsequent edits also include variational network and text scaling models.
2592 Psychometric Models and Methods equate Observed-Score Linking and Equating Contains methods for observed-score linking and equating under the single-group, equivalent-groups, and nonequivalent-groups with anchor test(s) designs. Equating types include identity, mean, linear, general linear, equipercentile, circle-arc, and composites of these. Equating methods include synthetic, nominal weights, Tucker, Levine observed score, Levine true score, Braun/Holland, frequency estimation, and chained equating. Plotting and summary methods, and methods for multivariate presmoothing and bootstrap error estimation are also provided.
2593 Psychometric Models and Methods equateIRT IRT Equating Methods Computation of direct, chain and average (bisector) equating coefficients with standard errors using Item Response Theory (IRT) methods for dichotomous items (Battauz (2013) <doi:10.1007/s11336-012-9316-y>, Battauz (2015) <doi:10.18637/jss.v068.i07>). Test scoring can be performed by true score equating and observed score equating methods. DIF detection can be performed using a Wald-type test (Battauz (2018) <doi:10.1007/s10260-018-00442-w>).
2594 Psychometric Models and Methods equateMultiple Equating of Multiple Forms Equating of multiple forms using Item Response Theory (IRT) methods (Battauz M. (2017) <doi:10.1007/s11336-016-9517-x> and Haberman S. J. (2009) <doi:10.1002/j.2333-8504.2009.tb02197.x>).
2595 Psychometric Models and Methods eRm (core) Extended Rasch Modeling Fits Rasch models (RM), linear logistic test models (LLTM), rating scale model (RSM), linear rating scale models (LRSM), partial credit models (PCM), and linear partial credit models (LPCM). Missing values are allowed in the data matrix. Additional features are the ML estimation of the person parameters, Andersen’s LR-test, item-specific Wald test, Martin-Loef-Test, nonparametric Monte-Carlo Tests, itemfit and personfit statistics including infit and outfit measures, ICC and other plots, automated stepwise item elimination, simulation module for various binary data matrices.
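A minimal sketch of the eRm workflow, using the raschdat1 example data bundled with the package:

```r
## Minimal sketch: fit a Rasch model with conditional ML, then estimate
## person parameters; raschdat1 is bundled with eRm.
library(eRm)
res <- RM(raschdat1)
summary(res)                     # item (easiness) parameters
pp <- person.parameter(res)      # person ability estimates
plotICC(res, item.subset = 1:4)  # ICCs for the first four items
```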
2596 Psychometric Models and Methods esaBcv Estimate Number of Latent Factors and Factor Matrix for Factor Analysis These functions estimate the latent factors of a given matrix, whether or not it is high-dimensional. The method first estimates the number of factors using bi-cross-validation and then estimates the latent factor matrix and the noise variances. For more information about the method, see the 2015 arXiv preprint on factor models by Art B. Owen and Jingshu Wang (http://arxiv.org/abs/1503.03515).
2597 Psychometric Models and Methods EstCRM Calibrating Parameters for Samejima’s Continuous IRT Model Estimates item and person parameters for Samejima’s Continuous Response Model (CRM), computes item fit residual statistics, draws empirical 3D item category response curves, draws theoretical 3D item category response curves, and generates data under the CRM for simulation studies.
2598 Psychometric Models and Methods ExPosition Exploratory Analysis with the Singular Value Decomposition A variety of descriptive multivariate analyses with the singular value decomposition, such as principal components analysis, correspondence analysis, and multidimensional scaling. See An ExPosition of the Singular Value Decomposition in R (Beaton et al 2014) <doi:10.1016/j.csda.2013.11.006>.
2599 Psychometric Models and Methods FactoMineR Multivariate Exploratory Data Analysis and Data Mining Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017) <doi:10.1201/b10345-2>.
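A minimal sketch of a PCA with FactoMineR, using the decathlon data bundled with the package (the first ten columns are the event performances):

```r
## Minimal sketch: PCA on the decathlon data bundled with FactoMineR.
library(FactoMineR)
data("decathlon", package = "FactoMineR")
res <- PCA(decathlon[, 1:10], graph = FALSE)
summary(res)  # eigenvalues, individual and variable coordinates
```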
2600 Psychometric Models and Methods faoutlier Influential Case Detection Methods for Factor Analysis and Structural Equation Models Tools for detecting and summarizing influential cases that can affect exploratory and confirmatory factor analysis models as well as structural equation models more generally.
2601 Psychometric Models and Methods fastICA FastICA Algorithms to Perform ICA and Projection Pursuit Implementation of FastICA algorithm to perform Independent Component Analysis (ICA) and Projection Pursuit.
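A minimal sketch of fastICA() on two artificial mixed signals, adapted from the pattern in the package examples:

```r
## Minimal sketch: un-mix two artificial source signals with FastICA.
library(fastICA)
set.seed(1)
S <- cbind(sin((1:1000) / 20),                        # source 1: sinusoid
           rep((((1:200) - 100) / 100), 5))           # source 2: sawtooth
A <- matrix(c(0.291, 0.6557, -0.5439, 0.5572), 2, 2)  # mixing matrix
X <- S %*% A                                          # observed mixtures
ica <- fastICA(X, n.comp = 2)
str(ica$S)                                            # estimated components
```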
2602 Psychometric Models and Methods fechner Fechnerian Scaling of Discrete Object Sets Functions and example datasets for Fechnerian scaling of discrete object sets. Users can compute Fechnerian distances among objects representing subjective dissimilarities, and other related information. See package?fechner for an overview.
2603 Psychometric Models and Methods flexmix Flexible Mixture Modeling A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
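A minimal sketch of a two-component mixture of regressions with flexmix, using the NPreg data bundled with the package:

```r
## Minimal sketch: two-component mixture of quadratic regressions
## on the NPreg data bundled with flexmix.
library(flexmix)
data("NPreg", package = "flexmix")
m <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2)
summary(m)                    # component sizes and log-likelihood
parameters(m, component = 1)  # regression coefficients, component 1
```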
2604 Psychometric Models and Methods fourPNO Bayesian 4 Parameter Item Response Model Estimate Barton & Lord’s (1981) <doi:10.1002/j.2333-8504.1981.tb01255.x> four parameter IRT model with lower and upper asymptotes using Bayesian formulation described by Culpepper (2016) <doi:10.1007/s11336-015-9477-6>.
2605 Psychometric Models and Methods fuzzyreg Fuzzy Linear Regression Estimators for fuzzy linear regression. The functions estimate parameters of fuzzy linear regression models with crisp or fuzzy independent variables (triangular fuzzy numbers are supported). Implements multiple methods for parameter estimation and algebraic operations with triangular fuzzy numbers. Includes functions for summarising, printing and plotting the model fit. Calculates predictions from the model and total error of fit. Diamond (1988) <doi:10.1016/0020-0255(88)90047-3>, Hung & Yang (2006) <doi:10.1016/j.fss.2006.08.004>, Lee & Tanaka (1999) <doi:10.15807/jorsj.42.98>, Nasrabadi, Nasrabadi & Nasrabady (2005) <doi:10.1016/j.amc.2004.02.008>, Tanaka, Hayashi & Watada (1989) <doi:10.1016/0377-2217(89)90431-1>, Zeng, Feng & Li (2017) <doi:10.1016/j.asoc.2016.09.029>.
2606 Psychometric Models and Methods fwdmsa Forward search for Mokken scale analysis fwdmsa performs the Forward Search for Mokken scale analysis. It detects outliers and produces several types of diagnostic plots.
2607 Psychometric Models and Methods GDINA The Generalized DINA Model Framework A set of psychometric tools for cognitive diagnosis modeling for both dichotomous and polytomous responses. Various cognitive diagnosis models can be estimated, including the generalized deterministic inputs, noisy and gate (G-DINA) model by de la Torre (2011) <doi:10.1007/s11336-011-9207-7>, the sequential G-DINA model by Ma and de la Torre (2016) <doi:10.1111/bmsp.12070>, and many other models they subsume. Joint attribute distribution can be independent, saturated, higher-order, loglinear smoothed or structured. Q-matrix validation, item and model fit statistics, model comparison at test and item level and differential item functioning can also be conducted. A graphical user interface is also provided.
2608 Psychometric Models and Methods Gifi Multivariate Analysis with Optimal Scaling Implements categorical principal component analysis (‘PRINCALS’) and multiple correspondence analysis (‘HOMALS’). It replaces the ‘homals’ package.
2609 Psychometric Models and Methods GLMMRR Generalized Linear Mixed Model (GLMM) for Binary Randomized Response Data Generalized Linear Mixed Model (GLMM) for Binary Randomized Response Data. Includes Cauchit, Compl. Log-Log, Logistic, and Probit link functions for Bernoulli Distributed RR data. RR Designs: Warner, Forced Response, Unrelated Question, Kuk, Crosswise, and Triangular.
2610 Psychometric Models and Methods GPArotation GPA Factor Rotation Gradient Projection Algorithm Rotation for Factor Analysis. See ?GPArotation.Intro for more details.
2611 Psychometric Models and Methods GPCMlasso Differential Item Functioning in Generalized Partial Credit Models Provides a framework to detect Differential Item Functioning (DIF) in Generalized Partial Credit Models (GPCM) and special cases of the GPCM as proposed by Schauberger and Mair (2019) <doi:10.3758/s13428-019-01224-2>. A joint model is set up where DIF is explicitly parametrized and penalized likelihood estimation is used for parameter selection. The big advantage of the method called GPCMlasso is that several variables can be treated simultaneously and that both continuous and categorical variables can be used to detect DIF.
2612 Psychometric Models and Methods gSEM Semi-Supervised Generalized Structural Equation Modeling Conducts a semi-gSEM statistical analysis (semi-supervised generalized structural equation modeling) on a data frame of coincident observations of multiple predictive or intermediate variables and a final continuous outcome variable, via two functions sgSEMp1() and sgSEMp2(), representing fittings based on two statistical principles. Principle 1 determines all sensible univariate relationships in the spirit of the Markovian process. The relationship between each pair of variables, including predictors and the final outcome variable, is determined with the Markovian property that the value of the current predictor is sufficient in relating to the next-level variable, i.e., the relationship is independent of the specific values of the variables at levels preceding the current predictor, given the current value. Principle 2 resembles the multiple regression principle in the way multiple predictors are considered simultaneously. Specifically, the relationship of the first-level predictors (such as time and irradiance) to the outcome variable (such as module degradation or yellowing) is fitted by a supervised additive model. Then each significant intermediate variable is taken as the new outcome variable and the other variables (except the final outcome variable) as the predictors in investigating the next-level multivariate relationship by a supervised additive model. This fitting process continues until all sensible models are investigated.
2613 Psychometric Models and Methods gtheory Apply Generalizability Theory with R Estimates variance components, generalizability coefficients, universe scores, and standard errors when observed scores contain variation from one or more measurement facets (e.g., items and raters).
2614 Psychometric Models and Methods homals (core) Gifi Methods for Optimal Scaling Performs a homogeneity analysis (multiple correspondence analysis) and various extensions. Rank restrictions on the category quantifications can be imposed (nonlinear PCA). The categories are transformed by means of optimal scaling with options for nominal, ordinal, and numerical scale levels (for rank-1 restrictions). Variables can be grouped into sets, in order to emulate regression analysis and canonical correlation analysis.
2615 Psychometric Models and Methods ica Independent Component Analysis Independent Component Analysis (ICA) using various algorithms: FastICA, Information-Maximization (Infomax), and Joint Approximate Diagonalization of Eigenmatrices (JADE).
2616 Psychometric Models and Methods ICC Facilitating Estimation of the Intraclass Correlation Coefficient Assist in the estimation of the Intraclass Correlation Coefficient (ICC) from variance components of a one-way analysis of variance and also estimate the number of individuals or groups necessary to obtain an ICC estimate with a desired confidence interval width.
2617 Psychometric Models and Methods iccbeta Multilevel Model Intraclass Correlation for Slope Heterogeneity A function and vignettes for computing an intraclass correlation described in Aguinis & Culpepper (2015) <doi:10.1177/1094428114563618>. This package quantifies the share of variance in a dependent variable that is attributed to group heterogeneity in slopes.
2618 Psychometric Models and Methods ifaTools Toolkit for Item Factor Analysis with ‘OpenMx’ Tools, tutorials, and demos of Item Factor Analysis using ‘OpenMx’.
2619 Psychometric Models and Methods immer Item Response Models for Multiple Ratings Implements some item response models for multiple ratings, including the hierarchical rater model, conditional maximum likelihood estimation of the linear logistic partial credit model, and a wrapper function to the commercial FACETS program. See Robitzsch and Steinfeld (2018) for a description of the functionality of the package. See Wang, Su & Qiu (2014; <doi:10.1111/jedm.12045>) for an overview of modeling alternatives.
2620 Psychometric Models and Methods influence.SEM Case Influence in Structural Equation Models A set of tools for evaluating several measures of case influence for structural equation models.
2621 Psychometric Models and Methods irr Various Coefficients of Interrater Reliability and Agreement Coefficients of Interrater Reliability and Agreement for quantitative, ordinal and nominal data: ICC, Finn-Coefficient, Robinson’s A, Kendall’s W, Cohen’s Kappa, …
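A minimal sketch of an intraclass correlation with irr, using the anxiety ratings bundled with the package:

```r
## Minimal sketch: two-way agreement ICC for the 'anxiety' ratings
## (20 subjects rated by 3 raters) bundled with irr.
library(irr)
data("anxiety", package = "irr")
icc(anxiety, model = "twoway", type = "agreement", unit = "single")
```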
2622 Psychometric Models and Methods irtDemo Item Response Theory Demo Collection Includes a collection of shiny applications to demonstrate or to explore fundamental item response theory (IRT) concepts such as estimation, scoring, and multidimensional IRT models.
2623 Psychometric Models and Methods irtoys A Collection of Functions Related to Item Response Theory (IRT) A collection of functions useful in learning and practicing IRT, which can be combined into larger programs. Provides basic CTT analysis, a simple common interface to the estimation of item parameters in IRT models for binary responses with three different programs (ICL, BILOG-MG, and ltm), ability estimation (MLE, BME, EAP, WLE, plausible values), item and person fit statistics, scaling methods (MM, MS, Stocking-Lord, and the complete Haebara method), and a rich array of parametric and non-parametric (kernel) plots. Estimates and plots Haberman’s interaction model when all items are dichotomously scored.
2624 Psychometric Models and Methods irtProb Utilities and Probability Distributions Related to Multidimensional Person Item Response Models Multidimensional Person Item Response Theory probability distributions.
2625 Psychometric Models and Methods irtrees Estimation of Tree-Based Item Response Models Helper functions and example data sets accompanying De Boeck, P. and Partchev, I. (2012) IRTrees: Tree-Based Item Response Models of the GLMM Family, Journal of Statistical Software - Code Snippets, 48(1), 1-28.
2626 Psychometric Models and Methods IRTShiny Item Response Theory via Shiny Interactive shiny application for running Item Response Theory analysis. Provides graphics for characteristic and information curves.
2627 Psychometric Models and Methods kcirt k-Cube Thurstonian IRT Models Create, simulate, fit, and solve k-Cube Thurstonian IRT models.
2628 Psychometric Models and Methods kequate The Kernel Method of Test Equating Implements the kernel method of test equating using the CB, EG, SG, NEAT CE/PSE and NEC designs, supporting Gaussian, logistic and uniform kernels and unsmoothed and pre-smoothed input data.
2629 Psychometric Models and Methods kst Knowledge Space Theory Knowledge space theory by Doignon and Falmagne (1999) <doi:10.1007/978-3-642-58625-5> is a set- and order-theoretical framework, which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The ‘kst’ package provides basic functionalities to generate, handle, and manipulate knowledge structures and knowledge spaces.
2630 Psychometric Models and Methods labdsv Ordination and Multivariate Analysis for Ecology A variety of ordination and community analyses useful in analysis of data sets in community ecology. Includes many of the common ordination methods, with graphical routines to facilitate their interpretation, as well as several novel analyses.
2631 Psychometric Models and Methods LAM Some Latent Variable Models Includes some procedures for latent variable modeling with a particular focus on multilevel data. The ‘LAM’ package contains mean and covariance structure modelling for multivariate normally distributed data (mlnormal(); Longford, 1987; <doi:10.1093/biomet/74.4.817>), a general Metropolis-Hastings algorithm (amh(); Roberts & Rosenthal, 2001, <doi:10.1214/ss/1015346320>) and penalized maximum likelihood estimation (pmle(); Cole, Chu & Greenland, 2014; <doi:10.1093/aje/kwt245>).
2632 Psychometric Models and Methods latdiag Draws Diagrams Useful for Checking Latent Scales A graph proposed by Rosenbaum is useful for checking some properties of various sorts of latent scale; this package generates commands to obtain the graph using ‘dot’ from ‘graphviz’.
2633 Psychometric Models and Methods lava Latent Variable Models A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2019) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
2634 Psychometric Models and Methods lava.tobit Latent Variable Models with Censored and Binary Outcomes Lava plugin allowing combinations of left and right censored and binary outcomes.
2635 Psychometric Models and Methods lavaan (core) Latent Variable Analysis Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
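A minimal sketch of a confirmatory factor analysis with lavaan, using the HolzingerSwineford1939 data bundled with the package (blavaan, listed above, mirrors this model syntax with bcfa()/bsem()):

```r
## Minimal sketch: three-factor CFA on the HolzingerSwineford1939
## data bundled with lavaan.
library(lavaan)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE, standardized = TRUE)
```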
2636 Psychometric Models and Methods lavaan.survey Complex Survey Structural Equation Modeling (SEM) Fit structural equation models (SEM) including factor analysis, multivariate regression models with latent variables and many other latent variable models while correcting estimates, standard errors, and chi-square-derived fit measures for a complex sampling design. Incorporate clustering, stratification, sampling weights, and finite population corrections into a SEM analysis. Wrapper around packages lavaan and survey.
2637 Psychometric Models and Methods lba Latent Budget Analysis for Compositional Data Latent budget analysis is a method for the analysis of a two-way contingency table with an explanatory variable and a response variable. It is specially designed for compositional data.
2638 Psychometric Models and Methods LCAvarsel Variable Selection for Latent Class Analysis Variable selection for latent class analysis for model-based clustering of multivariate categorical data. The package implements a general framework for selecting the subset of variables with relevant clustering information and discarding those that are redundant and/or not informative. The variable selection method is based on the approach of Fop et al. (2017) <doi:10.1214/17-AOAS1061> and Dean and Raftery (2010) <doi:10.1007/s10463-009-0258-9>. Different algorithms are available to perform the selection: stepwise, swap-stepwise and evolutionary stochastic search. Concomitant covariates used to predict the class membership probabilities can also be included in the latent class analysis model. The selection procedure can be run in parallel on multi-core machines.
2639 Psychometric Models and Methods lcda Latent Class Discriminant Analysis Local discrimination via latent class models.
2640 Psychometric Models and Methods lisrelToR Import output from LISREL into R An unofficial package aimed at automating the import of LISREL output into R. Neither this package nor its maintainer is in any way affiliated with the creators of LISREL or with SSI, Inc.
2641 Psychometric Models and Methods lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
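A minimal sketch of lmer(), using the sleepstudy data bundled with lme4:

```r
## Minimal sketch: random intercept and slope for each subject,
## using the sleepstudy data bundled with lme4.
library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fm)  # fixed effects plus variance components
```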
2642 Psychometric Models and Methods LNIRT LogNormal Response Time Item Response Theory Models Allows the simultaneous analysis of responses and response times in an Item Response Theory (IRT) modelling framework. Supports variable person speed functions (intercept, trend, quadratic), and covariates for item and person (random) parameters. Data missing-by-design can be specified. Parameter estimation is done with a MCMC algorithm. LNIRT replaces the package CIRT, which was written by Rinke Klein Entink. For reference, see the paper by Fox, Klein Entink and Van der Linden (2007), “Modeling of Responses and Response Times with the Package cirt”, Journal of Statistical Software, <doi:10.18637/jss.v020.i07>.
2643 Psychometric Models and Methods lordif Logistic Ordinal Regression Differential Item Functioning using IRT Analysis of Differential Item Functioning (DIF) for dichotomous and polytomous items using an iterative hybrid of ordinal logistic regression and item response theory (IRT).
2644 Psychometric Models and Methods lsl Latent Structure Learning Fits structural equation models via penalized likelihood.
2645 Psychometric Models and Methods ltbayes Simulation-Based Bayesian Inference for Latent Traits of Item Response Models Functions for simulating realizations from the posterior distribution of a latent trait of an item response model. Distributions are conditional on one or a subset of response patterns (e.g., sum scores). Functions for computing likelihoods, Fisher and observed information, posterior modes, and profile likelihood confidence intervals are also included. These functions are designed to be easily amenable to user-specified models.
2646 Psychometric Models and Methods ltm (core) Latent Trait Models under IRT Analysis of multivariate dichotomous and polytomous data using latent trait models under the Item Response Theory approach. It includes the Rasch, the Two-Parameter Logistic, Birnbaum’s Three-Parameter, the Graded Response, and the Generalized Partial Credit Models.
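A minimal sketch comparing a Rasch and a two-parameter logistic fit with ltm, using the LSAT data bundled with the package:

```r
## Minimal sketch: Rasch vs. 2PL on the LSAT data bundled with ltm.
library(ltm)
fit1 <- rasch(LSAT)     # one-parameter logistic (Rasch) model
fit2 <- ltm(LSAT ~ z1)  # two-parameter logistic model
anova(fit1, fit2)       # likelihood ratio test between the fits
```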
2647 Psychometric Models and Methods MASS Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
2648 Psychometric Models and Methods MBESS The MBESS R Package Implements methods that are useful in designing research studies and analyzing data, with particular emphasis on methods developed for or used within the behavioral, educational, and social sciences (broadly defined). That being said, many of the methods implemented within MBESS are applicable to a wide variety of disciplines. MBESS has a suite of functions for a variety of related topics, such as effect sizes, confidence intervals for effect sizes (including standardized effect sizes and noncentral effect sizes), sample size planning (from the accuracy in parameter estimation [AIPE], power analytic, equivalence, and minimum-risk point estimation perspectives), mediation analysis, various properties of distributions, and a variety of utility functions. MBESS (pronounced ‘em-bes’) was originally an acronym for ‘Methods for the Behavioral, Educational, and Social Sciences,’ but at this point MBESS contains methods applicable to and used in a wide variety of fields, and is an orphan acronym in the sense that what was an acronym is now literally its name. MBESS has greatly benefited from others; see <http://nd.edu/~kkelley/site/MBESS.html> for a detailed list of those who have contributed and other details.
2649 Psychometric Models and Methods MCAvariants Multiple Correspondence Analysis Variants Provides two variants of multiple correspondence analysis (ca): multiple ca and ordered multiple ca via orthogonal polynomials of Emerson.
2650 Psychometric Models and Methods MCMCglmm MCMC Generalised Linear Mixed Models MCMC Generalised Linear Mixed Models.
2651 Psychometric Models and Methods MCMCpack Markov Chain Monte Carlo (MCMC) Package Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
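A minimal sketch of the MCMCpack workflow: fit a Bayesian regression on simulated data and summarize the returned coda object:

```r
## Minimal sketch: Bayesian linear regression on simulated data;
## MCMCregress() returns a coda 'mcmc' object.
library(MCMCpack)
set.seed(1)
dat <- data.frame(x = 1:100)
dat$y <- 2 + 0.5 * dat$x + rnorm(100)
post <- MCMCregress(y ~ x, data = dat)
summary(post)  # posterior means, SDs, and quantiles
```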
2652 Psychometric Models and Methods medflex Flexible Mediation Analysis Using Natural Effect Models Run flexible mediation analyses using natural effect models as described in Lange, Vansteelandt and Bekaert (2012) <doi:10.1093/aje/kwr525>, Vansteelandt, Bekaert and Lange (2012) <doi:10.1515/2161-962X.1014> and Loeys, Moerkerke, De Smet, Buysse, Steen and Vansteelandt (2013) <doi:10.1080/00273171.2013.832132>.
2653 Psychometric Models and Methods mediation Causal Mediation Analysis We implement parametric and non-parametric mediation analysis. This package performs the methods and suggestions in Imai, Keele and Yamamoto (2010) <doi:10.1214/10-STS321>, Imai, Keele and Tingley (2010) <doi:10.1037/a0020761>, Imai, Tingley and Yamamoto (2013) <doi:10.1111/j.1467-985X.2012.01032.x>, Imai and Yamamoto (2013) <doi:10.1093/pan/mps040> and Yamamoto (2013) <http://web.mit.edu/teppei/www/research/IVmediate.pdf>. In addition to the estimation of causal mediation effects, the software also allows researchers to conduct sensitivity analysis for certain parametric models.
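A sketch of the two-step mediate() workflow, following the pattern in the package's JSS paper; the framing data and its variable names (emo, treat, cong_mesg, ...) are bundled with the package:

```r
## Sketch: mediator model, outcome model, then causal mediation estimates;
## the framing data ship with mediation (sims kept small for speed).
library(mediation)
data("framing", package = "mediation")
med.fit <- lm(emo ~ treat + age + educ + gender + income, data = framing)
out.fit <- glm(cong_mesg ~ emo + treat + age + educ + gender + income,
               data = framing, family = binomial("probit"))
med.out <- mediate(med.fit, out.fit, treat = "treat", mediator = "emo",
                   robustSE = TRUE, sims = 100)
summary(med.out)  # ACME, ADE, total effect, proportion mediated
```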
2654 Psychometric Models and Methods metaSEM Meta-Analysis using Structural Equation Modeling A collection of functions for conducting meta-analysis using a structural equation modeling (SEM) approach via the ‘OpenMx’ and ‘lavaan’ packages. It also implements various procedures to perform meta-analytic structural equation modeling on the correlation and covariance matrices.
2655 Psychometric Models and Methods MIIVsem Model Implied Instrumental Variable (MIIV) Estimation of Structural Equation Models Functions for estimating structural equation models using instrumental variables.
2656 Psychometric Models and Methods mirt (core) Multidimensional Item Response Theory Analysis of dichotomous and polytomous response data using unidimensional and multidimensional latent trait models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory models can be estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier analyses are available for modeling item testlets. Multiple group analysis and mixed effects designs also are available for detecting differential item and test functioning as well as modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, and several other discrete latent variable models, including mixture and zero-inflated response models, are supported.
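A minimal sketch of a unidimensional 2PL fit with mirt, using the LSAT7 data bundled with the package:

```r
## Minimal sketch: 2PL IRT model on the LSAT7 data bundled with mirt.
library(mirt)
dat <- expand.table(LSAT7)             # LSAT7 is stored as a compact table
mod <- mirt(dat, model = 1, itemtype = '2PL')
coef(mod, simplify = TRUE)             # discrimination/difficulty parameters
plot(mod, type = 'trace')              # item characteristic curves
```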
2657 Psychometric Models and Methods mirtCAT Computerized Adaptive Testing with Multidimensional Item Response Theory Provides tools to generate an HTML interface for creating adaptive and non-adaptive educational and psychological tests using the shiny package (Chalmers (2016) <doi:10.18637/jss.v071.i05>). Suitable for applying unidimensional and multidimensional computerized adaptive tests (CAT) using item response theory methodology and for creating simple questionnaire forms to collect response data directly in R. Additionally, optimal test designs (e.g., “shadow testing”) are supported for tests which contain a large number of item selection constraints. Finally, the package contains tools useful for performing Monte Carlo simulations for studying the behavior of computerized adaptive test banks.
2658 Psychometric Models and Methods mirtjml Joint Maximum Likelihood Estimation for High-Dimensional Item Factor Analysis Provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two parameter logistic (M2PL) model for binary response data. Comparing with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by multiprocessing ‘OpenMP’ API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2017). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. arXiv preprint <arXiv:1712.08966>.
2659 Psychometric Models and Methods missMDA Handling Missing Values with Multivariate Data Analysis Imputation of incomplete continuous or categorical datasets; Missing values are imputed with a principal component analysis (PCA), a multiple correspondence analysis (MCA) model or a multiple factor analysis (MFA) model; Perform multiple imputation with and in PCA or MCA.
2660 Psychometric Models and Methods mixRasch Mixture Rasch Models with JMLE Estimates Rasch models and mixture Rasch models, including the dichotomous Rasch model, the rating scale model, and the partial credit model.
2661 Psychometric Models and Methods MLCIRTwithin Latent Class Item Response Theory (LC-IRT) Models under Within-Item Multidimensionality Framework for the Item Response Theory analysis of dichotomous and ordinal polytomous outcomes under the assumption of within-item multidimensionality and discreteness of the latent traits. The fitting algorithms allow for missing responses and for different item parametrizations and are based on the Expectation-Maximization paradigm. Individual covariates affecting the class weights may be included in the new version, together with the possibility of imposing constraints on all model parameters.
2662 Psychometric Models and Methods MLDS Maximum Likelihood Difference Scaling Difference scaling is a method for scaling perceived supra-threshold differences. The package contains functions that allow the user to design and run a difference scaling experiment, to fit the resulting data by maximum likelihood and test the internal validity of the estimated scale.
2663 Psychometric Models and Methods modelfree Model-free estimation of a psychometric function Local linear estimation of psychometric functions. Provides functions for nonparametric estimation of a psychometric function and for estimation of a derived threshold and slope, and their standard deviations and confidence intervals.
2664 Psychometric Models and Methods mokken (core) Conducts Mokken Scale Analysis Contains functions for performing Mokken scale analysis on test and questionnaire data (e.g., Sijtsma and Van der Ark, 2017, <doi:10.1111/bmsp.12078>). It includes an automated item selection algorithm, and various checks of model assumptions.
2665 Psychometric Models and Methods MplusAutomation An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus Leverages the R language to automate latent variable model estimation and interpretation using ‘Mplus’, a powerful latent variable modeling program developed by Muthen and Muthen (<http://www.statmodel.com>). Specifically, this package provides routines for creating related groups of models, running batches of models, and extracting and tabulating model parameters and fit statistics.
2666 Psychometric Models and Methods mpt Multinomial Processing Tree Models Fitting and testing multinomial processing tree (MPT) models, a class of nonlinear models for categorical data. The parameters are the link probabilities of a tree-like graph and represent the latent cognitive processing steps executed to arrive at observable response categories (Batchelder & Riefer, 1999 <doi:10.3758/bf03210812>; Erdfelder et al., 2009 <doi:10.1027/0044-3409.217.3.108>; Riefer & Batchelder, 1988 <doi:10.1037/0033-295x.95.3.318>).
2667 Psychometric Models and Methods MPTinR Analyze Multinomial Processing Tree Models Provides a user-friendly way for the analysis of multinomial processing tree (MPT) models (e.g., Riefer, D. M., and Batchelder, W. H. [1988]. Multinomial modeling and the measurement of cognitive processes. Psychological Review, 95, 318-339) for single and multiple datasets. The main functions perform model fitting and model selection. Model selection can be done using AIC, BIC, or the Fisher Information Approximation (FIA), a measure based on the Minimum Description Length (MDL) framework. The model and restrictions can be specified in external files or within an R script in an intuitive syntax or using the context-free language for MPTs. The ‘classical’ .EQN file format for model files is also supported. Besides MPTs, this package can fit a wide variety of other cognitive models such as SDT models (see fit.model). It also supports multicore fitting and FIA calculation (using the snowfall package), can generate or bootstrap data for simulations, and plot predicted versus observed data.
2668 Psychometric Models and Methods mRm An R Package for Conditional Maximum Likelihood Estimation in Mixed Rasch Models Conditional maximum likelihood estimation via the EM algorithm and information-criterion-based model selection in binary mixed Rasch models.
2669 Psychometric Models and Methods mudfold Multiple UniDimensional unFOLDing Nonparametric unfolding item response theory (IRT) model for dichotomous data (see W.H. Van Schuur (1984). Structure in Political Beliefs: A New Model for Stochastic Unfolding with Application to European Party Activists, and W.J. Post (1992). Nonparametric Unfolding Models: A Latent Structure Approach). The package implements MUDFOLD (Multiple UniDimensional unFOLDing), an iterative item selection algorithm that constructs unfolding scales from dichotomous preferential-choice data without explicitly assuming a parametric form of the item response functions. Scale diagnostics from Post (1992) and estimates for the person locations proposed by Johnson (2006) and Van Schuur (1984) are also available. This model can be seen as the unfolding variant of Mokken’s (1971) scaling method.
2670 Psychometric Models and Methods MultiLCIRT Multidimensional Latent Class Item Response Theory Models Framework for the Item Response Theory analysis of dichotomous and ordinal polytomous outcomes under the assumption of multidimensionality and discreteness of the latent traits. The fitting algorithms allow for missing responses and for different item parameterizations and are based on the Expectation-Maximization paradigm. Individual covariates affecting the class weights may be included in the new version (since 2.1).
2671 Psychometric Models and Methods multiplex Algebraic Tools for the Analysis of Multiple Social Networks Algebraic procedures for the analysis of multiple social networks are delivered with this package. Among other things, it makes it possible to create and manipulate multivariate network data in different formats, and there are effective ways available to treat multiple networks with routines that combine algebraic systems like the partially ordered semigroup or the semiring structure together with the relational bundles occurring in different types of multivariate network data sets. It also provides an algebraic approach for two-mode networks through Galois derivations between families of the pairs of subsets in the two domains.
2672 Psychometric Models and Methods multiway Component Models for Multi-Way Data Fits multi-way component models via alternating least squares algorithms with optional constraints. Fit models include N-way Canonical Polyadic Decomposition, Individual Differences Scaling, Multiway Covariates Regression, Parallel Factor Analysis (1 and 2), Simultaneous Component Analysis, and Tucker Factor Analysis.
2673 Psychometric Models and Methods munfold Metric Unfolding Multidimensional unfolding using Schoenemann’s algorithm for metric and Procrustes rotation of unfolding results.
2674 Psychometric Models and Methods NetworkToolbox Methods and Measures for Brain, Cognitive, and Psychometric Network Analysis Implements network analysis and graph theory measures used in neuroscience, cognitive science, and psychology. Methods include various filtering methods and approaches such as threshold, dependency (Kenett, Tumminello, Madi, Gur-Gershogoren, Mantegna, & Ben-Jacob, 2010 <doi:10.1371/journal.pone.0015032>), Information Filtering Networks (Barfuss, Massara, Di Matteo, & Aste, 2016 <doi:10.1103/PhysRevE.94.062306>), and Efficiency-Cost Optimization (Fallani, Latora, & Chavez, 2017 <doi:10.1371/journal.pcbi.1005305>). Brain methods include the recently developed Connectome Predictive Modeling (see references in package). Also implements several network measures including local network characteristics (e.g., centrality), community-level network characteristics (e.g., community centrality), global network characteristics (e.g., clustering coefficient), and various other measures associated with the reliability and reproducibility of network analysis.
2675 Psychometric Models and Methods nFactors Parallel Analysis and Non Graphical Solutions to the Cattell Scree Test Indices, heuristics and strategies to help determine the number of factors/components to retain: 1. Acceleration factor (af with or without Parallel Analysis); 2. Optimal Coordinates (noc with or without Parallel Analysis); 3. Parallel analysis (components, factors and bootstrap); 4. lambda > mean(lambda) (Kaiser, CFA and related); 5. Cattell-Nelson-Gorsuch (CNG); 6. Zoski and Jurs multiple regression (b, t and p); 7. Zoski and Jurs standard error of the regression coefficient (sescree); 8. Nelson R2; 9. Bartlett chi-square; 10. Anderson chi-square; 11. Lawley chi-square and 12. Bentler-Yuan chi-square.
2676 Psychometric Models and Methods nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
2677 Psychometric Models and Methods nlsem Fitting Structural Equation Mixture Models Estimation of structural equation models with nonlinear effects and underlying nonnormal distributions.
2678 Psychometric Models and Methods nsprcomp Non-Negative and Sparse PCA Two methods for performing a constrained principal component analysis (PCA), where non-negativity and/or sparsity constraints are enforced on the principal axes (PAs). The function ‘nsprcomp’ computes one principal component (PC) after the other. Each PA is optimized such that the corresponding PC has maximum additional variance not explained by the previous components. In contrast, the function ‘nscumcomp’ jointly computes all PCs such that the cumulative variance is maximal. Both functions have the same interface as the ‘prcomp’ function from the ‘stats’ package (plus some extra parameters), and both return the result of the analysis as an object of class ‘nsprcomp’, which inherits from ‘prcomp’. See <https://sigg-iten.ch/learningbits/2013/05/27/nsprcomp-is-on-cran/> and Sigg et al. (2008) <doi:10.1145/1390156.1390277> for more details.
2679 Psychometric Models and Methods OpenMx (core) Extended Structural Equation Modelling Facilitates treatment of statistical model specifications as things that can be generated and manipulated programmatically. Structural equation models may be specified with reticular action model matrices or paths, linear structural relations matrices or paths, or directly in matrix algebra. Fit functions include full information maximum likelihood, maximum likelihood, and weighted least squares. Example models include confirmatory factor, multiple group, mixture distribution, categorical threshold, modern test theory, differential equations, state space, and many others. MacOS users can download the most up-to-date package binaries from <http://openmx.ssri.psu.edu>. See Neale, Hunter, Pritikin, Zahery, Brick, Kirkpatrick, Estabrook, Bates, Maes, & Boker (2016) <doi:10.1007/s11336-014-9435-8>.
2680 Psychometric Models and Methods optiscale Optimal scaling Tools for performing an optimal scaling transformation on a data vector
2681 Psychometric Models and Methods ordinal Regression Models for Ordinal Data Implementation of cumulative link (mixed) models also known as ordered regression models, proportional odds models, proportional hazards models for grouped survival times and ordered logit/probit/… models. Estimation is via maximum likelihood and mixed models are fitted with the Laplace approximation and adaptive Gauss-Hermite quadrature. Multiple random effect terms are allowed and they may be nested, crossed or partially nested/crossed. Restrictions of symmetry and equidistance can be imposed on the thresholds (cut-points/intercepts). Standard model methods are available (summary, anova, drop-methods, step, confint, predict etc.) in addition to profile methods and slice methods for visualizing the likelihood function and checking convergence.
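As a quick illustration of the cumulative link interface described above, a minimal sketch using the `wine` data bundled with the package; `clm()` fits the fixed-effects model and `clmm()` adds a random effect:

```r
library(ordinal)

# Proportional odds model for an ordered bitterness rating
data(wine)
fm <- clm(rating ~ temp + contact, data = wine)
summary(fm)

# Mixed version: random intercept per judge (Laplace approximation)
fmm <- clmm(rating ~ temp + contact + (1 | judge), data = wine)
summary(fmm)
```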
2682 Psychometric Models and Methods pairwise Rasch Model Parameters by Pairwise Algorithm Performs the explicit calculation (not estimation!) of the Rasch item parameters for dichotomous and polytomous item responses, using a pairwise comparison approach. Person parameters (WLE) are calculated according to Warm’s weighted likelihood approach.
2683 Psychometric Models and Methods paran Horn’s Test of Principal Components/Factors An implementation of Horn’s technique for numerically and graphically evaluating the components or factors retained in a principal components analysis (PCA) or common factor analysis (FA). Horn’s method contrasts eigenvalues produced through a PCA or FA on a number of random data sets of uncorrelated variables with the same number of variables and observations as the experimental or observational data set to produce eigenvalues for components or factors that are adjusted for the sample error-induced inflation. Components with adjusted eigenvalues greater than one are retained. paran may also be used to conduct parallel analysis following Glorfeld’s (1995) suggestions to reduce the likelihood of over-retention.
2684 Psychometric Models and Methods pathmox Pathmox Approach of Segmentation Trees in Partial Least Squares Path Modeling pathmox, a companion package to plspm, handles segmentation variables in PLS Path Modeling by fitting segmentation trees.
2685 Psychometric Models and Methods pcaPP Robust PCA by Projection Pursuit Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) <doi:10.2139/ssrn.968376>, Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>.
2686 Psychometric Models and Methods pcIRT IRT Models for Polytomous and Continuous Item Responses Estimates the multidimensional polytomous Rasch model (Rasch, 1961) and the Continuous Rating Scale model (Mueller, 1987).
2687 Psychometric Models and Methods PCMRS Model Response Styles in Partial Credit Models Implementation of PCMRS (Partial Credit Model with Response Styles) as proposed by Tutz, Schauberger and Berger (2016) <https://epub.ub.uni-muenchen.de/29373/>. PCMRS is an extension of the regular partial credit model. PCMRS allows for an additional person parameter that characterizes the response style of the person. By taking the response style into account, the estimates of the item parameters are less biased than in partial credit models.
2688 Psychometric Models and Methods piecewiseSEM Piecewise Structural Equation Modeling Implements piecewise structural equation modeling from a single list of structural equations, with new methods for non-linear, latent, and composite variables, standardized coefficients, query-based prediction and indirect effects. See <http://jslefche.github.io/piecewiseSEM/> for more.
2689 Psychometric Models and Methods pks Probabilistic Knowledge Structures Fitting and testing probabilistic knowledge structures, especially the basic local independence model (BLIM, Doignon & Falmagne, 1999), using the minimum discrepancy maximum likelihood (MDML) method.
2690 Psychometric Models and Methods PLmixed Estimate (Generalized) Linear Mixed Models with Factor Structures Utilizes the ‘lme4’ package and the optim() function from ‘stats’ to estimate (generalized) linear mixed models (GLMM) with factor structures using a profile likelihood approach, as outlined in Jeon and Rabe-Hesketh (2012) <doi:10.3102/1076998611417628>. Factor analysis and item response models can be extended to allow for an arbitrary number of nested and crossed random effects, making it useful for multilevel and cross-classified models.
2691 Psychometric Models and Methods plotSEMM Graphing Nonlinear Relations Among Latent Variables from Structural Equation Mixture Models Contains a graphical user interface to generate the diagnostic plots proposed by Bauer (2005; <doi:10.1207/s15328007sem1204_1>), Pek & Chalmers (2015; <doi:10.1080/10705511.2014.937790>), and Pek, Chalmers, Kok, & Losardo (2015; <doi:10.3102/1076998615589129>) to investigate nonlinear bivariate relationships in latent regression models using structural equation mixture models (SEMMs).
2692 Psychometric Models and Methods plRasch Log Linear by Linear Association models and Rasch family models by pseudolikelihood estimation Fit Log Linear by Linear Association models and Rasch family models by pseudolikelihood estimation
2693 Psychometric Models and Methods pls Partial Least Squares and Principal Component Regression Multivariate regression methods: Partial Least Squares Regression (PLSR), Principal Component Regression (PCR) and Canonical Powered Partial Least Squares (CPPLS).
2694 Psychometric Models and Methods plspm Tools for Partial Least Squares Path Modeling (PLS-PM) Partial Least Squares Path Modeling (PLS-PM) analysis for both metric and non-metric data, as well as REBUS analysis.
2695 Psychometric Models and Methods poLCA Polytomous variable Latent Class Analysis Latent class analysis and latent class regression models for polytomous outcome variables. Also known as latent structure analysis.
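A minimal sketch of the poLCA formula interface, using the `values` survey data bundled with the package; the two-class choice is purely illustrative:

```r
library(poLCA)

# Four dichotomous items, no covariates: plain latent class analysis
data(values)
f <- cbind(A, B, C, D) ~ 1

# Two-class model; nrep restarts the EM algorithm to avoid local maxima
M2 <- poLCA(f, values, nclass = 2, nrep = 5)
```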
2696 Psychometric Models and Methods polycor Polychoric and Polyserial Correlations Computes polychoric and polyserial correlations by quick “two-step” methods or ML, optionally with standard errors; tetrachoric and biserial correlations are special cases.
2697 Psychometric Models and Methods PP Person Parameter Estimation The PP package includes estimation of person parameters (MLE, WLE, MAP, EAP, robust) for the 1-, 2-, 3-, and 4-PL models and the GPCM (generalized partial credit model). The parameters are estimated under the assumption that the item parameters are known and fixed. The package is useful e.g. when items from an item pool / item bank with known item parameters are administered to a new population of test-takers and an ability estimate is needed for every test-taker.
2698 Psychometric Models and Methods prefmod (core) Utilities to Fit Paired Comparison Models for Preferences Generates design matrix for analysing real paired comparisons and derived paired comparison data (Likert type items/ratings or rankings) using a loglinear approach. Fits loglinear Bradley-Terry model (LLBT) exploiting an eliminate feature. Computes pattern models for paired comparisons, rankings, and ratings. Some treatment of missing values (MCAR and MNAR). Fits latent class (mixture) models for paired comparison, rating and ranking patterns using a non-parametric ML approach.
2699 Psychometric Models and Methods profileR Profile Analysis of Multivariate Data in R A suite of multivariate methods and data visualization tools to implement profile analysis and cross-validation techniques described in Davison & Davenport (2002) <doi:10.1037/1082-989X.7.4.468>, Bulut (2013), and other published and unpublished resources. The package includes routines to perform criterion-related profile analysis, profile analysis via multidimensional scaling, moderated profile analysis, profile analysis by group, and a within-person factor model to derive score profiles.
2700 Psychometric Models and Methods pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
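For the count-data side of the package, a minimal sketch of `zeroinfl()` and `hurdle()` on the bundled `bioChemists` data; the covariate choice is illustrative:

```r
library(pscl)

data("bioChemists", package = "pscl")

# Zero-inflated negative binomial: the part after '|' models excess zeros
fm_zinb <- zeroinfl(art ~ fem + mar + kid5 | ment,
                    data = bioChemists, dist = "negbin")
summary(fm_zinb)

# Hurdle variant of the same specification
fm_hurdle <- hurdle(art ~ fem + mar + kid5 | ment, data = bioChemists)
```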
2701 Psychometric Models and Methods psy (core) Various procedures used in psychometry Kappa, ICC, Cronbach alpha, screeplot, mtmm
2702 Psychometric Models and Methods psych (core) Procedures for Psychological, Psychometric, and Personality Research A general purpose toolbox for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Functions for simulating and testing particular item and test structures are included. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r> web page.
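A sketch of a typical scale-construction workflow with the bundled `bfi` personality data; the five-factor choice is illustrative:

```r
library(psych)

data(bfi)
items <- bfi[, 1:25]          # the 25 personality items

describe(items)               # descriptive statistics
fa.parallel(items)            # parallel analysis: how many factors?
f5 <- fa(items, nfactors = 5, rotate = "oblimin")   # factor analysis
alpha(bfi[, 1:5], check.keys = TRUE)  # reliability of the first scale
```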
2703 Psychometric Models and Methods psychometric Applied Psychometric Theory Contains functions useful for correlation theory, meta-analysis (validity-generalization), reliability, item analysis, inter-rater reliability, and classical utility
2704 Psychometric Models and Methods psychomix Psychometric Mixture Models Psychometric mixture models based on ‘flexmix’ infrastructure. At the moment Rasch mixture models with different parameterizations of the score distribution (saturated vs. mean/variance specification), Bradley-Terry mixture models, and MPT mixture models are implemented. These mixture models can be estimated with or without concomitant variables. See vignette(‘raschmix’, package = ‘psychomix’) for details on the Rasch mixture models.
2705 Psychometric Models and Methods psychotools Psychometric Modeling Infrastructure Infrastructure for psychometric modeling such as data classes (for item response data and paired comparisons), basic model fitting functions (for Bradley-Terry, Rasch, parametric logistic IRT, generalized partial credit, rating scale, multinomial processing tree models), extractor functions for different types of parameters (item, person, threshold, discrimination, guessing, upper asymptotes), unified inference and visualizations, and various datasets for illustration. Intended as a common lightweight and efficient toolbox for psychometric modeling and a common building block for fitting psychometric mixture models in package “psychomix” and trees based on psychometric models in package “psychotree”.
2706 Psychometric Models and Methods psychotree (core) Recursive Partitioning Based on Psychometric Models Recursive partitioning based on psychometric models, employing the general MOB algorithm (from package partykit) to obtain Bradley-Terry trees, Rasch trees, rating scale and partial credit trees, and MPT trees.
2707 Psychometric Models and Methods psyphy Functions for analyzing psychophysical data in R An assortment of functions that could be useful in analyzing data from psychophysical experiments. It includes functions for calculating d’ from several different experimental designs, links for m-alternative forced-choice (mafc) data to be used with the binomial family in glm (and possibly other contexts) and selfStart functions for estimating gamma values for CRT screen calibrations.
2708 Psychometric Models and Methods PTAk Principal Tensor Analysis on k Modes A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package also includes some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions.
2709 Psychometric Models and Methods pwrRasch Statistical Power Simulation for Testing the Rasch Model Statistical power simulation for testing the Rasch Model based on a three-way analysis of variance design with mixed classification.
2710 Psychometric Models and Methods qcv Quantifying Construct Validity Primarily, the ‘qcv’ package computes key indices related to the Quantifying Construct Validity procedure (QCV; Westen & Rosenthal, 2003 <doi:10.1037/0022-3514.84.3.608>; see also Furr & Heuckeroth, in press). The qcv() function is the heart of the ‘qcv’ package, but additional functions in the package provide useful ancillary information related to the QCV procedure.
2711 Psychometric Models and Methods qgraph Graph Plotting Methods, Psychometric Data Visualization and Graphical Model Estimation Weighted network visualization and analysis, as well as Gaussian graphical model computation. See Epskamp et al. (2012) <doi:10.18637/jss.v048.i04>.
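A minimal sketch: any correlation matrix can be drawn as a weighted network, here using `mtcars` purely for illustration:

```r
library(qgraph)

# Edges are correlations; by default positive ones are green, negative red
cormat <- cor(mtcars)
qgraph(cormat, layout = "spring", labels = colnames(mtcars))
```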
2712 Psychometric Models and Methods QuantPsyc Quantitative Psychology Tools Contains functions useful for data screening, testing moderation, mediation and estimating power.
2713 Psychometric Models and Methods quickpsy Fits Psychometric Functions for Multiple Groups Quickly fits and plots psychometric functions (normal, logistic, Weibull or any function defined by the user) for multiple groups.
2714 Psychometric Models and Methods randomLCA Random Effects Latent Class Analysis Fits standard and random effects latent class models. The single level random effects model is described in Qu et al <doi:10.2307/2533043> and the two level random effects model in Beath and Heller <doi:10.1177/1471082X0800900302>. Examples are given for their use in diagnostic testing.
2715 Psychometric Models and Methods RaschSampler Rasch Sampler MCMC based sampling of binary matrices with fixed margins as used in exact Rasch model tests.
2716 Psychometric Models and Methods regsem Regularized Structural Equation Modeling Uses both ridge and lasso penalties (and extensions) to penalize specific parameters in structural equation models. The package offers additional cost functions, cross validation, and other extensions beyond traditional structural equation models. Also contains a function to perform exploratory mediation (XMed).
2717 Psychometric Models and Methods REQS R/EQS Interface This package contains the function run.eqs() which calls an EQS script file, executes the EQS estimation, and, finally, imports the results as R objects. These two steps can be performed separately: call.eqs() calls and executes EQS, whereas read.eqs() imports existing EQS outputs as objects into R. It requires EQS 6.2 (build 98 or higher).
2718 Psychometric Models and Methods rpf Response Probability Functions The purpose of this package is to factor out logic and math common to Item Factor Analysis fitting, diagnostics, and analysis. It is envisioned as core support code suitable for more specialized IRT packages to build upon. Complete access to optimized C functions is made available with R_RegisterCCallable().
2719 Psychometric Models and Methods rrum Bayesian Estimation of the Reduced Reparameterized Unified Model with Gibbs Sampling Implementation of Gibbs sampling algorithm for Bayesian Estimation of the Reduced Reparameterized Unified Model (‘rrum’), described by Culpepper and Hudson (2017) <doi:10.1177/0146621617707511>.
2720 Psychometric Models and Methods rsem Robust Structural Equation Modeling with Missing Data and Auxiliary Variables A robust procedure is implemented to estimate means and covariance matrix of multiple variables with missing data using Huber weight and then to estimate a structural equation model.
2721 Psychometric Models and Methods sem (core) Structural Equation Models Functions for fitting general linear structural equation models (with observed and latent variables) using the RAM approach, and for fitting structural equations in observed-variable models by two-stage least squares.
2722 Psychometric Models and Methods semdiag Structural equation modeling diagnostics Outlier and leverage diagnostics for SEM.
2723 Psychometric Models and Methods semds Structural Equation Multidimensional Scaling Fits a structural equation multidimensional scaling (SEMDS) model for asymmetric and three-way input dissimilarities. It assumes that the dissimilarities are measured with errors. The latent dissimilarities are estimated as factor scores within an SEM framework while the objects are represented in a low-dimensional space as in MDS.
2724 Psychometric Models and Methods SEMID Identifiability of Linear Structural Equation Models Provides routines to check identifiability or non-identifiability of linear structural equation models as described in Drton, Foygel, and Sullivant (2011) <doi:10.1214/10-AOS859>, Foygel, Draisma, and Drton (2012) <doi:10.1214/12-AOS1012>, and other works. The routines are based on the graphical representation of structural equation models by a path diagram/mixed graph.
2725 Psychometric Models and Methods SEMModComp Model Comparisons for SEM Conduct tests of difference in fit for mean and covariance structure models as in structural equation modeling (SEM)
2726 Psychometric Models and Methods semPlot Path Diagrams and Visual Analysis of Various SEM Packages’ Output Path diagrams and visual analysis of various SEM packages’ output.
2727 Psychometric Models and Methods semPLS Structural Equation Modeling Using Partial Least Squares Fits structural equation models using partial least squares (PLS). The PLS approach is referred to as a ‘soft-modeling’ technique, requiring no distributional assumptions on the observed data.
2728 Psychometric Models and Methods semTools Useful Tools for Structural Equation Modeling Provides useful tools for structural equation modeling.
2729 Psychometric Models and Methods semtree Recursive Partitioning for Structural Equation Models SEM Trees and SEM Forests are an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees: they are ensembles of SEM trees, each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees.
2730 Psychometric Models and Methods SensoMineR Sensory Data Analysis Statistical Methods to Analyse Sensory Data. SensoMineR: A package for sensory data analysis. S. Le and F. Husson (2008) <doi:10.1111/j.1745-459X.2007.00137.x>.
2731 Psychometric Models and Methods ShinyItemAnalysis Test and Item Analysis via Shiny Interactive shiny application for analysis of educational tests and their items.
2732 Psychometric Models and Methods Sim.DiffProc Simulation of Diffusion Processes Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both Ito and Stratonovich forms, including statistical analysis of SDEs with parallel Monte Carlo and moment-equation methods. These tools have enabled researchers in many domains to model practical problems in financial and actuarial modelling and other areas of application, e.g., modelling and simulating the first-passage-time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
2733 Psychometric Models and Methods simcdm Simulate Cognitive Diagnostic Model (‘CDM’) Data Provides efficient R and ‘C++’ routines to simulate cognitive diagnostic model data for Deterministic Input, Noisy “And” Gate (‘DINA’) and reduced Reparameterized Unified Model (‘rRUM’) from Culpepper and Hudson (2017) <doi:10.1177/0146621617707511>, Culpepper (2015) <doi:10.3102/1076998615595403>, and de la Torre (2009) <doi:10.3102/1076998607309474>.
2734 Psychometric Models and Methods simsem SIMulated Structural Equation Modeling Provides an easy framework for Monte Carlo simulation in structural equation modeling, which can be used for various purposes, such as model fit evaluation, power analysis, or missing data handling and planning.
2735 Psychometric Models and Methods sirt Supplementary Item Response Theory Models Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).
2736 Psychometric Models and Methods smacof (core) Multidimensional Scaling Implements the following approaches for multidimensional scaling (MDS) based on stress minimization using majorization (smacof): ratio/interval/ordinal/spline MDS on symmetric dissimilarity matrices, MDS with external constraints on the configuration, individual differences scaling (idioscal, indscal), MDS with spherical restrictions, and ratio/interval/ordinal/spline unfolding (circular restrictions, row-conditional). Various tools and extensions like jackknife MDS, bootstrap MDS, permutation tests, MDS biplots, gravity models, inverse MDS, unidimensional scaling, drift vectors (asymmetric MDS), classical scaling, and Procrustes are implemented as well.
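A minimal sketch of an ordinal MDS fit on the base-R `eurodist` distances; in older versions of the package the same model is fitted with `smacofSym()`:

```r
library(smacof)

# Two-dimensional ordinal (nonmetric) MDS via stress majorization
fit <- mds(eurodist, ndim = 2, type = "ordinal")
fit$stress   # stress-1 value of the solution
plot(fit)    # configuration plot
```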
2737 Psychometric Models and Methods smds Symbolic Multidimensional Scaling Symbolic multidimensional scaling for interval-valued dissimilarities. The hypersphere model and the hyperbox model are available.
2738 Psychometric Models and Methods soc.ca Specific Correspondence Analysis for the Social Sciences Specific and class-specific multiple correspondence analysis on survey-like data. Soc.ca is optimized for the needs of the social scientist and presents easily interpretable results in near publication-ready quality.
2739 Psychometric Models and Methods SparseFactorAnalysis Scaling Count and Binary Data with Sparse Factor Analysis Multidimensional scaling provides a means of uncovering a latent structure underlying observed data, while estimating the number of latent dimensions. This package presents a means for scaling binary and count data, for example the votes and word counts for legislators. Future work will include an EM implementation and extend this work to ordinal and continuous data.
2740 Psychometric Models and Methods sparseSEM Sparse-aware Maximum Likelihood for Structural Equation Models Sparse-aware maximum likelihood for structural equation models in inferring gene regulatory networks
2741 Psychometric Models and Methods STARTS Functions for the STARTS Model Contains functions for estimating the STARTS model of Kenny and Zautra (1995, 2001) <doi:10.1037/0022-006X.63.1.52>, <doi:10.1037/10409-008>. Penalized maximum likelihood estimation and Markov Chain Monte Carlo estimation are also provided, see Luedtke, Robitzsch and Wagner (2018) <doi:10.1037/met0000155>.
2742 Psychometric Models and Methods subscore Computing Subscores in Classical Test Theory and Item Response Theory Functions for computing subscores for a test using different methods in both classical test theory (CTT) and item response theory (IRT). This package enables three sets of subscoring methods within the framework of CTT and IRT: Wainer’s augmentation method, Haberman’s three subscoring methods, and Yen’s objective performance index (OPI). The package also includes the function to compute Proportional Reduction of Mean Squared Errors (PRMSEs) in Haberman’s methods which are used to examine whether test subscores are of added value.
2743 Psychometric Models and Methods superMDS Implements the supervised multidimensional scaling (superMDS) proposal of Witten and Tibshirani (2011) Witten and Tibshirani (2011) Supervised multidimensional scaling for visualization, classification, and bipartite ranking. Computational Statistics and Data Analysis 55(1): 789-801.
2744 Psychometric Models and Methods systemfit Estimating Systems of Simultaneous Equations Econometric estimation of simultaneous systems of linear and nonlinear equations using Ordinary Least Squares (OLS), Weighted Least Squares (WLS), Seemingly Unrelated Regressions (SUR), Two-Stage Least Squares (2SLS), Weighted Two-Stage Least Squares (W2SLS), and Three-Stage Least Squares (3SLS).
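A minimal sketch of a two-equation SUR fit, following the `Kmenta` example data shipped with the package:

```r
library(systemfit)

data("Kmenta")
eqDemand <- consump ~ price + income
eqSupply <- consump ~ price + farmPrice + trend

# Seemingly unrelated regressions on the two-equation system
fitsur <- systemfit(list(demand = eqDemand, supply = eqSupply),
                    method = "SUR", data = Kmenta)
summary(fitsur)
```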
2745 Psychometric Models and Methods TAM (core) Test Analysis Modules Includes marginal maximum likelihood estimation and joint maximum likelihood estimation for unidimensional and multidimensional item response models. The package functionality covers the Rasch model, 2PL model, 3PL model, generalized partial credit model, multi-faceted Rasch model, nominal item response model, structured latent class model, mixture distribution IRT models, and located latent class models. Latent regression models and plausible value imputation are also supported. For details see Adams, Wilson and Wang, 1997 <doi:10.1177/0146621697211001>, Adams, Wilson and Wu, 1997 <doi:10.3102/10769986022001047>, Formann, 1982 <doi:10.1002/bimj.4710240209>, Formann, 1992 <doi:10.1080/01621459.1992.10475229>.
2746 Psychometric Models and Methods TestDataImputation Missing Item Responses Imputation for Test and Assessment Data Functions for imputing missing item responses for dichotomous and polytomous test and assessment data. This package enables missing imputation methods that are suitable for test and assessment data, including: listwise (LW) deletion, treating as incorrect (IN), person mean imputation (PM), item mean imputation (IM), two-way imputation (TW), logistic regression imputation (LR), and EM imputation.
2747 Psychometric Models and Methods TestScorer GUI for Entering Test Items and Obtaining Raw and Transformed Scores GUI for entering test items and obtaining raw and transformed scores. The results are shown on the console and can be saved to a tabular text file for further statistical analysis. The user can define his own tests and scoring procedures through a GUI.
2748 Psychometric Models and Methods ThreeWay Three-Way Component Analysis Component analysis for three-way data arrays by means of Candecomp/Parafac, Tucker3, Tucker2 and Tucker1 models.
2749 Psychometric Models and Methods TreeBUGS Hierarchical Multinomial Processing Tree Modeling User-friendly analysis of hierarchical multinomial processing tree (MPT) models that are often used in cognitive psychology. Implements the latent-trait MPT approach (Klauer, 2010) <doi:10.1007/s11336-009-9141-0> and the beta-MPT approach (Smith & Batchelder, 2010) <doi:10.1016/j.jmp.2009.06.007> to model heterogeneity of participants. MPT models are conveniently specified by an .eqn-file as used by other MPT software and data are provided by a .csv-file or directly in R. Models are either fitted by calling JAGS or by an MPT-tailored Gibbs sampler in C++ (only for nonhierarchical and beta MPT models). Provides tests of heterogeneity and MPT-tailored summaries and plotting functions. A detailed documentation is available in Heck, Arnold, & Arnold (2018) <doi:10.3758/s13428-017-0869-7>.
2750 Psychometric Models and Methods TripleR Social Relation Model (SRM) Analyses for Single or Multiple Groups Social Relation Model (SRM) analyses for single or multiple round-robin groups are performed. These analyses are either based on one manifest variable, one latent construct measured by two manifest variables, two manifest variables and their bivariate relations, or two latent constructs each measured by two manifest variables. Within-group t-tests for variance components and covariances are provided for single groups. For multiple groups two types of significance tests are provided: between-groups t-tests (as in SOREMO) and enhanced standard errors based on Lashley and Bond (1997) <doi:10.1037/1082-989X.2.3.278>. Handling for missing values is provided.
2751 Psychometric Models and Methods vegan (core) Community Ecology Package Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
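A minimal sketch using the bundled `dune` vegetation data: a diversity index per site, then a nonmetric ordination:

```r
library(vegan)

data(dune)
diversity(dune, index = "shannon")   # Shannon diversity for each site

# Nonmetric multidimensional scaling of the community matrix
ord <- metaMDS(dune, k = 2, trace = FALSE)
plot(ord, type = "t")
```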
2752 Psychometric Models and Methods VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
2753 Psychometric Models and Methods wCorr Weighted Correlations Calculates Pearson, Spearman, polychoric, and polyserial correlation coefficients, in weighted or unweighted form. The package implements tetrachoric correlation as a special case of the polychoric and biserial correlation as a specific case of the polyserial.
2754 Psychometric Models and Methods WrightMap IRT Item-Person Map with ‘ConQuest’ Integration A powerful yet simple graphical tool available in the field of psychometrics is the Wright Map (also known as item maps or item-person maps), which presents the location of both respondents and items on the same scale. Wright Maps are commonly used to present the results of dichotomous or polytomous item response models. The ‘WrightMap’ package provides functions to create these plots from item parameters and person estimates stored as R objects. Although the package can be used in conjunction with any software used to estimate the IRT model (e.g. ‘TAM’, ‘mirt’, ‘eRm’ or ‘IRToys’ in ‘R’, or ‘Stata’, ‘Mplus’, etc.), ‘WrightMap’ features special integration with ‘ConQuest’ to facilitate reading and plotting its output directly. The ‘wrightMap’ function creates Wright Maps based on person estimates and item parameters produced by an item response analysis. The ‘CQmodel’ function reads output files created using ‘ConQuest’ software and creates a set of data frames for easy data manipulation, bundled in a ‘CQmodel’ object. The ‘wrightMap’ function can take a ‘CQmodel’ object as input or it can be used to create Wright Maps directly from data frames of person and item parameters.
2755 Psychometric Models and Methods xgobi Interface to the XGobi and XGvis programs for graphical data analysis Interface to the XGobi and XGvis programs for graphical data analysis.
2756 Psychometric Models and Methods xxIRT Item Response Theory and Computer-Based Testing in R A suite of psychometric analysis tools for research and operation, including: (1) computation of probability, information, and likelihood for the 3PL, GPCM, and GRM; (2) parameter estimation using joint or marginal likelihood estimation method; (3) simulation of computerized adaptive testing using built-in or customized algorithms; (4) assembly and simulation of multistage testing. The full documentation and tutorials are at <https://github.com/xluo11/xxIRT>.
2757 Reproducible Research animation A Gallery of Animations in Statistics and Utilities to Create Animations Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, non-parametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, ‘GIF’, HTML pages, ‘PDF’ and videos. ‘PDF’ animations can be inserted into ‘Sweave’ / ‘knitr’ easily.
2758 Reproducible Research apsrtable apsrtable model-output formatter for social science Formats LaTeX tables from one or more model objects side-by-side with standard errors below, not unlike tables found in such journals as the American Political Science Review.
2759 Reproducible Research archivist Tools for Storing, Restoring and Searching for R Objects Data exploration and modelling is a process in which many data artifacts are produced: subsets, data aggregates, plots, statistical models, different versions of data sets and different versions of results. The more projects we work on, the more artifacts are produced and the harder it is to manage them. Archivist helps to store and manage artifacts created in R. It allows you to store selected artifacts as binary files together with their metadata and relations, to share artifacts with others through a shared folder or GitHub, and to search for already-created artifacts by class, name, date of creation or other properties. It also makes it easy to restore such artifacts, and can check whether a new artifact is an exact copy of one produced some time ago, which can be useful for testing or caching.
2760 Reproducible Research bibtex Bibtex Parser Utility to parse a bibtex file.
2761 Reproducible Research brew Templating Framework for Report Generation brew implements a templating framework for mixing text and R code for report generation. brew template syntax is similar to PHP, Ruby’s erb module, Java Server Pages, and Python’s psp module.
2762 Reproducible Research checkpoint Install Packages from Snapshots on the Checkpoint Server for Reproducibility The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint allows you to install packages as they existed on CRAN on a specific snapshot date as if you had a CRAN time machine. To achieve reproducibility, the checkpoint() function installs the packages required or called by your project and scripts to a local library exactly as they existed at the specified point in time. Only those packages are available to your project, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint’s checkpoint() can ensure the reproducibility of your scripts or projects at any time. To create the snapshot archives, once a day (at midnight UTC) Microsoft refreshes the Austria CRAN mirror on the “Microsoft R Archived Network” server (<https://mran.microsoft.com/>). Immediately after completion of the rsync mirror process, the process takes a snapshot, thus creating the archive. Snapshot archives exist starting from 2014-09-17.
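A minimal sketch; the snapshot date is arbitrary (any date from 2014-09-17 onward), and subsequent installs and loads resolve against that snapshot:

```r
library(checkpoint)

# Scan the project for library()/require() calls and install those
# packages exactly as they existed on CRAN on the snapshot date
checkpoint("2018-01-01")

# Subsequent library() calls now load from the snapshot library
library(ggplot2)
```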
2763 Reproducible Research DT A Wrapper of the JavaScript Library ‘DataTables’ Data objects in R can be rendered as HTML tables using the JavaScript library ‘DataTables’ (typically via R Markdown or Shiny). The ‘DataTables’ library has been included in this R package. The package name ‘DT’ is an abbreviation of ‘DataTables’.
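A minimal sketch; in an interactive session this opens a sortable, searchable HTML table:

```r
library(DT)

# Per-column filters on top, five rows per page
datatable(iris, filter = "top", options = list(pageLength = 5))
```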
2764 Reproducible Research exams Automatic Generation of Exams in R Automatic generation of exams based on exercises in Markdown or LaTeX format, possibly including R code for dynamic generation of exercise elements. Exercise types include single-choice and multiple-choice questions, arithmetic problems, string questions, and combinations thereof (cloze). Output formats include standalone files (PDF, HTML, Docx, ODT, …), Moodle XML, QTI 1.2 (for OLAT/OpenOLAT), QTI 2.1, Blackboard, ARSnova, and TCExam. In addition to fully customizable PDF exams, a standardized PDF format (NOPS) is provided that can be printed, scanned, and automatically evaluated.
2765 Reproducible Research formatR Format R Code Automatically Provides a function tidy_source() to format R source code. Spaces and indent will be added to the code automatically, and comments will be preserved under certain conditions, so that R code will be more human-readable and tidy. There is also a Shiny app as a user interface in this package (see tidy_app()).
2766 Reproducible Research formattable Create ‘Formattable’ Data Structures Provides functions to create formattable vectors and data frames. ‘Formattable’ vectors are printed with text formatting, and formattable data frames are printed with multiple types of formatting in HTML to improve the readability of data presented in tabular form rendered in web pages.
2767 Reproducible Research highlight Syntax Highlighter Syntax highlighter for R code based on the results of the R parser. Rendering in HTML and LaTeX markup. Custom Sweave driver performing syntax highlighting of R code chunks.
2768 Reproducible Research highr Syntax Highlighting for R Source Code Provides syntax highlighting for R source code. Currently it supports LaTeX and HTML output. Source code of other languages is supported via Andre Simon’s highlight package (<http://www.andre-simon.de>).
2769 Reproducible Research Hmisc (core) Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
2770 Reproducible Research htmlTable Advanced Tables for Markdown/HTML Tables with state-of-the-art layout elements such as row spanners, column spanners, table spanners, zebra striping, and more. While allowing advanced layout, the underlying CSS structure is kept simple in order to maximize compatibility with word processors such as ‘MS Word’ or ‘LibreOffice’. The package also contains a few text formatting functions that help produce text compatible with HTML/LaTeX.
2771 Reproducible Research htmltools Tools for HTML Tools for HTML generation and output.
2772 Reproducible Research HTMLUtils Facilitates Automated HTML Report Creation Facilitates automated HTML report creation, in particular framed HTML pages and dynamically sortable tables.
2773 Reproducible Research humanFormat Human-friendly formatting functions Format quantities of time or bytes into human-friendly strings.
2774 Reproducible Research hwriter HTML Writer - Outputs R objects in HTML format Easy-to-use and versatile functions to output R objects in HTML format
2775 Reproducible Research kfigr Integrated Code Chunk Anchoring and Referencing for R Markdown Documents A streamlined cross-referencing system for R Markdown documents generated with ‘knitr’. R Markdown is an authoring format for generating dynamic content from R. ‘kfigr’ provides a hook for anchoring code chunks and a function to cross-reference document elements generated from said chunks, e.g. figures and tables.
2776 Reproducible Research knitcitations Citations for ‘Knitr’ Markdown Files Provides the ability to create dynamic citations in which the bibliographic information is pulled from the web rather than having to be entered into a local database such as ‘bibtex’ ahead of time. The package is primarily aimed at authoring in the R ‘markdown’ format, and can provide outputs for web-based authoring such as linked text for inline citations. Cite using a ‘DOI’, URL, or ‘bibtex’ file key. See the package URL for details.
2777 Reproducible Research knitLatex ‘Knitr’ Helpers - Mostly Tables Provides several helper functions for working with ‘knitr’ and ‘LaTeX’. It includes ‘xTab’ for creating traditional ‘LaTeX’ tables, ‘lTab’ for generating ‘longtable’ environments, and ‘sTab’ for generating a ‘supertabular’ environment. Additionally, this package contains a knitr_setup() function which fixes a well-known bug in ‘knitr’, which distorts the ‘results=“asis”’ command when used in conjunction with user-defined commands; and a com command (<<com=TRUE>>=) which renders the output from ‘knitr’ as a ‘LaTeX’ command.
2778 Reproducible Research knitr (core) A General-Purpose Package for Dynamic Report Generation in R Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
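A minimal sketch of the two core operations; `report.Rmd` is a hypothetical input file:

```r
library(knitr)

# Execute the code chunks and weave the results into Markdown
knit("report.Rmd")   # writes report.md

# Tangle: extract only the R code from the same document
purl("report.Rmd")   # writes report.R
```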
2779 Reproducible Research latex2exp Use LaTeX Expressions in Plots Parses and converts LaTeX math formulas to R’s plotmath expressions, used to enter mathematical formulas and symbols to be rendered as text, axis labels, etc. throughout R’s plotting system.
2780 Reproducible Research lazyWeave LaTeX Wrappers for R Users Provides the functionality to write LaTeX code from within R without having to learn LaTeX. Functionality also exists to create HTML and Markdown code. While the functionality still exists to write complete documents with lazyWeave, it is generally easier to do so with markdown and knitr. lazyWeave’s main strength now is the ability to design custom and complex tables for reporting results.
2781 Reproducible Research lubridate Make Dealing with Dates a Little Easier Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The ‘lubridate’ package has a consistent and memorable syntax that makes working with dates easy and fun. Parts of the ‘CCTZ’ source code, released under the Apache 2.0 License, are included in this package. See <https://github.com/google/cctz> for more details.
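A minimal sketch of parsing, component access, and date arithmetic:

```r
library(lubridate)

d <- ymd("2010-08-03")            # parse an ISO-style date
month(d)                          # extract a component: 8
wday(d, label = TRUE)             # day of week as a labelled factor
d + months(2) + days(5)           # algebraic manipulation

span <- interval(ymd("2010-01-01"), d)
as.period(span)                   # the span expressed as a period
```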
2782 Reproducible Research markdown ‘Markdown’ Rendering for R Provides R bindings to the ‘Sundown’ ‘Markdown’ rendering library (<https://github.com/vmg/sundown>). ‘Markdown’ is a plain-text formatting syntax that can be converted to ‘XHTML’ or other formats. See <http://en.wikipedia.org/wiki/Markdown> for more information about ‘Markdown’.
2783 Reproducible Research memisc Management of Survey Data and Presentation of Analysis Results An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) ‘SPSS’ and ‘Stata’ files is provided. Further, the package allows the user to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to ‘LaTeX’ and HTML.
2784 Reproducible Research miniCRAN Create a Mini Version of CRAN Containing Only Selected Packages Makes it possible to create an internally consistent repository consisting of selected packages from CRAN-like repositories. The user specifies a set of desired packages, and ‘miniCRAN’ recursively reads the dependency tree for these packages, then downloads only this subset. The user can then install packages from this repository directly, rather than from CRAN. This is useful in production settings, e.g. server behind a firewall, or remote locations with slow (or zero) Internet access.
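A minimal sketch, assuming a writable target directory `local-cran` (hypothetical name):

```r
library(miniCRAN)

# Resolve the full dependency tree of the packages to be mirrored
pkgs <- pkgDep(c("ggplot2", "data.table"), suggests = FALSE)

# Build a CRAN-like repository containing only that subset
dir.create("local-cran")
makeRepo(pkgs, path = "local-cran", type = "source")

# Clients can then install from the local repository
install.packages("ggplot2",
                 repos = paste0("file:///", normalizePath("local-cran")))
```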
2785 Reproducible Research NMOF Numerical Methods and Optimization in Finance Functions, examples and data from the book “Numerical Methods and Optimization in Finance” by M. Gilli, D. Maringer and E. Schumann (2011), ISBN 978-0123756626. The package provides implementations of several optimisation heuristics, such as Differential Evolution, Genetic Algorithms and Threshold Accepting. There are also functions for the valuation of financial instruments, such as bonds and options, and functions that help with stochastic simulations.
2786 Reproducible Research packrat A Dependency Management System for Projects and their R Package Dependencies Manage the R packages your project depends on in an isolated, portable, and reproducible way.
2787 Reproducible Research pander An R ‘Pandoc’ Writer Contains functions that catch all messages, ‘stdout’ and other useful information while evaluating R code, along with helpers that return user-specified text elements (like: header, paragraph, table, image, lists etc.) in ‘pandoc’ markdown, or automatically transform several types of R objects to markdown format in the same way. Also capable of exporting/converting the resulting complex ‘pandoc’ documents to e.g. HTML, ‘PDF’, ‘docx’ or ‘odt’. This latter reporting feature is supported in brew syntax or with a custom reference class with a smart caching ‘backend’.
2788 Reproducible Research papeR A Toolbox for Writing Pretty Papers and Reports A toolbox for writing ‘knitr’, ‘Sweave’ or other ‘LaTeX’- or ‘markdown’-based reports and to prettify the output of various estimated models.
2789 Reproducible Research prettyunits Pretty, Human Readable Formatting of Quantities Pretty, human readable formatting of quantities. Time intervals: 1337000 -> 15d 11h 23m 20s. Vague time intervals: 2674000 -> about a month ago. Bytes: 1337 -> 1.34 kB.
2790 Reproducible Research quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
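A minimal sketch fitting three conditional quantiles of the bundled Engel food-expenditure data:

```r
library(quantreg)

data(engel)

# One fit per requested quantile of the conditional distribution
fit <- rq(foodexp ~ income, tau = c(0.25, 0.50, 0.75), data = engel)
summary(fit)
```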
2791 Reproducible Research R.cache Fast and Light-Weight Caching (Memoization) of Objects and Results to Speed Up Computations Memoization can be used to speed up repetitive and computationally expensive function calls. The first time a function that implements memoization is called, the results are stored in a cache memory. The next time the function is called with the same set of parameters, the results are instantly retrieved from the cache, avoiding a repeat of the calculations. With this package, any R object can be cached in a key-value storage where the key can be an arbitrary set of R objects. The cache memory is persistent (on the file system).
2792 Reproducible Research R.rsp Dynamic Generation of Scientific Reports The RSP markup language makes any text-based document come alive. RSP provides a powerful markup for controlling the content and output of LaTeX, HTML, Markdown, AsciiDoc, Sweave and knitr documents (and more), e.g. ‘Today’s date is <%=Sys.Date()%>’. Contrary to many other literate programming languages, with RSP it is straightforward to loop over mixtures of code and text sections, e.g. in month-by-month summaries. RSP also has several preprocessing directives for incorporating static and dynamic contents of external files (local or online) among other things. Functions rstring() and rcat() make it easy to process RSP strings, rsource() sources an RSP file as if it were an R script, while rfile() compiles it (even online) into its final output format, e.g. rfile(‘report.tex.rsp’) generates ‘report.pdf’ and rfile(‘report.md.rsp’) generates ‘report.html’. RSP is ideal for self-contained scientific reports and R package vignettes. It’s easy to use - if you know how to write an R script, you’ll be up and running within minutes.
2793 Reproducible Research R2HTML (core) HTML Exportation for R Objects Includes HTML functions and methods to write to an HTML file, which makes producing HTML reports easy. Includes a function that allows redirection on the fly, which is very useful for teaching purposes, as students can keep a copy of the produced output as a record of everything they did during the course. The package comes with a vignette describing how to write HTML reports for statistical analysis. Finally, a driver for ‘Sweave’ allows HTML flat files containing R code to be parsed, automatically writing the corresponding outputs (tables and graphs).
2794 Reproducible Research R2PPT Simple R Interface to Microsoft PowerPoint using rcom or RDCOMClient R2PPT provides a simple set of wrappers to easily use rcom or RDCOMClient for generating Microsoft PowerPoint presentations.
2795 Reproducible Research R2wd Write MS-Word documents from R This package uses either the statconnDCOM server (via the rcom package) or the RDCOMClient to communicate with MS-Word via the COM interface.
2796 Reproducible Research rapport A Report Templating System Facilitating the creation of reproducible statistical report templates. Once created, rapport templates can be exported to various external formats (HTML, LaTeX, PDF, ODT etc.) with pandoc as the converter backend.
2797 Reproducible Research rbundler Rbundler manages an application’s dependencies systematically and repeatably Rbundler manages a project-specific library for dependency package installation. By specifying dependencies in a DESCRIPTION file in a project’s root directory, one may install and use dependencies in a repeatable fashion without requiring manual maintenance. rbundler creates a project-specific R library in ‘PROJECT_ROOT/.Rbundle’ (by default) and a project-specific ‘R_LIBS_USER’ value, set in ‘PROJECT_ROOT/.Renviron’. It supports dependency management for R standard “Depends”, “Imports”, “Suggests”, and “LinkingTo” package dependencies. rbundler also attempts to validate and install versioned dependencies, such as “>=”, “==”, “<=”. Note that, due to the way R manages package installation, differing nested versioned dependencies are not allowed. For example, if your project depends on packages A (== 1), and B (== 2), but package A depends on B (== 1), then a nested dependency violation will cause rbundler to error out.
2798 Reproducible Research RefManageR Straightforward ‘BibTeX’ and ‘BibLaTeX’ Bibliography Management Provides tools for importing and working with bibliographic references. It greatly enhances the ‘bibentry’ class by providing a class ‘BibEntry’ which stores ‘BibTeX’ and ‘BibLaTeX’ references, supports ‘UTF-8’ encoding, and can be easily searched by any field, by date ranges, and by various formats for name lists (author by last names, translator by full names, etc.). Entries can be updated, combined, sorted, printed in a number of styles, and exported. ‘BibTeX’ and ‘BibLaTeX’ ‘.bib’ files can be read into ‘R’ and converted to ‘BibEntry’ objects. Interfaces to ‘NCBI Entrez’, ‘CrossRef’, and ‘Zotero’ are provided for importing references and references can be created from locally stored ‘PDF’ files using ‘Poppler’. Includes functions for citing and generating a bibliography with hyperlinks for documents prepared with ‘RMarkdown’ or ‘RHTML’.
2799 Reproducible Research reporttools Generate LaTeX Tables of Descriptive Statistics These functions are especially helpful when writing reports of data analysis using Sweave.
2800 Reproducible Research resumer Build Resumes with R Using a database, LaTeX and R easily build attractive resumes.
2801 Reproducible Research rmarkdown Dynamic Documents for R Convert R Markdown documents into a variety of formats.
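A minimal sketch; `report.Rmd` is a hypothetical source document and the output format can be swapped freely:

```r
# Knit the document and convert it with pandoc in one step
rmarkdown::render("report.Rmd", output_format = "html_document")
rmarkdown::render("report.Rmd", output_format = "pdf_document")
```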
2802 Reproducible Research rms (core) Regression Modeling Strategies Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. ‘rms’ is a collection of functions that assist with and streamline modeling. It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses; in addition, it implements penalized maximum likelihood estimation for logistic and ordinary linear models. ‘rms’ works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.
2803 Reproducible Research rprintf Adaptive Builder for Formatted Strings Provides a set of functions to facilitate building formatted strings under various replacement rules: C-style formatting, variable-based formatting, and number-based formatting. C-style formatting is basically identical to built-in function ‘sprintf’. Variable-based formatting allows users to put variable names in a formatted string which will be replaced by variable values. Number-based formatting allows users to use index numbers to represent the corresponding argument value to appear in the string.
2804 Reproducible Research rtf Rich Text Format (RTF) Output A set of R functions to output Rich Text Format (RTF) files with high resolution tables and graphics that may be edited with a standard word processor such as Microsoft Word.
2805 Reproducible Research SortableHTMLTables Turns a data frame into an HTML file containing a sortable table SortableHTMLTables writes a data frame to an HTML file that contains a sortable table. The sorting is done using the jQuery plugin Tablesorter. The appearance is controlled through a CSS file and several GIFs.
2806 Reproducible Research sparktex Generate LaTeX sparklines in R Generate syntax for use with the sparklines package for LaTeX.
2807 Reproducible Research stargazer Well-Formatted Regression and Summary Statistics Tables Produces LaTeX code, HTML/CSS code and ASCII text for well-formatted tables that hold regression analysis results from several models side-by-side, as well as summary statistics.
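A small sketch of a side-by-side model table with stargazer, using the built-in mtcars data:

    library(stargazer)
    m1 <- lm(mpg ~ wt, data = mtcars)
    m2 <- lm(mpg ~ wt + hp, data = mtcars)
    # type = "text" prints an ASCII table; "latex" and "html" are also supported
    stargazer(m1, m2, type = "text")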
2808 Reproducible Research suRtex LaTeX descriptive statistic reporting for survey data suRtex was designed for easy descriptive statistic reporting of categorical survey data (e.g., Likert scales) in LaTeX. suRtex takes a matrix or data frame and produces the LaTeX code necessary to create a sideways table. Mean, median, standard deviation, and sample size are optional.
2809 Reproducible Research tables Formula-Driven Table Generation Computes and displays complex tables of summary statistics. Output may be in LaTeX, HTML, plain text, or an R matrix for further processing.
2810 Reproducible Research texreg Conversion of R Regression Output to LaTeX or HTML Tables Converts coefficients, standard errors, significance stars, and goodness-of-fit statistics of statistical models into LaTeX tables or HTML tables/MS Word documents or to nicely formatted screen output for the R console for easy model comparison. A list of several models can be combined in a single table. The output is highly customizable. New model types can be easily implemented.
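A comparable sketch for texreg; screenreg() prints to the console, while texreg() and htmlreg() emit LaTeX and HTML:

    library(texreg)
    m1 <- lm(mpg ~ wt, data = mtcars)
    m2 <- lm(mpg ~ wt + hp, data = mtcars)
    screenreg(list(m1, m2))  # two models combined in a single table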
2811 Reproducible Research tikzDevice R Graphics Output in LaTeX Format Provides a graphics output device for R that records plots in a LaTeX-friendly format. The device transforms plotting commands issued by R functions into LaTeX code blocks. When included in a LaTeX document, these blocks are interpreted with the help of ‘TikZ’―a graphics package for TeX and friends written by Till Tantau. Using the ‘tikzDevice’, the text of R plots can contain LaTeX commands such as mathematical formulae. The device also allows arbitrary LaTeX code to be inserted into the output stream.
2812 Reproducible Research tth TeX to HTML/MathML Translators tth/ttm C source code and R wrappers for the tth/ttm TeX to HTML/MathML translators.
2813 Reproducible Research tufterhandout Tufte-style html document format for rmarkdown Custom template and output formats for use with rmarkdown. Produces Edward Tufte-style handouts in HTML format, with full support for rmarkdown features.
2814 Reproducible Research xtable (core) Export Tables to LaTeX or HTML Coerce data to LaTeX and HTML tables.
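A minimal xtable sketch:

    library(xtable)
    # Coerce a data frame to a table; print as LaTeX (type = "html" gives HTML)
    print(xtable(head(mtcars)), type = "latex")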
2815 Reproducible Research ztable Zebra-Striped Tables in LaTeX and HTML Formats Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm, coxph, nls, fitdistr, mytable and cbind.mytable objects.
2816 Robust Statistical Methods cluster “Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et al. Methods for cluster analysis, much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) “Finding Groups in Data”.
2817 Robust Statistical Methods complmrob Robust Linear Regression with Compositional Data as Covariates Robust regression methods for compositional data. The distribution of the estimates can be approximated with various bootstrap methods. These bootstrap methods are available for the compositional as well as for standard robust regression estimates. This allows for direct comparison between them.
2818 Robust Statistical Methods covRobust Robust Covariance Estimation via Nearest Neighbor Cleaning The cov.nnve() function implements robust covariance estimation by the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002) <doi:10.1198/016214502388618780>.
2819 Robust Statistical Methods coxrobust Robust Estimation in Cox Model Robustly fit the proportional hazards regression model.
2820 Robust Statistical Methods distr Object Oriented Implementation of Distributions S4-classes and methods for distributions.
2821 Robust Statistical Methods drgee Doubly Robust Generalized Estimating Equations Fit restricted mean models for the conditional association between an exposure and an outcome, given covariates. Three methods are implemented: O-estimation, where a nuisance model for the association between the covariates and the outcome is used; E-estimation where a nuisance model for the association between the covariates and the exposure is used, and doubly robust (DR) estimation where both nuisance models are used. In DR-estimation, the estimates will be consistent when at least one of the nuisance models is correctly specified, not necessarily both.
2822 Robust Statistical Methods genie A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm A new hierarchical clustering linkage criterion: the Genie algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini index) of the cluster sizes does not increase drastically above a given threshold. Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed, see (Gagolewski et al. 2016a <doi:10.1016/j.ins.2016.05.003>, 2016b <doi:10.1007/978-3-319-45656-0_16>) for more details.
2823 Robust Statistical Methods georob Robust Geostatistical Analysis of Spatial Data Provides functions for efficiently fitting linear models with spatially correlated errors by robust and Gaussian (Restricted) Maximum Likelihood and for computing robust and customary point and block external-drift Kriging predictions, along with utility functions for variogram modelling in ad hoc geostatistical analyses, model building, model evaluation by cross-validation, (conditional) simulation of Gaussian processes, unbiased back-transformation of Kriging predictions of log-transformed data.
2824 Robust Statistical Methods Gmedian Geometric Median, k-Median Clustering and Robust Median PCA Fast algorithms for robust estimation with large samples of multivariate observations. Estimation of the geometric median, robust k-Gmedian clustering, and robust PCA based on the Gmedian covariation matrix.
2825 Robust Statistical Methods GSE Robust Estimation in the Presence of Cellwise and Casewise Contamination and Missing Data Robust Estimation of Multivariate Location and Scatter in the Presence of Cellwise and Casewise Contamination and Missing Data.
2826 Robust Statistical Methods lqmm Linear Quantile Mixed Models Functions to fit quantile regression models for hierarchical data (2-level nested designs) as described in Geraci and Bottai (2014, Statistics and Computing) <doi:10.1007/s11222-013-9381-9>. A vignette is given in Geraci (2014, Journal of Statistical Software) <doi:10.18637/jss.v057.i13> and included in the package documents. The packages also provides functions to fit quantile models for independent data and for count responses.
2827 Robust Statistical Methods MASS (core) Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
2828 Robust Statistical Methods mblm Median-Based Linear Models Provides linear models based on Theil-Sen single median and Siegel repeated medians. They are very robust (29 or 50 percent breakdown point, respectively), and if no outliers are present, the estimators are very similar to OLS.
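A short sketch of a Theil-Sen/Siegel fit; the data here are simulated for illustration, with one gross outlier:

    library(mblm)
    set.seed(1)
    d <- data.frame(x = 1:20)
    d$y <- 2 * d$x + rnorm(20)
    d$y[20] <- 100                      # one gross outlier
    fit <- mblm(y ~ x, dataframe = d)   # Siegel repeated medians by default
    coef(fit)                           # slope stays near 2 despite the outlier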
2829 Robust Statistical Methods metaplus Robust Meta-Analysis and Meta-Regression Performs meta-analysis and meta-regression using standard and robust methods with confidence intervals based on the profile likelihood. Robust methods are based on alternative distributions for the random effect, either the t-distribution (Lee and Thompson, 2008 <doi:10.1002/sim.2897> or Baker and Jackson, 2008 <doi:10.1007/s10729-007-9041-8>) or mixtures of normals (Beath, 2014 <doi:10.1002/jrsm.1114>).
2830 Robust Statistical Methods multinomRob Robust Estimation of Overdispersed Multinomial Regression Models MNL and overdispersed multinomial regression using robust (LQD and tanh) estimation
2831 Robust Statistical Methods mvoutlier Multivariate Outlier Detection Based on Robust Methods Various Methods for Multivariate Outlier Detection.
2832 Robust Statistical Methods otrimle Robust Model-Based Clustering Performs robust cluster analysis allowing for outliers and noise that cannot be fitted by any cluster. The data are modelled by a mixture of Gaussian distributions and a noise component, which is an improper uniform distribution covering the whole Euclidean space. Parameters are estimated by (pseudo) maximum likelihood. This is fitted by an EM-type algorithm. See Coretto and Hennig (2016) <doi:10.1080/01621459.2015.1100996>, and Coretto and Hennig (2017) <arXiv:1309.6895>.
2833 Robust Statistical Methods OutlierDC Outlier Detection using quantile regression for Censored Data This package provides three algorithms to detect outlying observations for censored survival data.
2834 Robust Statistical Methods OutlierDM Outlier Detection for Multi-replicated High-throughput Data Detecting outlying values such as genes, peptides or samples for multi-replicated high-throughput high-dimensional data
2835 Robust Statistical Methods pcaPP Robust PCA by Projection Pursuit Provides functions for robust PCA by projection pursuit. The methods are described in Croux et al. (2006) <doi:10.2139/ssrn.968376>, Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>.
2836 Robust Statistical Methods qclust Robust Estimation of Gaussian Mixture Models Robust estimation of Gaussian mixture models fitted by modified EM algorithm, robust clustering and classification.
2837 Robust Statistical Methods quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
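A minimal quantreg sketch using the engel data shipped with the package:

    library(quantreg)
    data(engel)
    # Median regression (tau = 0.5) of food expenditure on household income
    fit <- rq(foodexp ~ income, tau = 0.5, data = engel)
    summary(fit)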
2838 Robust Statistical Methods RandVar Implementation of Random Variables Implements random variables by means of S4 classes and methods.
2839 Robust Statistical Methods rgam Robust Generalized Additive Model An implementation of algorithms for outlier-robust fit for Generalized Additive Models described in Azadeh and Salibian-Barrera (2012) <doi:10.1198/jasa.2011.tm09654>.
2840 Robust Statistical Methods roahd Robust Analysis of High Dimensional Data A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.
2841 Robust Statistical Methods RobAStBase Robust Asymptotic Statistics Base S4-classes and functions for robust asymptotic statistics.
2842 Robust Statistical Methods robcor Robust Correlations Robust pairwise correlations based on estimates of scale, particularly on “FastQn” one-step M-estimate.
2843 Robust Statistical Methods robeth R Functions for Robust Statistics Location problems, M-estimates of coefficients and scale in linear regression, weights for bounded influence regression, covariance matrix of the coefficient estimates, asymptotic relative efficiency of regression M-estimates, robust testing in linear models, high breakdown point regression, M-estimates of covariance matrices, M-estimates for discrete generalized linear models.
2844 Robust Statistical Methods robfilter Robust Time Series Filters A set of functions to filter time series based on concepts from robust statistics.
2845 Robust Statistical Methods RobLox Optimally Robust Influence Curves and Estimators for Location and Scale Functions for the determination of optimally robust influence curves and estimators in case of normal location and/or scale.
2846 Robust Statistical Methods RobLoxBioC Infinitesimally Robust Estimators for Preprocessing -Omics Data Functions for the determination of optimally robust influence curves and estimators for preprocessing omics data, in particular gene expression data.
2847 Robust Statistical Methods RobPer Robust Periodogram and Periodicity Detection Methods Calculates periodograms based on (robustly) fitting periodic functions to light curves (irregularly observed time series, possibly with measurement accuracies, occurring in astroparticle physics). Three main functions are included: RobPer() calculates the periodogram. Outlying periodogram bars (indicating a period) can be detected with betaCvMfit(). Artificial light curves can be generated using the function tsgen(). For more details see the corresponding article: Thieler, Fried and Rathjens (2016), Journal of Statistical Software 69(9), 1-36, <doi:10.18637/jss.v069.i09>.
2848 Robust Statistical Methods RobRex Optimally Robust Influence Curves for Regression and Scale Functions for the determination of optimally robust influence curves in case of linear regression with unknown scale and standard normal distributed errors where the regressor is random.
2849 Robust Statistical Methods RobRSVD Robust Regularized Singular Value Decomposition Provides functions to compute the SVD, regularized SVD, robust SVD, and robust regularized SVD. The robust SVD methods use alternating iteratively reweighted least squares methods. The regularized SVD uses generalized cross validation to choose the optimal smoothing parameters.
2850 Robust Statistical Methods robumeta Robust Variance Meta-Regression Functions for conducting robust variance estimation (RVE) meta-regression using both large and small sample RVE estimators under various weighting schemes. These methods are distribution free and provide valid point estimates, standard errors and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown. Also included are functions for conducting sensitivity analyses under correlated effects weighting and producing RVE-based forest plots.
2851 Robust Statistical Methods robust (core) Port of the S+ “Robust Library” Methods for robust statistics, a state of the art in the early 2000s, notably for robust regression and robust multivariate analysis.
2852 Robust Statistical Methods RobustAFT Truncated Maximum Likelihood Fit and Robust Accelerated Failure Time Regression for Gaussian and Log-Weibull Case R functions for the computation of the truncated maximum likelihood and the robust accelerated failure time regression for the Gaussian and log-Weibull cases.
2853 Robust Statistical Methods robustbase (core) Basic Robust Statistics “Essential” robust statistics. Tools for analyzing data with robust methods. This includes regression methodology, including model selection, and multivariate statistics, where we strive to cover the book “Robust Statistics, Theory and Methods” by ‘Maronna, Martin and Yohai’; Wiley 2006.
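A minimal robustbase sketch, fitting an MM-type robust regression to the built-in stackloss data:

    library(robustbase)
    fit <- lmrob(stack.loss ~ ., data = stackloss)  # MM-type robust regression
    summary(fit)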
2854 Robust Statistical Methods robustDA Robust Mixture Discriminant Analysis Robust mixture discriminant analysis (RMDA, Bouveyron & Girard, 2009) allows one to build a robust supervised classifier from learning data with label noise. The idea of the proposed method is to confront an unsupervised modeling of the data with the supervised information carried by the labels of the learning data in order to detect inconsistencies. The method can then build a robust classifier that takes the detected label inconsistencies into account.
2855 Robust Statistical Methods robustgam Robust Estimation for Generalized Additive Models Provides robust estimation for generalized additive models, implementing the fast and stable algorithm of Wong, Yao and Lee (2013). The implementation also contains three automatic selection methods for the smoothing parameter, designed to be robust to outliers. For more details, see Wong, Yao and Lee (2013).
2856 Robust Statistical Methods robustlmm Robust Linear Mixed Effects Models A method to fit linear mixed effects models robustly. Robustness is achieved by modification of the scoring equations combined with the Design Adaptive Scale approach.
2857 Robust Statistical Methods robustloggamma Robust Estimation of the Generalized log Gamma Model Robust estimation of the generalized log gamma model is provided using Quantile Tau estimator, Weighted Likelihood estimator and Truncated Maximum Likelihood estimator. Functions for regression and censored data are also available.
2858 Robust Statistical Methods robustreg Robust Regression Functions Linear regression functions using Huber and bisquare psi functions. Optimal weights are calculated using IRLS algorithm.
2859 Robust Statistical Methods robustX ‘eXtra’ / ‘eXperimental’ Functionality for Robust Statistics Robustness ‘eXperimental’, ‘eXtraneous’, or ‘eXtraordinary’ Functionality for Robust Statistics. In other words, methods which are not yet well established, often related to methods in package ‘robustbase’.
2860 Robust Statistical Methods ROptEst Optimally Robust Estimation Optimally robust estimation in general smoothly parameterized models using S4 classes and methods.
2861 Robust Statistical Methods ROptRegTS Optimally Robust Estimation for Regression-Type Models Optimally robust estimation for regression-type models using S4 classes and methods.
2862 Robust Statistical Methods rorutadis Robust Ordinal Regression UTADIS Implementation of Robust Ordinal Regression for multiple criteria value-based sorting with preference information provided in form of possibly imprecise assignment examples, assignment-based pairwise comparisons, and desired class cardinalities [Kadzinski et al. 2015, <doi:10.1016/j.ejor.2014.09.050>].
2863 Robust Statistical Methods rospca Robust Sparse PCA using the ROSPCA Algorithm Implementation of robust sparse PCA using the ROSPCA algorithm of Hubert et al. (2016) <doi:10.1080/00401706.2015.1093962>.
2864 Robust Statistical Methods rpca RobustPCA: Decompose a Matrix into Low-Rank and Sparse Components Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Candes, E. J., Li, X., Ma, Y., and Wright, J. (2011), “Robust Principal Component Analysis?”, Journal of the ACM (JACM), 58(3), 11, prove that we can recover each component individually under some suitable assumptions. It is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit: among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the L1 norm. This package implements this decomposition algorithm, yielding the robust PCA approach.
2865 Robust Statistical Methods rrcov (core) Scalable Robust Estimators with High Breakdown Point Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point.
2866 Robust Statistical Methods rrcovHD Robust Multivariate Methods for High Dimensional Data Robust multivariate methods for high dimensional data including outlier detection, PCA, PLS and classification.
2867 Robust Statistical Methods rrcovNA Scalable Robust Estimators with High Breakdown Point for Incomplete Data Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point for Incomplete Data.
2868 Robust Statistical Methods RSKC Robust Sparse K-Means Contains the function RSKC(), which runs the robust sparse K-means clustering algorithm.
2869 Robust Statistical Methods sandwich Robust Covariance Matrix Estimators Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
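A short sketch of robust standard errors with sandwich, combined with coeftest() from the lmtest package:

    library(sandwich)
    library(lmtest)
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    # Heteroskedasticity-consistent (HC3) covariance for the coefficient tests
    coeftest(fit, vcov = vcovHC(fit, type = "HC3"))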
2870 Robust Statistical Methods snipEM Snipping Methods for Robust Estimation and Clustering Snipping methods optimally removing scattered cells for robust estimation and cluster analysis.
2871 Robust Statistical Methods ssmrob Robust Estimation and Inference in Sample Selection Models Provides a set of tools for robust estimation and inference for models with sample selectivity.
2872 Robust Statistical Methods tclust Robust Trimmed Clustering Provides functions for robust trimmed clustering. The methods are described in Garcia-Escudero (2008) <doi:10.1214/07-AOS515>, Fritz et al. (2012) <doi:10.18637/jss.v047.i12> and others.
2873 Robust Statistical Methods TEEReg Trimmed Elemental Estimation for Linear Models For fitting multiple linear regressions, the ordinary least squares approach is sensitive to outliers and/or violations of model assumptions. The trimmed elemental estimators are more robust to such situations. This package contains functions for computing the trimmed elemental estimates, as well as for creating the bias-corrected and accelerated bootstrap confidence intervals based on elemental regressions.
2874 Robust Statistical Methods walrus Robust Statistical Methods A toolbox of common robust statistical tests, including robust descriptives, robust t-tests, and robust ANOVA. It is also available as a module for ‘jamovi’ (see <https://www.jamovi.org> for more information). Walrus is based on the WRS2 package by Patrick Mair, which is in turn based on the scripts and work of Rand Wilcox. These analyses are described in depth in the book ‘Introduction to Robust Estimation & Hypothesis Testing’.
2875 Robust Statistical Methods WRS2 A Collection of Robust Statistical Methods A collection of robust statistical methods based on Wilcox’ WRS functions. It implements robust t-tests (independent and dependent samples), robust ANOVA (including between-within subject designs), quantile ANOVA, robust correlation, robust mediation, and nonparametric ANCOVA models based on robust location measures.
2876 Statistics for the Social Sciences acepack ACE and AVAS for Selecting Multiple Regression Transformations Two nonparametric methods for multiple regression transform selection are provided. The first, Alternating Conditional Expectations (ACE), is an algorithm to find the fixed point of maximal correlation, i.e. it finds a set of transformed response variables that maximizes R^2 using smoothing functions [see Breiman, L., and J.H. Friedman. 1985. “Estimating Optimal Transformations for Multiple Regression and Correlation”. Journal of the American Statistical Association. 80:580-598. <doi:10.1080/01621459.1985.10478157>]. Also included is the Additivity Variance Stabilization (AVAS) method which works better than ACE when correlation is low [see Tibshirani, R. 1986. “Estimating Transformations for Regression via Additivity and Variance Stabilization”. Journal of the American Statistical Association. 83:394-405. <doi:10.1080/01621459.1988.10478610>]. A good introduction to these two methods is in chapter 16 of Frank Harrell’s “Regression Modeling Strategies” in the Springer Series in Statistics.
2877 Statistics for the Social Sciences Amelia A Program for Missing Data A tool that “multiply imputes” missing data in a single cross-section (such as a survey), from a time series (like variables collected for each year in a country), or from a time-series-cross-sectional data set (such as collected by years for each of several countries). Amelia II implements our bootstrapping-based algorithm that gives essentially the same answers as the standard IP or EMis approaches, is usually considerably faster than existing approaches and can handle many more variables. Unlike Amelia I and other statistically rigorous imputation software, it virtually never crashes (but please let us know if you find to the contrary!). The program also generalizes existing approaches by allowing for trends in time series across observations within a cross-sectional unit, as well as priors that allow experts to incorporate beliefs they have about the values of missing cells in their data. Amelia II also includes useful diagnostics of the fit of multiple imputation models. The program works from the R command line or via a graphical user interface that does not require users to know R.
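A brief sketch of multiple imputation with Amelia, using the freetrade example data shipped with the package:

    library(Amelia)
    data(freetrade)
    # Five imputations; 'ts' and 'cs' name the time and cross-section variables
    a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country")
    summary(a.out)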
2878 Statistics for the Social Sciences aod Analysis of Overdispersed Data Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).
2879 Statistics for the Social Sciences arm Data Analysis Using Regression and Multilevel/Hierarchical Models Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.
2880 Statistics for the Social Sciences betareg Beta Regression Beta regression for modeling beta-distributed dependent variables, e.g., rates and proportions. In addition to maximum likelihood regression (for both mean and precision of a beta-distributed response), bias-corrected and bias-reduced estimation as well as finite mixture models and recursive partitioning for beta regressions are provided.
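A minimal betareg sketch using the GasolineYield data shipped with the package:

    library(betareg)
    data(GasolineYield)
    # The response 'yield' is a proportion in (0, 1)
    fit <- betareg(yield ~ batch + temp, data = GasolineYield)
    summary(fit)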
2881 Statistics for the Social Sciences biglm bounded memory linear and generalized linear models Regression for data too large to fit in memory
2882 Statistics for the Social Sciences BMA Bayesian Model Averaging Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).
2883 Statistics for the Social Sciences boot (core) Bootstrap Functions (Originally by Angelo Canty for S) Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
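A small boot sketch, bootstrapping the median of the built-in rivers vector:

    library(boot)
    med <- function(data, idx) median(data[idx])  # statistic(data, indices)
    set.seed(1)
    b <- boot(rivers, statistic = med, R = 999)
    boot.ci(b, type = "perc")  # percentile confidence interval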
2884 Statistics for the Social Sciences bootstrap Functions for the Book “An Introduction to the Bootstrap” Software (bootstrap, cross-validation, jackknife) and data for the book “An Introduction to the Bootstrap” by B. Efron and R. Tibshirani, 1993, Chapman and Hall. This package is primarily provided for projects already based on it, and for support of the book. New projects should preferentially use the recommended package “boot”.
2885 Statistics for the Social Sciences brglm Bias Reduction in Binomial-Response Generalized Linear Models Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as ‘glm’. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.
2886 Statistics for the Social Sciences car (core) Companion to Applied Regression Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.
2887 Statistics for the Social Sciences catspec Special models for categorical variables ‘ctab’ creates (multiway) percentage tables. ‘sqtab’ contains a set of functions for estimating models for square tables such as quasi-independence, symmetry, uniform association. Examples show how to use these models in a loglinear model using glm or in a multinomial logistic model using mlogit or clogit
2888 Statistics for the Social Sciences class Functions for Classification Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps.
2889 Statistics for the Social Sciences demography Forecasting Mortality, Fertility, Migration and Population Data Functions for demographic analysis including lifetable calculations; Lee-Carter modelling; functional data analysis of mortality rates, fertility rates, net migration numbers; and stochastic population forecasting.
2890 Statistics for the Social Sciences dispmod Modelling Dispersion in GLM Functions for estimating Gaussian dispersion regression models (Aitkin, 1987 <doi:10.2307/2347792>), overdispersed binomial logit models (Williams, 1987 <doi:10.2307/2347977>), and overdispersed Poisson log-linear models (Breslow, 1984 <doi:10.2307/2347661>), using a quasi-likelihood approach.
2891 Statistics for the Social Sciences dr Methods for Dimension Reduction for Regression Functions, methods, and datasets for fitting dimension reduction regression, using slicing (methods SAVE and SIR), Principal Hessian Directions (phd, using residuals and the response), and an iterative IRE. Partial methods, that condition on categorical predictors are also available. A variety of tests, and stepwise deletion of predictors, is also included. Also included is code for computing permutation tests of dimension. Adding additional methods of estimating dimension is straightforward. For documentation, see the vignette in the package. With version 3.0.4, the arguments for dr.step have been modified.
2892 Statistics for the Social Sciences effects (core) Effect Displays for Linear, Generalized Linear, and Other Models Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.
2893 Statistics for the Social Sciences ergm Fit, Simulate and Diagnose Exponential-Family Models for Networks An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). ‘ergm’ is a part of the Statnet suite of packages for network analysis.
2894 Statistics for the Social Sciences exactLoglinTest Monte Carlo Exact Tests for Log-linear models Monte Carlo and MCMC goodness of fit tests for log-linear models
2895 Statistics for the Social Sciences gam (core) Generalized Additive Models Functions for fitting and working with generalized additive models, as described in chapter 7 of “Statistical Models in S” (Chambers and Hastie (eds), 1991), and “Generalized Additive Models” (Hastie and Tibshirani, 1990).
2896 Statistics for the Social Sciences gee Generalized Estimation Equation Solver Generalized Estimation Equation solver.
2897 Statistics for the Social Sciences geepack Generalized Estimating Equation Package Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses.
2898 Statistics for the Social Sciences gmodels Various R Programming Tools for Model Fitting Various R programming tools for model fitting.
2899 Statistics for the Social Sciences gnm Generalized Nonlinear Models Functions to specify and fit generalized nonlinear models, including models with multiplicative interaction terms such as the UNIDIFF model from sociology and the AMMI model from crop science, and many others. Over-parameterized representations of models are used throughout; functions are provided for inference on estimable parameter combinations, as well as standard methods for diagnostics etc.
2900 Statistics for the Social Sciences gss General Smoothing Splines A comprehensive package for structural multivariate function estimation using smoothing splines.
2901 Statistics for the Social Sciences Hmisc (core) Harrell Miscellaneous Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
2902 Statistics for the Social Sciences influence.ME Tools for Detecting Influential Data in Mixed Effects Models Provides a collection of tools for detecting influential cases in generalized mixed effects models. It analyses models that were estimated using ‘lme4’. The basic rationale behind identifying influential data is that when single units are omitted from the data, models based on these data should not produce substantially different estimates. To standardize the assessment of how influential a (single group of) observation(s) is, several measures of influence are common practice, such as Cook’s Distance. In addition, we provide a measure of percentage change of the fixed point estimates and a simple procedure to detect changing levels of significance.
2903 Statistics for the Social Sciences latentnet Latent Position and Cluster Models for Statistical Networks Fit and simulate latent position and cluster models for statistical networks.
2904 Statistics for the Social Sciences leaps Regression Subset Selection Regression subset selection, including exhaustive search.
2905 Statistics for the Social Sciences lme4 (core) Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
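A minimal lme4 sketch with the sleepstudy data shipped with the package:

    library(lme4)
    # Random intercept and slope for Days within each Subject
    fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(fit)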
2906 Statistics for the Social Sciences lmeSplines Add smoothing spline modelling capability to nlme Add smoothing spline modelling capability to nlme. Fit smoothing spline terms in Gaussian linear and nonlinear mixed-effects models
2907 Statistics for the Social Sciences lmm Linear Mixed Models It implements Expectation/Conditional Maximization Either (ECME) and rapidly converging algorithms as well as Bayesian inference for linear mixed models, which is described in Schafer, J.L. (1998) “Some improved procedures for linear mixed models”. Dept. of Statistics, The Pennsylvania State University.
2908 Statistics for the Social Sciences lmtest (core) Testing Linear Regression Models A collection of tests, data sets, and examples for diagnostic checking in linear regression models. Furthermore, some generic tools for inference in parametric models are provided.
2909 Statistics for the Social Sciences locfit Local Regression, Likelihood and Density Estimation Local regression, likelihood and density estimation.
2910 Statistics for the Social Sciences logistf Firth’s Bias-Reduced Logistic Regression Fit a logistic regression model using Firth’s bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth’s method was proposed as an ideal solution to the problem of separation in logistic regression. If needed, the bias reduction can be turned off such that ordinary maximum likelihood logistic regression is obtained.
2911 Statistics for the Social Sciences logmult Log-Multiplicative Models, Including Association Models Functions to fit log-multiplicative models using ‘gnm’, with support for convenient printing, plots, and jackknife/bootstrap standard errors. For complex survey data, models can be fitted from design objects from the ‘survey’ package. Currently supported models include UNIDIFF (Erikson & Goldthorpe), a.k.a. log-multiplicative layer effect model (Xie), and several association models: Goodman’s row-column association models of the RC(M) and RC(M)-L families with one or several dimensions; two skew-symmetric association models proposed by Yamaguchi and by van der Heijden & Mooijaart. Functions allow computing the intrinsic association coefficient (and therefore the Altham index), including via the Bayes shrinkage estimator proposed by Zhou; and the RAS/IPF/Deming-Stephan algorithm.
2912 Statistics for the Social Sciences lsmeans (core) Least-Squares Means Obtain least-squares means for linear, generalized linear, and mixed models. Compute contrasts or linear functions of least-squares means, and comparisons of slopes. Plots and compact letter displays. Least-squares means were proposed in Harvey, W (1960) “Least-squares analysis of data with unequal subclass numbers”, Tech Report ARS-20-8, USDA National Agricultural Library, and discussed further in Searle, Speed, and Milliken (1980) “Population marginal means in the linear model: An alternative to least squares means”, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>. NOTE: lsmeans now relies primarily on code in the ‘emmeans’ package. ‘lsmeans’ will be archived in the near future.
2913 Statistics for the Social Sciences MASS (core) Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
2914 Statistics for the Social Sciences Matching Multivariate and Propensity Score Matching with Balance Optimization Provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance has been obtained are also provided.
2915 Statistics for the Social Sciences MatchIt Nonparametric Preprocessing for Parametric Causal Inference Selects matched samples of the original treated and control groups with similar covariate distributions; can be used to match exactly on covariates, to match on propensity scores, or to perform a variety of other matching procedures. The package also implements a series of recommendations offered in Ho, Imai, King, and Stuart (2007) <doi:10.1093/pan/mpl013>.
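A short MatchIt sketch using the lalonde data shipped with the package:

    library(MatchIt)
    data(lalonde)
    # 1:1 nearest-neighbour matching on an estimated propensity score
    m.out <- matchit(treat ~ age + educ + re74, data = lalonde,
                     method = "nearest")
    summary(m.out)  # covariate balance before and after matching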
2916 Statistics for the Social Sciences MCMCglmm (core) MCMC Generalised Linear Mixed Models MCMC Generalised Linear Mixed Models.
2917 Statistics for the Social Sciences mgcv (core) Mixed GAM Computation Vehicle with Automatic Smoothness Estimation Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, ‘JAGS’ support and distributions beyond the exponential family.
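A minimal mgcv sketch; the data here are simulated for illustration:

    library(mgcv)
    set.seed(1)
    d <- data.frame(x = runif(200))
    d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)
    # s(x) is a smooth term; its smoothness is estimated by REML
    fit <- gam(y ~ s(x), data = d, method = "REML")
    plot(fit)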
2918 Statistics for the Social Sciences mi (core) Missing Data Imputation and Model Checking The mi package provides functions for data manipulation, imputing missing values in an approximate Bayesian framework, diagnostics of the models used to generate the imputations, confidence-building mechanisms to validate some of the assumptions of the imputation algorithm, and functions to analyze multiply imputed data sets with the appropriate degree of sampling uncertainty.
2919 Statistics for the Social Sciences mice (core) Multivariate Imputation by Chained Equations Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
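A brief mice sketch using the nhanes example data shipped with the package:

    library(mice)
    imp <- mice(nhanes, m = 5, printFlag = FALSE)  # five imputed data sets
    fit <- with(imp, lm(bmi ~ age))                # analyse each completed set
    pool(fit)                                      # combine by Rubin's rules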
2920 Statistics for the Social Sciences mitools Tools for Multiple Imputation of Missing Data Tools to perform analyses and combine results from multiple-imputation datasets.
2921 Statistics for the Social Sciences mix Estimation/Multiple Imputation for Mixed Categorical and Continuous Data Estimation/multiple imputation programs for mixed categorical and continuous data.
2922 Statistics for the Social Sciences mlogit Multinomial Logit Models Maximum Likelihood estimation of random utility discrete choice models (logit and probit).
2923 Statistics for the Social Sciences MNP R Package for Fitting the Multinomial Probit Model Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005). “A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation,” Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334. <doi:10.1016/j.jeconom.2004.02.002> Detailed examples are given in Imai and van Dyk (2005). “MNP: R Package for Fitting the Multinomial Probit Model.” Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. <doi:10.18637/jss.v014.i03>.
2924 Statistics for the Social Sciences multcomp (core) Simultaneous Inference in General Parametric Models Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models. The package includes demos reproducing analyzes presented in the book “Multiple Comparisons Using R” (Bretz, Hothorn, Westfall, 2010, CRC Press).
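A minimal multcomp sketch with the built-in PlantGrowth data:

    library(multcomp)
    fit <- aov(weight ~ group, data = PlantGrowth)
    # Simultaneous Tukey all-pairwise comparisons of the group means
    summary(glht(fit, linfct = mcp(group = "Tukey")))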
2925 Statistics for the Social Sciences multgee GEE Solver for Correlated Nominal or Ordinal Multinomial Responses GEE solver for correlated nominal or ordinal multinomial responses using a local odds ratios parameterization.
2926 Statistics for the Social Sciences multinomRob Robust Estimation of Overdispersed Multinomial Regression Models MNL and overdispersed multinomial regression using robust (LQD and tanh) estimation
2927 Statistics for the Social Sciences multiplex Algebraic Tools for the Analysis of Multiple Social Networks Algebraic procedures for the analysis of multiple social networks are delivered with this package. Among other things, it makes it possible to create and manipulate multivariate network data in different formats, and there are effective ways available to treat multiple networks with routines that combine algebraic systems like the partially ordered semigroup or the semiring structure together with the relational bundles occurring in different types of multivariate network data sets. It also provides an algebraic approach for two-mode networks through Galois derivations between families of the pairs of subsets in the two domains.
2928 Statistics for the Social Sciences mvnmle ML Estimation for Multivariate Normal Data with Missing Values Finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values.
2929 Statistics for the Social Sciences network Classes for Relational Data Tools to create and modify network objects. The network class can represent a range of relational data types, and supports arbitrary vertex/edge/graph attributes.
2930 Statistics for the Social Sciences nlme (core) Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
2931 Statistics for the Social Sciences nlstools Tools for Nonlinear Regression Analysis Several tools for assessing the quality of fit of a Gaussian nonlinear model are provided.
2932 Statistics for the Social Sciences nnet (core) Feed-Forward Neural Networks and Multinomial Log-Linear Models Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.
2933 Statistics for the Social Sciences norm Analysis of multivariate normal datasets with missing values Analysis of multivariate normal datasets with missing values
2934 Statistics for the Social Sciences np Nonparametric Kernel Smoothing Methods for Mixed Data Types Nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <http://www.nserc-crsng.gc.ca>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <http://www.sshrc-crsh.gc.ca>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <http://www.sharcnet.ca>).
2935 Statistics for the Social Sciences optmatch Functions for Optimal Matching Distance based bipartite matching using the RELAX-IV minimum cost flow solver, oriented to matching of treatment and control groups in observational studies. Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination.
2936 Statistics for the Social Sciences PAFit Generative Mechanism Estimation in Temporal Complex Networks Statistical methods for estimating preferential attachment and node fitness generative mechanisms in temporal complex networks are provided. Thong Pham et al. (2015) <doi:10.1371/journal.pone.0137796>. Thong Pham et al. (2016) <doi:10.1038/srep32558>.
2937 Statistics for the Social Sciences pan Multiple Imputation for Multivariate Panel or Clustered Data It provides functions and examples for maximum likelihood estimation for generalized linear mixed models and Gibbs sampler for multivariate linear mixed models with incomplete data, as described in Schafer JL (1997) “Imputation of missing covariates under a multivariate linear mixed model”. Technical report 97-04, Dept. of Statistics, The Pennsylvania State University.
2938 Statistics for the Social Sciences perturb Tools for Evaluating Collinearity Use the perturb() function to evaluate collinearity by adding random noise to selected variables (Hendrickx & Pelzer 2004). The colldiag() function calculates condition numbers and variance decomposition proportions to test for collinearity and uncover its sources (Belsley 1980).
2939 Statistics for the Social Sciences PSAgraphics Propensity Score Analysis Graphics A collection of functions that primarily produce graphics to aid in a Propensity Score Analysis (PSA). Functions include: cat.psa and box.psa to test balance within strata of categorical and quantitative covariates, circ.psa for a representation of the estimated effect size by stratum, loess.psa that provides a graphic and loess based effect size estimate, and various balance functions that provide measures of the balance achieved via a PSA in a categorical covariate.
2940 Statistics for the Social Sciences pscl Political Science Computational Laboratory Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching at the Political Science Computational Laboratory; seats-votes curves.
2941 Statistics for the Social Sciences quantreg (core) Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
2942 Statistics for the Social Sciences qvcalc Quasi Variances for Factor Effects in Statistical Models Functions to compute quasi variances and associated measures of approximation error.
2943 Statistics for the Social Sciences rms (core) Regression Modeling Strategies Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. ‘rms’ is a collection of functions that assist with and streamline modeling. It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses, and implements penalized maximum likelihood estimation for logistic and ordinary linear models. ‘rms’ works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.
2944 Statistics for the Social Sciences RSiena Siena - Simulation Investigation for Empirical Network Analysis The main purpose of this package is to perform simulation-based estimation of stochastic actor-oriented models for longitudinal network data collected as panel data. Dependent variables can be single or multivariate networks, which can be directed, non-directed, or two-mode. There are also functions for testing parameters and checking goodness of fit. An overview of these models is given in Tom A.B. Snijders (2017), Stochastic Actor-Oriented Models for Network Dynamics, Annual Review of Statistics and Its Application, 4, 343-363 <doi:10.1146/annurev-statistics-060116-054035>.
2945 Statistics for the Social Sciences sandwich (core) Robust Covariance Matrix Estimators Model-robust standard error estimators for cross-sectional, time series, clustered, panel, and longitudinal data.
2946 Statistics for the Social Sciences simpleboot Simple Bootstrap Routines Simple bootstrap routines.
2947 Statistics for the Social Sciences sm Smoothing Methods for Nonparametric Regression and Density Estimation This is software linked to the book ‘Applied Smoothing Techniques for Data Analysis - The Kernel Approach with S-Plus Illustrations’, Oxford University Press.
2948 Statistics for the Social Sciences sna Tools for Social Network Analysis A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.
2949 Statistics for the Social Sciences spatial Functions for Kriging and Point Pattern Analysis Functions for kriging and point pattern analysis.
2950 Statistics for the Social Sciences statnet Software Tools for the Statistical Analysis of Network Data Statnet is a collection of packages for statistical network analysis that are designed to work together because they share common data representations and ‘API’ design. They provide an integrated set of tools for the representation, visualization, analysis, and simulation of many different forms of network data. This package is designed to make it easy to install and load the key ‘statnet’ packages in a single step. Learn more about ‘statnet’ at <http://www.statnet.org>. For an introduction to functions in this package, type help(package=‘statnet’).
2951 Statistics for the Social Sciences survey (core) Analysis of Complex Survey Samples Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase subsampling designs. Graphics. PPS sampling without replacement. Principal components, factor analysis.
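A short survey sketch with the api example data shipped with the package:

    library(survey)
    data(api)  # provides apistrat, a stratified sample of California schools
    dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                        fpc = ~fpc, data = apistrat)
    svymean(~api00, dstrat)  # design-based mean and standard error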
2952 Statistics for the Social Sciences survival (core) Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
2953 Statistics for the Social Sciences vcd Visualizing Categorical Data Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).
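A minimal vcd sketch with the built-in Titanic table:

    library(vcd)
    # Mosaic display with residual-based shading
    mosaic(~ Sex + Survived, data = Titanic, shade = TRUE)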
2954 Statistics for the Social Sciences VGAM (core) Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
2955 Statistics for the Social Sciences VIM Visualization and Imputation of Missing Values New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data, including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface, available in the separate package VIMGUI, allows easy handling of the implemented plot methods.
2956 Statistics for the Social Sciences visreg Visualization of Regression Models Provides a convenient interface for constructing plots to visualize the fit of regression models arising from a wide variety of models in R (‘lm’, ‘glm’, ‘coxph’, ‘rlm’, ‘gam’, ‘locfit’, ‘lmer’, ‘randomForest’, etc.)
2957 Statistics for the Social Sciences Zelig Everyone’s Statistical Software A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow: procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.
2958 Analysis of Spatial Data ade4 Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
2959 Analysis of Spatial Data adehabitatHR Home Range Estimation A collection of tools for the estimation of animals’ home ranges.
2960 Analysis of Spatial Data adehabitatHS Analysis of Habitat Selection by Animals A collection of tools for the analysis of habitat selection.
2961 Analysis of Spatial Data adehabitatLT Analysis of Animal Movements A collection of tools for the analysis of animal movements.
2962 Analysis of Spatial Data adehabitatMA Tools to Deal with Raster Maps A collection of tools to deal with raster maps.
2963 Analysis of Spatial Data ads Spatial Point Patterns Analysis Perform first- and second-order multi-scale analyses derived from Ripley K-function, for univariate, multivariate and marked mapped data in rectangular, circular or irregular shaped sampling windows, with tests of statistical significance based on Monte Carlo simulations.
2964 Analysis of Spatial Data akima Interpolation of Irregularly and Regularly Spaced Data Several cubic spline interpolation methods of H. Akima for irregular and regular gridded data are available through this package, both for the bivariate case (irregular data: ACM 761, regular data: ACM 760) and univariate case (ACM 433 and ACM 697). Linear interpolation of irregular gridded data is also covered by reusing D. J. Renka’s triangulation code, which is part of Akima’s Fortran code. A bilinear interpolator for regular grids was also added for comparison with the bicubic interpolator on regular grids.
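A sketch of gridding irregular observations with akima (the package ships a small akima test data set):

```r
library(akima)

data(akima)  # irregularly spaced x, y, z sample points
# Interpolate onto a regular grid (linear by default; linear = FALSE
# selects Akima's spline-based method).
ak <- interp(akima$x, akima$y, akima$z)
image(ak)                 # interpolated surface
points(akima$x, akima$y)  # original sample locations
```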
2965 Analysis of Spatial Data AMOEBA A Multidirectional Optimum Ecotope-Based Algorithm A function to calculate spatial clusters using the Getis-Ord local statistic. It searches irregular clusters (ecotopes) on a map.
2966 Analysis of Spatial Data areal Areal Weighted Interpolation A pipeable, transparent implementation of areal weighted interpolation with support for interpolating multiple variables in a single function call. These tools provide a full-featured workflow for validation and estimation that fits into both modern data management (e.g. tidyverse) and spatial data (e.g. sf) frameworks.
2967 Analysis of Spatial Data ash David Scott’s ASH Routines David Scott’s ASH routines ported from S-PLUS to R.
2968 Analysis of Spatial Data aspace A collection of functions for estimating centrographic statistics and computational geometries for spatial point patterns A collection of functions for computing centrographic statistics (e.g., standard distance, standard deviation ellipse, standard deviation box) for observations taken at point locations. Separate plotting functions have been developed for each measure. Users interested in writing results to ESRI shapefiles can do so by using results from aspace functions as inputs to the convert.to.shapefile and write.shapefile functions in the shapefiles library. The aspace library was originally conceived to aid in the analysis of spatial patterns of travel behaviour (see Buliung and Remmel, 2008). Major changes in the current version include (1) removal of dependencies on several external libraries (e.g., gpclib, maptools, sp), (2) the separation of plotting and estimation capabilities, (3) reduction in the number of functions, and (4) expansion of analytical capabilities with additional functions for descriptive analysis and visualization (e.g., standard deviation box, centre of minimum distance, central feature).
2969 Analysis of Spatial Data automap Automatic interpolation package Performs automatic interpolation by estimating the variogram and then calling gstat.
2970 Analysis of Spatial Data CARBayes Spatial Generalised Linear Mixed Models for Areal Unit Data Implements a class of univariate and multivariate spatial generalised linear mixed models for areal unit data, with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation. The response variable can be binomial, Gaussian, multinomial, Poisson or zero-inflated Poisson (ZIP), and spatial autocorrelation is modelled by a set of random effects that are assigned a conditional autoregressive (CAR) prior distribution. A number of different models are available for univariate spatial data, including models with no random effects as well as random effects modelled by different types of CAR prior, including the BYM model (Besag et al. (1991) <doi:10.1007/BF00116466>), the Leroux model (Leroux et al. (2000) <doi:10.1007/978-1-4612-1284-3_4>) and the localised model (Lee et al. (2015) <doi:10.1002/env.2348>). Additionally, a multivariate CAR (MCAR) model for multivariate spatial data is available, as is a two-level hierarchical model for modelling data relating to individuals within areas. Full details are given in the vignette accompanying this package. The initial creation of this package was supported by the Economic and Social Research Council (ESRC) grant RES-000-22-4256, and on-going development has been supported by the Engineering and Physical Science Research Council (EPSRC) grant EP/J017442/1, ESRC grant ES/K006460/1, Innovate UK / Natural Environment Research Council (NERC) grant NE/N007352/1 and the TB Alliance.
2971 Analysis of Spatial Data cartogram Create Cartograms with R Construct continuous and non-contiguous area cartograms.
2972 Analysis of Spatial Data cartography Thematic Cartography Create and integrate maps in your R workflow. This package helps to design cartographic representations such as proportional symbols, choropleth, typology, flows or discontinuities maps. It also offers several features that improve the graphic presentation of maps, for instance, map palettes, layout elements (scale, north arrow, title…), labels or legends. See Giraud and Lambert (2017) <doi:10.1007/978-3-319-57336-6_13>.
2973 Analysis of Spatial Data classInt (core) Choose Univariate Class Intervals Selected commonly used methods for choosing univariate class intervals for mapping or other graphics purposes.
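A minimal sketch of choosing class intervals for a map legend (the example variable is synthetic):

```r
library(classInt)

set.seed(42)
x <- c(rnorm(80), rnorm(20, mean = 5))             # synthetic, skewed variable
ci <- classIntervals(x, n = 5, style = "quantile") # other styles: "jenks", "equal", ...
ci$brks                                            # class boundaries for mapping
```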
2974 Analysis of Spatial Data cleangeo Cleaning Geometries from Spatial Objects Provides a set of utility tools to inspect spatial objects, facilitate handling and reporting of topology errors and geometry validity issues. Finally, it provides a geometry cleaner that will fix all geometry problems, and eliminate (at least reduce) the likelihood of having issues when doing spatial data processing.
2975 Analysis of Spatial Data CompRandFld Composite-Likelihood Based Analysis of Random Fields A set of procedures for the analysis of Random Fields using likelihood and non-standard likelihood methods is provided. Spatial analysis often involves dealing with large datasets. Therefore, even simple studies may be too computationally demanding. Composite likelihood inference is emerging as a useful tool for mitigating such computational problems. This methodology shows satisfactory results when compared with other techniques such as the tapering method. Moreover, composite likelihood (and related quantities) have some useful properties similar to those of the standard likelihood.
2976 Analysis of Spatial Data constrainedKriging Constrained, Covariance-Matching Constrained and Universal Point or Block Kriging Provides functions for efficient computations of nonlinear spatial predictions with local change of support. This package supplies functions for two-dimensional spatial interpolation by constrained, covariance-matching constrained and universal (external drift) kriging for points or blocks of any shape for data with a nonstationary mean function and an isotropic weakly stationary variogram. The linear spatial interpolation methods, constrained and covariance-matching constrained kriging, provide approximately unbiased prediction for nonlinear target values under change of support. This package extends the range of geostatistical tools available in R and provides a veritable alternative to conditional simulation for nonlinear spatial prediction problems with local change of support.
2977 Analysis of Spatial Data cshapes The CShapes Dataset and Utilities Package for CShapes, a GIS dataset of country boundaries (1946-today). Includes functions for data extraction and for the computation of distance matrices and distance lists.
2978 Analysis of Spatial Data dbmss Distance-Based Measures of Spatial Structures Simple computation of spatial statistic functions of distance to characterize the spatial structures of mapped objects, following Marcon, Traissac, Puech, and Lang (2015) <doi:10.18637/jss.v067.c03>. Includes classical functions (Ripley’s K and others) and more recent ones used by spatial economists (Duranton and Overman’s Kd, Marcon and Puech’s M). Relies on ‘spatstat’ for some core calculation.
2979 Analysis of Spatial Data DCluster (core) Functions for the Detection of Spatial Clusters of Diseases A set of functions for the detection of spatial clusters of disease using count data. Bootstrap is used to estimate sampling distributions of statistics.
2980 Analysis of Spatial Data deldir (core) Delaunay Triangulation and Dirichlet (Voronoi) Tessellation Calculates the Delaunay triangulation and the Dirichlet or Voronoi tessellation (with respect to the entire plane) of a planar point set. Plots triangulations and tessellations in various ways. Clips tessellations to sub-windows. Calculates perimeters of tessellations. Summarises information about the tiles of the tessellation.
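A sketch of triangulating a random planar point set and plotting both structures:

```r
library(deldir)

set.seed(1)
x <- runif(20); y <- runif(20)
dd <- deldir(x, y)         # Delaunay triangulation and Dirichlet tessellation
plot(dd, wlines = "both")  # draw triangulation and tessellation together
head(dd$summary)           # per-point tile information (areas, etc.)
```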
2981 Analysis of Spatial Data diseasemapping Modelling Spatial Variation in Disease Risk for Areal Data Formatting of population and case data, calculation of Standardized Incidence Ratios, and fitting the BYM model using INLA.
2982 Analysis of Spatial Data DSpat Spatial Modelling for Distance Sampling Data Fits inhomogeneous Poisson process spatial models to line transect sampling data and provides estimates of abundance within a region.
2983 Analysis of Spatial Data ecespa Functions for Spatial Point Pattern Analysis Some wrappers, functions and data sets for spatial point pattern analysis (mainly based on ‘spatstat’), used in the book “Introduccion al Analisis Espacial de Datos en Ecologia y Ciencias Ambientales: Metodos y Aplicaciones” and in the papers by De la Cruz et al. (2008) <doi:10.1111/j.0906-7590.2008.05299.x> and Olano et al. (2009) <doi:10.1051/forest:2008074>.
2984 Analysis of Spatial Data ExceedanceTools Confidence regions for exceedance sets and contour lines Tools for constructing confidence regions for exceedance regions and contour lines.
2985 Analysis of Spatial Data fields Tools for Spatial Data For curve, surface and function fitting with an emphasis on splines, spatial data and spatial statistics. The major methods include cubic and thin plate splines, Kriging, and compactly supported covariance functions for large data sets. The splines and Kriging methods are supported by functions that can determine the smoothing parameter (nugget and sill variance) and other covariance function parameters by cross validation and also by restricted maximum likelihood. For Kriging there is an easy-to-use function that also estimates the correlation scale (range parameter). A major feature is that any covariance function implemented in R and following a simple format can be used for spatial prediction. There are also many useful functions for plotting and working with spatial data as images. This package also contains an implementation of sparse matrix methods for large spatial data sets and currently requires the sparse matrix (spam) package. Use help(fields) to get started and for an overview. The fields source code is deliberately commented and provides useful explanations of numerical details as a companion to the manual pages. The commented source code can be viewed by expanding the source code version and looking in the R subdirectory. The reference for fields can be generated by the citation function in R and has DOI <doi:10.5065/D6W957CT>. Development of this package was supported in part by the National Science Foundation Grant 1417857 and the National Center for Atmospheric Research. See the Fields URL for a vignette on using this package and some background on spatial statistics.
2986 Analysis of Spatial Data FieldSim Random Fields (and Bridges) Simulations Tools for random fields and bridges simulations.
2987 Analysis of Spatial Data FRK Fixed Rank Kriging Fixed Rank Kriging is a tool for spatial/spatio-temporal modelling and prediction with large datasets. The approach, discussed in Cressie and Johannesson (2008) <doi:10.1111/j.1467-9868.2007.00633.x>, decomposes the field, and hence the covariance function, using a fixed set of n basis functions, where n is typically much smaller than the number of data points (or polygons) m. The method naturally allows for non-stationary, anisotropic covariance functions and the use of observations with varying support (with known error variance). The projected field is a key building block of the Spatial Random Effects (SRE) model, on which this package is based. The package FRK provides helper functions to model, fit, and predict using an SRE with relative ease.
2988 Analysis of Spatial Data gdalUtils Wrappers for the Geospatial Data Abstraction Library (GDAL) Utilities Wrappers for the Geospatial Data Abstraction Library (GDAL) Utilities.
2989 Analysis of Spatial Data gdistance Distances and Routes on Geographical Grids Calculate distances and routes on geographic grids.
2990 Analysis of Spatial Data gear Geostatistical Analysis in R Implements common geostatistical methods in a clean, straightforward, efficient manner. A quasi reboot of the SpatialTools R package.
2991 Analysis of Spatial Data geoaxe Split ‘Geospatial’ Objects into Pieces Split ‘geospatial’ objects into pieces. Includes support for some spatial object inputs, ‘Well-Known Text’, and ‘GeoJSON’.
2992 Analysis of Spatial Data geogrid Turn Geospatial Polygons into Regular or Hexagonal Grids Turn irregular polygons (such as geographical regions) into regular or hexagonal grids. This package enables the generation of regular (square) and hexagonal grids through the package ‘sp’ and then assigns the content of the existing polygons to the new grid using the Hungarian algorithm, Kuhn (1955) (<doi:10.1007/978-3-540-68279-0_2>). This prevents the need for manual generation of hexagonal grids or regular grids that are supposed to reflect existing geography.
2993 Analysis of Spatial Data geojson Classes for ‘GeoJSON’ Classes for ‘GeoJSON’ to make working with ‘GeoJSON’ easier. Includes S3 classes for ‘GeoJSON’ classes with brief summary output, and a few methods such as extracting and adding bounding boxes, properties, and coordinate reference systems; working with newline delimited ‘GeoJSON’; linting through the ‘geojsonlint’ package; and serializing to/from ‘Geobuf’ binary ‘GeoJSON’ format.
2994 Analysis of Spatial Data geojsonio Convert Data from and to ‘GeoJSON’ or ‘TopoJSON’ Convert data to ‘GeoJSON’ or ‘TopoJSON’ from various R classes, including vectors, lists, data frames, shape files, and spatial classes. ‘geojsonio’ does not aim to replace packages like ‘sp’, ‘rgdal’, ‘rgeos’, but rather aims to be a high level client to simplify conversions of data from and to ‘GeoJSON’ and ‘TopoJSON’.
2995 Analysis of Spatial Data GEOmap Topographic and Geologic Mapping Set of routines for making Map Projections (forward and inverse), Topographic Maps, Perspective plots, Geological Maps, geological map symbols, geological databases, interactive plotting and selection of focus regions.
2996 Analysis of Spatial Data geomapdata Data for Topographic and Geologic Mapping Set of data for use in package GEOmap. Includes a world map, a USA map, a Coso map, a Japan map, and ETOPO5.
2997 Analysis of Spatial Data geometa Tools for Reading and Writing ISO/OGC Geographic Metadata Provides facilities to handle reading and writing of geographic metadata defined with the OGC/ISO 19115, 19119 and 19110 geographic information metadata standards, and encoded using the ISO 19139 (XML) standard. It also includes a facility to check the validity of ISO 19139 XML encoded metadata.
2998 Analysis of Spatial Data geonames Interface to the “Geonames” Spatial Query Web Service The web service at <https://www.geonames.org/> provides a number of spatial data queries, including administrative area hierarchies, city locations and some country postal code queries. A (free) username is required and rate limits exist.
2999 Analysis of Spatial Data geonapi ‘GeoNetwork’ API R Interface Provides an R interface to the ‘GeoNetwork’ API (<https://geonetwork-opensource.org/#api>), allowing the user to upload and publish metadata in a ‘GeoNetwork’ web-application and expose it to OGC CSW.
3000 Analysis of Spatial Data geoR (core) Analysis of Geostatistical Data Geostatistical analysis including traditional, likelihood-based and Bayesian methods.
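A minimal sketch of a classical geoR session on the simulated s100 data included in the package; the covariance parameters passed to likfit() are illustrative starting values:

```r
library(geoR)

data(s100)                                     # simulated geodata object
plot(variog(s100))                             # empirical variogram
ml <- likfit(s100, ini.cov.pars = c(1, 0.15))  # ML fit; starting values are illustrative
summary(ml)
```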
3001 Analysis of Spatial Data geoRglm A Package for Generalised Linear Spatial Models Functions for inference in generalised linear spatial models. The posterior and predictive inference is based on Markov chain Monte Carlo methods. Package geoRglm is an extension to the package geoR, which must be installed first.
3002 Analysis of Spatial Data georob Robust Geostatistical Analysis of Spatial Data Provides functions for efficiently fitting linear models with spatially correlated errors by robust and Gaussian (Restricted) Maximum Likelihood and for computing robust and customary point and block external-drift Kriging predictions, along with utility functions for variogram modelling in ad hoc geostatistical analyses, model building, model evaluation by cross-validation, (conditional) simulation of Gaussian processes, unbiased back-transformation of Kriging predictions of log-transformed data.
3003 Analysis of Spatial Data geosapi GeoServer REST API R Interface Provides an R interface to the GeoServer REST API, allowing to upload and publish data in a GeoServer web-application and expose data to OGC Web-Services. The package currently supports all CRUD (Create,Read,Update,Delete) operations on GeoServer workspaces, namespaces, datastores (stores of vector data), featuretypes, layers, styles, as well as vector data upload operations. For more information about the GeoServer REST API, see <http://docs.geoserver.org/stable/en/user/rest/>.
3004 Analysis of Spatial Data geospacom Facilitate Generating of Distance Matrices Used in Package ‘spacom’ and Plotting Data on Maps Generates distance matrices from shape files and represents spatially weighted multilevel analysis results (see ‘spacom’)
3005 Analysis of Spatial Data geosphere Spherical Trigonometry Spherical trigonometry for geographic applications. That is, compute distances and related measures for angular (longitude/latitude) locations.
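For instance, great-circle and ellipsoidal distances between two longitude/latitude points (the coordinates are merely illustrative):

```r
library(geosphere)

paris <- c(2.3522, 48.8566)    # lon, lat (illustrative)
tokyo <- c(139.6917, 35.6895)
distHaversine(paris, tokyo)    # spherical great-circle distance in metres
distGeo(paris, tokyo)          # distance on the WGS84 ellipsoid
```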
3006 Analysis of Spatial Data geospt Geostatistical Analysis and Design of Optimal Spatial Sampling Networks Estimation of the variogram through trimmed mean, radial basis functions (optimization, prediction and cross-validation), summary statistics from cross-validation, pocket plot, and design of optimal sampling networks through sequential and simultaneous points methods.
3007 Analysis of Spatial Data geostatsp Geostatistical Modelling with Likelihood and Bayes Geostatistical modelling facilities using Raster and SpatialPoints objects are provided. Non-Gaussian models are fit using INLA, and Gaussian geostatistical models use Maximum Likelihood Estimation. For details see Brown (2015) <doi:10.18637/jss.v063.i12>.
3008 Analysis of Spatial Data ggmap Spatial Visualization with ggplot2 A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.
3009 Analysis of Spatial Data ggsn North Symbols and Scale Bars for Maps Created with ‘ggplot2’ or ‘ggmap’ Adds north symbols (18 options) and scale bars in kilometers, meters, nautical miles, or statute miles, to maps in geographic or metric coordinates created with ‘ggplot2’ or ‘ggmap’.
3010 Analysis of Spatial Data gmt Interface Between GMT Map-Making Software and R Interface between the GMT map-making software and R, enabling the user to manipulate geographic data within R and call GMT commands to draw and annotate maps in postscript format. The gmt package is about interactive data analysis, rapidly visualizing subsets and summaries of geographic data, while performing statistical analysis in the R console.
3011 Analysis of Spatial Data GriegSmith Uses the Grieg-Smith method on two-dimensional spatial data The function GriegSmith accepts either quadrat count data, a point process object (ppp) or a matrix of x and y coordinates. The function calculates a nested analysis of variance and simulation envelopes.
3012 Analysis of Spatial Data gstat (core) Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation Variogram modelling; simple, ordinary and universal point or block (co)kriging; spatio-temporal kriging; sequential Gaussian or indicator (co)simulation; variogram and variogram map plotting utility functions; supports sf and stars.
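A compact sketch of the usual gstat workflow (empirical variogram, model fit, ordinary kriging) on the meuse data from sp:

```r
library(sp)
library(gstat)

data(meuse)
coordinates(meuse) <- ~x + y        # observation points
data(meuse.grid)
coordinates(meuse.grid) <- ~x + y
gridded(meuse.grid) <- TRUE         # prediction grid

v  <- variogram(log(zinc) ~ 1, meuse)                      # empirical variogram
vm <- fit.variogram(v, vgm(1, "Sph", 900, 1))              # spherical model fit
kr <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = vm)  # ordinary kriging
spplot(kr["var1.pred"])                                    # prediction map
```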
3013 Analysis of Spatial Data Guerry Maps, data and methods related to Guerry (1833) “Moral Statistics of France” This package comprises maps of France in 1830, multivariate data from A.-M. Guerry and others, and statistical and graphic methods related to Guerry’s “Moral Statistics of France”. The goal is to facilitate the exploration and development of statistical and graphic methods for multivariate data in a geo-spatial context of historical interest.
3014 Analysis of Spatial Data GWmodel Geographically-Weighted Models In GWmodel, we introduce techniques from a particular branch of spatial statistics, termed geographically-weighted (GW) models. GW models suit situations when data are not described well by some global model, but where there are spatial regions where a suitably localised calibration provides a better description. GWmodel includes functions to calibrate: GW summary statistics, GW principal components analysis, GW discriminant analysis and various forms of GW regression; some of which are provided in basic and robust (outlier resistant) forms.
3015 Analysis of Spatial Data gwrr Fits geographically weighted regression models with diagnostic tools Fits geographically weighted regression (GWR) models and has tools to diagnose and remediate collinearity in the GWR models. Also fits geographically weighted ridge regression (GWRR) and geographically weighted lasso (GWL) models.
3016 Analysis of Spatial Data HSAR Hierarchical Spatial Autoregressive Model A Hierarchical Spatial Autoregressive Model (HSAR), based on a Bayesian Markov Chain Monte Carlo (MCMC) algorithm ( Dong and Harris (2014) <doi:10.1111/gean.12049> ). The creation of this package was supported by the Economic and Social Research Council (ESRC) through the Applied Quantitative Methods Network: Phase II, grant number ES/K006460/1.
3017 Analysis of Spatial Data igraph Network Analysis and Visualization Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
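A small sketch of graph construction and a few standard analyses:

```r
library(igraph)

set.seed(7)
g <- sample_gnp(100, p = 0.05)   # Erdos-Renyi random graph
degree(g)[1:5]                   # vertex degrees
diameter(g)                      # longest shortest path
plot(g, vertex.size = 4, vertex.label = NA)
```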
3018 Analysis of Spatial Data inlmisc Miscellaneous Functions for the USGS INL Project Office A collection of functions for creating high-level graphics, performing raster-based analysis, processing MODFLOW-based models, selecting subsets using a genetic algorithm, creating interactive web maps, accessing color palettes, etc. Used to support packages and scripts written by researchers at the United States Geological Survey (USGS) Idaho National Laboratory (INL) Project Office.
3019 Analysis of Spatial Data intamap Procedures for Automated Interpolation Provides classes and methods for automated spatial interpolation.
3020 Analysis of Spatial Data ipdw Spatial Interpolation by Inverse Path Distance Weighting Functions are provided to interpolate geo-referenced point data via Inverse Path Distance Weighting. Useful for coastal marine applications where barriers in the landscape preclude interpolation with Euclidean distances.
3021 Analysis of Spatial Data landsat Radiometric and topographic correction of satellite imagery Processing of Landsat or other multispectral satellite imagery. Includes relative normalization, image-based radiometric correction, and topographic correction options.
3022 Analysis of Spatial Data landscapemetrics Landscape Metrics for Categorical Map Patterns Calculates landscape metrics for categorical landscape patterns in a tidy workflow. ‘landscapemetrics’ reimplements the most common metrics from ‘FRAGSTATS’ (<https://www.umass.edu/landeco/research/fragstats/fragstats.html>) and new ones from the current literature on landscape metrics. This package supports ‘raster’ spatial objects and takes RasterLayer, RasterStacks, RasterBricks or lists of RasterLayer from the ‘raster’ package as input arguments. It further provides utility functions to visualize patches, select metrics and building blocks to develop new metrics.
3023 Analysis of Spatial Data latticeDensity Density Estimation and Nonparametric Regression on Irregular Regions Functions that compute the lattice-based density estimator of Barry and McIntyre, which accounts for point processes in two-dimensional regions with irregular boundaries and holes. The package also implements two-dimensional non-parametric regression for similar regions.
3024 Analysis of Spatial Data lawn Client for ‘Turfjs’ for ‘Geospatial’ Analysis Client for ‘Turfjs’ (<http://turfjs.org>) for ‘geospatial’ analysis. The package revolves around using ‘GeoJSON’ data. Functions are included for creating ‘GeoJSON’ data objects, measuring aspects of ‘GeoJSON’, and combining, transforming, and creating random ‘GeoJSON’ data objects.
3025 Analysis of Spatial Data lctools Local Correlation, Spatial Inequalities, Geographically Weighted Regression and Other Tools Provides researchers and educators with easy-to-learn, user-friendly tools for calculating key spatial statistics and for applying simple as well as advanced methods of spatial analysis to real data. These include: Local Pearson and Geographically Weighted Pearson Correlation Coefficients, Spatial Inequality Measures (Gini, Spatial Gini, LQ, Focal LQ), Spatial Autocorrelation (Global and Local Moran’s I), several Geographically Weighted Regression techniques and other Spatial Analysis tools (other geographically weighted statistics). This package also contains functions for measuring the significance of each statistic calculated, mainly based on Monte Carlo simulations.
3026 Analysis of Spatial Data leaflet Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library Create and customize interactive maps using the ‘Leaflet’ JavaScript library and the ‘htmlwidgets’ package. These maps can be used directly from the R console, from ‘RStudio’, in Shiny applications and R Markdown documents.
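A minimal sketch of an interactive map (the marker coordinates and popup text are placeholders):

```r
library(leaflet)

leaflet() %>%
  addTiles() %>%                  # default OpenStreetMap tile layer
  addMarkers(lng = 174.768, lat = -36.852,
             popup = "Illustrative marker")
```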
3027 Analysis of Spatial Data leafletR Interactive Web-Maps Based on the Leaflet JavaScript Library Display your spatial data on interactive web-maps using the open-source JavaScript library Leaflet. ‘leafletR’ provides basic web-mapping functionality to combine vector data and online map tiles from different sources. See <http://leafletjs.com> for more information on Leaflet.
3028 Analysis of Spatial Data lwgeom Bindings to Selected ‘liblwgeom’ Functions for Simple Features Access to selected functions found in ‘liblwgeom’ <https://github.com/postgis/postgis/tree/svn-trunk/liblwgeom>, the light-weight geometry library used by ‘PostGIS’ <http://postgis.net/>.
3029 Analysis of Spatial Data magclass Data Class and Tools for Handling Spatial-Temporal Data Data class for increased interoperability working with spatial-temporal data together with corresponding functions and methods (conversions, basic calculations and basic data manipulation). The class distinguishes between spatial, temporal and other dimensions to facilitate the development and interoperability of tools built for it. Additional features are name-based addressing of data and internal consistency checks (e.g. checking for the right data order in calculations).
3030 Analysis of Spatial Data mapdata Extra Map Databases Supplement to maps package, providing the larger and/or higher-resolution databases.
3031 Analysis of Spatial Data mapedit Interactive Editing of Spatial Data in R Suite of interactive functions and helpers for selecting and editing geospatial data.
3032 Analysis of Spatial Data mapmisc Utilities for Producing Maps A minimal, light-weight set of tools for producing nice looking maps in R, with support for map projections.
3033 Analysis of Spatial Data mapproj Map Projections Converts latitude/longitude into projected coordinates.
3034 Analysis of Spatial Data maps Draw Geographical Maps Display of maps. Projection code and larger maps are in separate packages (‘mapproj’ and ‘mapdata’).
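The two preceding packages are typically used together: maps supplies the outlines and mapproj the projection. A minimal sketch:

```r
library(maps)
library(mapproj)

map("world", projection = "mollweide")  # world outlines in a Mollweide projection
map.grid(c(-180, 180, -90, 90))         # graticule drawn in the same projection
```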
3035 Analysis of Spatial Data maptools (core) Tools for Handling Spatial Objects Set of tools for manipulating geographic data. It includes binary access to ‘GSHHG’ shoreline files. The package also provides interface wrappers for exchanging spatial objects with packages such as ‘PBSmapping’, ‘spatstat’, ‘maps’, ‘RArcInfo’, and others.
3036 Analysis of Spatial Data mapview Interactive Viewing of Spatial Data in R Quickly and conveniently create interactive visualisations of spatial data with or without background maps. Attributes of displayed features are fully queryable via pop-up windows. Additional functionality includes methods to visualise true- and false-color raster images, bounding boxes, small multiples and 3D raster data cubes.
3037 Analysis of Spatial Data marmap Import, Plot and Analyze Bathymetric and Topographic Data Import xyz data from the NOAA (National Oceanic and Atmospheric Administration, <http://www.noaa.gov>), GEBCO (General Bathymetric Chart of the Oceans, <http://www.gebco.net>) and other sources, plot xyz data to prepare publication-ready figures, analyze xyz data to extract transects, get depth / altitude based on geographical coordinates, or calculate z-constrained least-cost paths.
3038 Analysis of Spatial Data MBA Multilevel B-Spline Approximation Functions to interpolate irregularly and regularly spaced data using Multilevel B-spline Approximation (MBA). Functions call portions of the SINTEF Multilevel B-spline Library written by Oyvind Hjelle which implements methods developed by Lee, Wolberg and Shin (1997; <doi:10.1109/2945.620490>).
3039 Analysis of Spatial Data McSpatial Nonparametric spatial data analysis Locally weighted regression; semiparametric and conditionally parametric regression; Fourier and cubic spline functions; GMM and linearized spatial logit and probit; k-density functions and counterfactuals; nonparametric quantile regression and conditional density functions; the Machado-Mata decomposition for quantile regressions; spatial AR models; repeat sales models; and conditionally parametric logit and probit.
3040 Analysis of Spatial Data micromap Linked Micromap Plots This group of functions simplifies the creation of linked micromap plots.
3041 Analysis of Spatial Data ModelMap Modeling and Map Production using Random Forest and Related Stochastic Models Creates sophisticated models of training data and validates the models with an independent test set, cross-validation, or Out Of Bag (OOB) predictions on the training data. Creates graphs and tables of the model validation results. Applies these models to GIS .img files of predictors to create detailed prediction surfaces. Handles large predictor files for map making by reading in the .img files in chunks and writing the prediction for each data chunk to the .txt file before reading the next chunk of data.
3042 Analysis of Spatial Data ncdf4 Interface to Unidata netCDF (Version 4 or Earlier) Format Data Files Provides a high-level R interface to data files written using Unidata’s netCDF library (version 4 or earlier), which are binary data files that are portable across platforms and include metadata information in addition to the data sets. Using this package, netCDF files (either version 4 or “classic” version 3) can be opened and data sets read in easily. It is also easy to create new netCDF dimensions, variables, and files, in either version 3 or 4 format, and manipulate existing netCDF files. This package replaces the former ncdf package, which only worked with netcdf version 3 files. For various reasons the names of the functions have had to be changed from the names in the ncdf package. The old ncdf package is still available at the URL given below, if you need to have backward compatibility. It should be possible to have both the ncdf and ncdf4 packages installed simultaneously without a problem. However, the ncdf package does not provide an interface for netcdf version 4 files.
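A sketch of reading a single variable; the file name and variable name below are placeholders for whatever your netCDF file contains:

```r
library(ncdf4)

nc <- nc_open("example.nc")    # placeholder file name
print(nc)                      # dimensions, variables, attributes
tas <- ncvar_get(nc, "tas")    # placeholder variable name
nc_close(nc)                   # release the file handle
```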
3043 Analysis of Spatial Data ncf Spatial Covariance Functions R functions for analyzing spatial (cross-)covariance: the nonparametric (cross-)covariance function, the spline correlogram, the nonparametric phase coherence function, local indicators of spatial association (LISA), (Mantel) correlogram, (Partial) Mantel test.
3044 Analysis of Spatial Data ngspatial Fitting the Centered Autologistic and Sparse Spatial Generalized Linear Mixed Models for Areal Data Provides tools for analyzing spatial data, especially non-Gaussian areal data. The current version supports the sparse restricted spatial regression model of Hughes and Haran (2013) <doi:10.1111/j.1467-9868.2012.01041.x>, the centered autologistic model of Caragea and Kaiser (2009) <doi:10.1198/jabes.2009.07032>, and the Bayesian spatial filtering model of Hughes (2017) <arXiv:1706.04651>.
3045 Analysis of Spatial Data nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
3046 Analysis of Spatial Data OasisR Outright Tool for the Analysis of Spatial Inequalities and Segregation A set of indexes and tests for the analysis of social segregation.
3047 Analysis of Spatial Data OpenStreetMap Access to Open Street Map Raster Images Accesses high resolution raster maps using the OpenStreetMap protocol. Dozens of road, satellite, and topographic map servers are directly supported, including Apple, Mapnik, Bing, and Stamen. Additionally, raster maps may be constructed using custom tile servers. Maps can be plotted using either base graphics or ggplot2. This package is not affiliated with the OpenStreetMap.org mapping project.
3048 Analysis of Spatial Data osmar OpenStreetMap and R This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in a common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).
3049 Analysis of Spatial Data ows4R Interface to OGC Web-Services (OWS) Provides an Interface to Web-Services defined as standards by the Open Geospatial Consortium (OGC), including Web Feature Service (WFS) for vector data, Catalogue Service (CSW) for ISO/OGC metadata and associated standards such as the common web-service specification (OWS) and OGC Filter Encoding. The long-term purpose is to add support for additional OGC service standards such as Web Coverage Service (WCS) and Web Processing Service (WPS).
3050 Analysis of Spatial Data pastecs Package for Analysis of Space-Time Ecological Series Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.
3051 Analysis of Spatial Data PBSmapping Mapping Fisheries Data and Spatial Analysis Tools This software has evolved from fisheries research conducted at the Pacific Biological Station (PBS) in ‘Nanaimo’, British Columbia, Canada. It extends the R language to include two-dimensional plotting features similar to those commonly available in a Geographic Information System (GIS). Embedded C code speeds algorithms from computational geometry, such as finding polygons that contain specified point events or converting between longitude-latitude and Universal Transverse Mercator (UTM) coordinates. Additionally, we include ‘C++’ code developed by Angus Johnson for the ‘Clipper’ library, data for a global shoreline, and other data sets in the public domain. Under the user’s R library directory ‘.libPaths()’, specifically in ‘./PBSmapping/doc’, a complete user’s guide is offered and should be consulted to use package functions effectively.
3052 Analysis of Spatial Data PBSmodelling GUI Tools Made Easy: Interact with Models and Explore Data Provides software to facilitate the design, testing, and operation of computer models. It focuses particularly on tools that make it easy to construct and edit a customized graphical user interface (‘GUI’). Although our simplified ‘GUI’ language depends heavily on the R interface to the ‘Tcl/Tk’ package, a user does not need to know ‘Tcl/Tk’. Examples illustrate models built with other R packages, including ‘PBSmapping’, ‘PBSddesolve’, and ‘BRugs’. A complete user’s guide ‘PBSmodelling-UG.pdf’ shows how to use this package effectively.
3053 Analysis of Spatial Data plotGoogleMaps Plot Spatial or Spatio-Temporal Data Over Google Maps Provides an interactive plot device for handling the geographic data for web browsers, designed for the automatic creation of web maps as a combination of users’ data and Google Maps layers.
3054 Analysis of Spatial Data plotKML Visualization of Spatial and Spatio-Temporal Objects in Google Earth Writes sp-class, spacetime-class, raster-class and similar spatial and spatio-temporal objects to KML following some basic cartographic rules.
3055 Analysis of Spatial Data postGIStools Tools for Interacting with ‘PostgreSQL’ / ‘PostGIS’ Databases Functions to convert geometry and ‘hstore’ data types from ‘PostgreSQL’ into standard R objects, as well as to simplify the import of R data frames (including spatial data frames) into ‘PostgreSQL’. Note: This package is deprecated. For new projects, we recommend using the ‘sf’ package to interface with geodatabases.
3056 Analysis of Spatial Data PReMiuM Dirichlet Process Bayesian Clustering, Profile Regression Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can be also included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.
3057 Analysis of Spatial Data ProbitSpatial Probit with Spatial Dependence, SAR and SEM Models Binomial Spatial Probit models for big data.
3058 Analysis of Spatial Data qualmap Opinionated Approach for Digitizing Semi-Structured Qualitative GIS Data Provides a set of functions for taking qualitative GIS data, hand drawn on a map, and converting it to a simple features object. These tools are focused on data that are drawn on a map that contains some type of polygon features. For each area identified on the map, the id numbers of these polygons can be entered as vectors and transformed using qualmap.
3059 Analysis of Spatial Data quickmapr Quickly Map and Explore Spatial Data While analyzing geospatial data, easy visualization that allows for quick plotting and simple interactivity is often needed. Additionally, visualizing geospatial data in projected coordinates is also desirable. The ‘quickmapr’ package provides a simple method to visualize ‘sp’, ‘sf’ (via coercion to ‘sp’), and ‘raster’ objects, and allows for basic zooming, panning, identifying, labeling, selecting, and measuring spatial objects. Importantly, it does not require that the data be in geographic coordinates.
3060 Analysis of Spatial Data ramps Bayesian Geostatistical Modeling with RAMPS Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm designed to lower autocorrelation in MCMC samples. Package performance is tuned for large spatial datasets.
3061 Analysis of Spatial Data RandomFields (core) Simulation and Analysis of Random Fields Methods for the inference on and the simulation of Gaussian fields are provided, as well as methods for the simulation of extreme value random fields.
3062 Analysis of Spatial Data rangeMapper A Platform for the Study of Macro-Ecology of Life History Traits Tools for easy generation of (life-history) traits maps based on species range (extent-of-occurrence) maps.
3063 Analysis of Spatial Data RArcInfo Functions to import data from Arc/Info V7.x binary coverages This package uses the functions written by Daniel Morissette <danmo@videotron.ca> to read geographical information in Arc/Info V 7.x format and E00 files and to import the coverages into R variables.
3064 Analysis of Spatial Data raster (core) Geographic Data Analysis and Modeling Reading, writing, manipulating, analyzing and modeling of gridded spatial data. The package implements basic and high-level functions. Processing of very large files is supported.
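A minimal sketch of creating and summarising a gridded object:

```r
library(raster)

r <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- runif(ncell(r))              # fill with random values
r2 <- aggregate(r, fact = 5, fun = mean)  # coarsen resolution by a factor of 5
plot(r2)
```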
3065 Analysis of Spatial Data rasterVis Visualization Methods for Raster Data Methods for enhanced visualization and interaction with raster data. It implements visualization methods for quantitative data and categorical data, both for univariate and multivariate rasters. It also provides methods to display spatiotemporal rasters, and vector fields. See the website for examples.
3066 Analysis of Spatial Data RColorBrewer (core) ColorBrewer Palettes Provides color schemes for maps (and other graphics) designed by Cynthia Brewer as described at http://colorbrewer2.org
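A sketch of picking a sequential palette for map classes:

```r
library(RColorBrewer)

display.brewer.all()            # preview all ColorBrewer palettes
pal <- brewer.pal(7, "YlOrRd")  # 7 sequential colours for ordered classes
image(volcano, col = pal)       # apply the palette to a simple grid plot
```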
3067 Analysis of Spatial Data rcosmo Cosmic Microwave Background Data Analysis Handling and Analysing Spherical, HEALPix and Cosmic Microwave Background data on a HEALPix grid.
3068 Analysis of Spatial Data recmap Compute the Rectangular Statistical Cartogram Provides an interface and a C++ implementation of the RecMap MP2 construction heuristic (see ‘citation(“recmap”)’ for details). This algorithm draws maps according to a given statistical value (e.g., election results, population or epidemiological data). The basic idea of the RecMap algorithm is that each map region (e.g., different countries) is represented by a rectangle. The area of each rectangle represents the statistical value given as input (maintaining zero cartographic error). Documentation about RecMap is provided by a vignette included in this package.
3069 Analysis of Spatial Data regress Gaussian Linear Models with Linear Covariance Structure Functions to fit Gaussian linear models by maximising the residual log likelihood, where the covariance structure can be written as a linear combination of known matrices. Can be used for multivariate models and random effects models. Provides an easy, straightforward way to specify random effects models, including random interactions. The code is now optimised to use Sherman-Morrison-Woodbury identities for matrix inversion in random effects models. We have added the ability to fit models using any kernel, as well as a function to return the mean and covariance of random effects conditional on the data (BLUPs).
3070 Analysis of Spatial Data rgbif Interface to the Global ‘Biodiversity’ Information Facility API A programmatic interface to the Web Service methods provided by the Global Biodiversity Information Facility (‘GBIF’; <https://www.gbif.org/developer/summary>). ‘GBIF’ is a database of species occurrence records from sources all over the globe. ‘rgbif’ includes functions for searching for taxonomic names, retrieving information on data providers, getting species occurrence records, getting counts of occurrence records, and using the ‘GBIF’ tile map service to make ‘rasters’ summarizing huge amounts of data.
3071 Analysis of Spatial Data rgdal (core) Bindings for the ‘Geospatial’ Data Abstraction Library Provides bindings to the ‘Geospatial’ Data Abstraction Library (‘GDAL’) (>= 1.11.4 and <= 2.5.0) and access to projection/transformation operations from the ‘PROJ.4’ library. The ‘GDAL’ and ‘PROJ.4’ libraries are external to the package, and, when installing the package from source, must be correctly installed first. From ‘rgdal’ 1.4.1, provision is made for ‘PROJ6’ accommodation, with ‘PROJ6’ functionality to follow; from 1.4.1 ‘rgdal’ will build and function when ‘PROJ’ >= 6. Both ‘GDAL’ raster and ‘OGR’ vector map data can be imported into R, and ‘GDAL’ raster data and ‘OGR’ vector data exported. Use is made of classes defined in the ‘sp’ package. Windows and Mac Intel OS X binaries (including ‘GDAL’, ‘PROJ.4’ and ‘Expat’) are provided on ‘CRAN’.
3072 Analysis of Spatial Data rgeos (core) Interface to Geometry Engine - Open Source (‘GEOS’) Interface to Geometry Engine - Open Source (‘GEOS’) using the C ‘API’ for topology operations on geometries. The ‘GEOS’ library is external to the package, and, when installing the package from source, must be correctly installed first. Windows and Mac Intel OS X binaries are provided on ‘CRAN’.
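A minimal sketch of a GEOS topology operation via rgeos:

```r
library(rgeos)

p <- readWKT("POINT(0 0)")   # sp geometry from Well-Known Text
b <- gBuffer(p, width = 1)   # buffer: polygonal approximation of the unit disc
gArea(b)                     # close to pi, up to the polygon approximation
```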
3073 Analysis of Spatial Data RgoogleMaps Overlays on Static Maps Serves two purposes: (i) Provide a comfortable R interface to query the Google server for static maps, and (ii) Use the map as a background image to overlay plots within R. This requires proper coordinate scaling.
3074 Analysis of Spatial Data rgrass7 Interface Between GRASS 7 Geographical Information System and R Interpreted interface between ‘GRASS’ 7 geographical information system and R, based on starting R from within the ‘GRASS’ ‘GIS’ environment, or running free-standing R in a temporary ‘GRASS’ location; the package provides facilities for using all ‘GRASS’ commands from the R command line. This package may not be used for ‘GRASS’ 6, for which ‘spgrass6’ should be used.
3075 Analysis of Spatial Data rnaturalearth World Map Data from Natural Earth Facilitates mapping by making natural earth map data from <http://www.naturalearthdata.com/> more easily available to R users.
3076 Analysis of Spatial Data RNetCDF Interface to NetCDF Datasets An interface to the NetCDF file format designed by Unidata for efficient storage of array-oriented scientific data and descriptions. The R interface is closely based on the C API of the NetCDF library, and it includes calendar conversions from the Unidata UDUNITS library. The current implementation supports all operations on NetCDF datasets in classic and 64-bit offset file formats, and NetCDF4-classic format is supported for reading and modification of existing files.
3077 Analysis of Spatial Data rpostgis R Interface to a ‘PostGIS’ Database Provides an interface between R and ‘PostGIS’-enabled ‘PostgreSQL’ databases to transparently transfer spatial data. Both vector (points, lines, polygons) and raster data are supported in read and write modes. Also provides convenience functions to execute common procedures in ‘PostgreSQL/PostGIS’.
3078 Analysis of Spatial Data RPostgreSQL R Interface to the ‘PostgreSQL’ Database System Database interface and ‘PostgreSQL’ driver for ‘R’. This package provides a Database Interface ‘DBI’ compliant driver for ‘R’ to access ‘PostgreSQL’ database systems. In order to build and install this package from source, ‘PostgreSQL’ itself must be present on your system to provide ‘PostgreSQL’ functionality via its libraries and header files. These files are provided as the ‘postgresql-devel’ package under some Linux distributions. On ‘macOS’ and ‘Microsoft Windows’ systems the attached ‘libpq’ library source will be used.
3079 Analysis of Spatial Data RPyGeo ArcGIS Geoprocessing via Python Provides access to ArcGIS geoprocessing tools by building an interface between R and the ArcPy Python side-package via the reticulate package.
3080 Analysis of Spatial Data RQGIS Integrating R with QGIS Establishes an interface between R and ‘QGIS’, i.e. it allows the user to access ‘QGIS’ functionalities from the R console. It achieves this by using the ‘QGIS’ Python API via the command line. Hence, RQGIS extends R’s statistical power by the incredibly vast geo-functionality of ‘QGIS’ (including also ‘GDAL’, ‘SAGA’-GIS and ‘GRASS’-GIS among other third-party providers). This in turn creates a powerful environment for advanced and innovative (geo-)statistical geocomputing. ‘QGIS’ is licensed under GPL version 2 or greater and is available from <http://www.qgis.org/en/site/>.
3081 Analysis of Spatial Data RSAGA SAGA Geoprocessing and Terrain Analysis Provides access to geocomputing and terrain analysis functions of the geographical information system (GIS) ‘SAGA’ (System for Automated Geoscientific Analyses) from within R by running the command line version of SAGA. This package furthermore provides several R functions for handling ASCII grids, including a flexible framework for applying local functions (including predict methods of fitted models) and focal functions to multiple grids. SAGA GIS is available under GPLv2 / LGPLv2 licence from <http://sourceforge.net/projects/saga-gis/>.
3082 Analysis of Spatial Data RSurvey Geographic Information System Application A geographic information system (GIS) graphical user interface (GUI) that provides data viewing, management, and analysis tools.
3083 Analysis of Spatial Data rtop Interpolation of Data with Variable Spatial Support Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.
3084 Analysis of Spatial Data rworldmap Mapping Global Data Enables mapping of country level and gridded user datasets.
3085 Analysis of Spatial Data rworldxtra Country boundaries at high resolution High resolution vector country boundaries derived from Natural Earth data, can be plotted in rworldmap.
3086 Analysis of Spatial Data S2sls Spatial Two Stage Least Squares Estimation Fit a spatial instrumental-variable regression by two-stage least squares.
3087 Analysis of Spatial Data seg Measuring Spatial Segregation Measuring spatial segregation. The methods implemented in this package include White’s P index (1983) <doi:10.1086/227768>, Morrill’s D(adj) (1991), Wong’s D(w) and D(s) (1993) <doi:10.1080/00420989320080551>, and Reardon and O’Sullivan’s set of spatial segregation measures (2004) <doi:10.1111/j.0081-1750.2004.00150.x>.
3088 Analysis of Spatial Data sf (core) Simple Features for R Support for simple features, a standardized way to encode spatial vector data. Binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations.
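A minimal sketch of the sf workflow on the North Carolina shapefile shipped with the package:

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # read via GDAL
nc_proj <- st_transform(nc, 32119)  # reproject via PROJ (EPSG:32119, NC State Plane)
areas <- st_area(nc_proj)           # polygon areas, returned with units
plot(st_geometry(nc_proj))
```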
3089 Analysis of Spatial Data sgeostat An Object-Oriented Framework for Geostatistical Modeling in S+ An object-oriented framework for geostatistical modeling in S+, containing functions for variogram estimation, variogram fitting and kriging, as well as some plot functions. Written entirely in S, it therefore works in acceptable computing time only for small data sets.
3090 Analysis of Spatial Data shapefiles Read and Write ESRI Shapefiles Functions to read and write ESRI shapefiles
3091 Analysis of Spatial Data shp2graph Convert a SpatialLinesDataFrame Object to an ‘igraph’-Class Object Functions for converting network data from a SpatialLinesDataFrame object to an ‘igraph’-Class object.
3092 Analysis of Spatial Data siplab Spatial Individual-Plant Modelling A platform for experimenting with spatially explicit individual-based vegetation models.
3093 Analysis of Spatial Data smacpod Statistical Methods for the Analysis of Case-Control Point Data Statistical methods for analyzing case-control point data. Methods include the ratio of kernel densities, the difference in K Functions, the spatial scan statistic, and q nearest neighbors of cases.
3094 Analysis of Spatial Data smerc Statistical Methods for Regional Counts Implements statistical methods for analyzing the counts of areal data, with a focus on the detection of spatial clusters and clustering.
3095 Analysis of Spatial Data sp (core) Classes and Methods for Spatial Data Classes and methods for spatial data; the classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided, e.g. for plotting data as maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc.
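A sketch of promoting a plain data frame to a Spatial object and mapping one attribute:

```r
library(sp)

data(meuse)                   # soil samples with x/y coordinates
coordinates(meuse) <- ~x + y  # promote to SpatialPointsDataFrame
class(meuse)
bubble(meuse, "zinc")         # proportional-symbol map of zinc concentration
```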
3096 Analysis of Spatial Data spacetime (core) Classes and Methods for Spatio-Temporal Data Classes and methods for spatio-temporal data, including space-time regular lattices, sparse lattices, irregular data, and trajectories; utility functions for plotting data as map sequences (lattice or animation) or multiple time series; methods for spatial and temporal selection and subsetting, as well as for spatial/temporal/spatio-temporal matching or aggregation, retrieving coordinates, print, summary, etc.
3097 Analysis of Spatial Data spacom Spatially Weighted Context Data for Multilevel Modelling Provides tools to construct and exploit spatially weighted context data. Spatial weights are derived by a Kernel function from a user-defined matrix of distances between contextual units. Spatial weights can then be applied either to precise contextual measures or to aggregate estimates based on micro-level survey data, to compute spatially weighted context data. Available aggregation functions include indicators of central tendency, dispersion, or inter-group variability, and take into account survey design weights. The package further allows combining the resulting spatially weighted context data with individual-level predictor and outcome variables, for the purposes of multilevel modelling. An ad hoc stratified bootstrap resampling procedure generates robust point estimates for multilevel regression coefficients and model fit indicators, and computes confidence intervals adjusted for measurement dependency and measurement error of aggregate estimates. As an additional feature, residual and explained spatial dependency can be estimated for the tested models.
3098 Analysis of Spatial Data spaMM Mixed-Effect Models, Particularly Spatial Models Inference based on mixed-effect models, including generalized linear mixed models with spatial correlations and models with non-Gaussian random effects (e.g., Beta). Both classical geostatistical models, and Markov random field models on irregular grids, can be fitted. Variation in residual variance (heteroscedasticity) can itself be represented by a generalized linear mixed model. Various approximations of likelihood or restricted likelihood are implemented, in particular h-likelihood (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) and Laplace approximation.
3099 Analysis of Spatial Data spanel Spatial Panel Data Models Fits the spatial panel data models: the fixed effects, random effects and between models.
3100 Analysis of Spatial Data sparr Spatial and Spatiotemporal Relative Risk Provides functions to estimate kernel-smoothed spatial and spatio-temporal densities and relative risk functions, and perform subsequent inference. Methodological details can be found in the accompanying tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>.
3101 Analysis of Spatial Data spatgraphs Graph Edge Computations for Spatial Point Patterns Graphs (or networks) and graph component calculations for spatial locations in 1D, 2D, 3D etc.
3102 Analysis of Spatial Data spatial Functions for Kriging and Point Pattern Analysis Functions for kriging and point pattern analysis.
3103 Analysis of Spatial Data spatial.tools R Functions for Working with Spatial Data Spatial functions meant to enhance the core functionality of the package “raster”, including a parallel processing engine for use with rasters.
3104 Analysis of Spatial Data spatialCovariance Computation of Spatial Covariance Matrices for Data on Rectangles Functions that compute the spatial covariance matrix for the Matern and power classes of spatial models, for data that arise on rectangular units. This code can also be used for the change of support problem and for spatial data that arise on irregularly shaped regions like counties or zipcodes by laying a fine grid of rectangles and aggregating the integrals in a form of Riemann integration.
3105 Analysis of Spatial Data SpatialEpi Methods and Data for Spatial Epidemiology Methods and data for cluster detection and disease mapping.
3106 Analysis of Spatial Data SpatialExtremes Modelling Spatial Extremes Tools for the statistical modelling of spatial extremes using max-stable processes, copula or Bayesian hierarchical models. More precisely, this package allows (conditional) simulations from various parametric max-stable models, analysis of the extremal spatial dependence, the fitting of such processes using composite likelihoods or least squares (simple max-stable processes only), model checking and selection, and prediction. Other approaches (although not completely in agreement with extreme value theory) are available, such as the use of (spatial) copulas and Bayesian hierarchical models relying on so-called conditional assumptions. The latter approaches are handled through an (efficient) Gibbs sampler. Some key references: Davison et al. (2012) <doi:10.1214/11-STS376>, Padoan et al. (2010) <doi:10.1198/jasa.2009.tm08577>, Dombry et al. (2013) <doi:10.1093/biomet/ass067>.
3107 Analysis of Spatial Data SpatialPosition Spatial Position Models Computes spatial position models: Stewart potentials, Reilly catchment areas, Huff catchment areas.
3108 Analysis of Spatial Data spatialprobit Spatial Probit Models Bayesian Estimation of Spatial Probit and Tobit Models.
3109 Analysis of Spatial Data spatialreg (core) Spatial Regression Analysis A collection of all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) contained up to now in ‘spdep’, ‘sphet’ and ‘spse’. These model fitting functions include maximum likelihood methods for cross-sectional models proposed by ‘Cliff’ and ‘Ord’ (1973, ISBN:0850860369) and (1981, ISBN:0850860814), fitting methods initially described by ‘Ord’ (1975) <doi:10.1080/01621459.1975.10480272>. The models are further described by ‘Anselin’ (1988) <doi:10.1007/978-94-015-7799-1>. Spatial two stage least squares and spatial general method of moment models initially proposed by ‘Kelejian’ and ‘Prucha’ (1998) <doi:10.1023/A:1007707430416> and (1999) <doi:10.1111/1468-2354.00027> are provided. Impact methods and MCMC fitting methods proposed by ‘LeSage’ and ‘Pace’ (2009) <doi:10.1201/9781420064254> are implemented for the family of cross-sectional spatial regression models. Methods for fitting the log determinant term in maximum likelihood and MCMC fitting are compared by ‘Bivand et al.’ (2013) <doi:10.1111/gean.12008>, and model fitting methods by ‘Bivand’ and ‘Piras’ (2015) <doi:10.18637/jss.v063.i18>; both of these articles include extensive lists of references. ‘spatialreg’ >= 1.1-* correspond to ‘spdep’ >= 1.1-1, in which the model fitting functions are deprecated and pass through to ‘spatialreg’, but will mask those in ‘spatialreg’. From versions 1.2-*, the functions will be made defunct in ‘spdep’.
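A minimal sketch of the kind of model fitting ‘spatialreg’ provides, using the Columbus crime data shipped with ‘spdep’ (the dataset and variable names follow that package's standard example; adapt them to your own data):

    library(spdep)                            # neighbour lists and spatial weights
    library(spatialreg)                       # spatial model fitting functions
    data(oldcol, package = "spdep")           # COL.OLD data frame, COL.nb neighbour list
    lw  <- nb2listw(COL.nb, style = "W")      # row-standardised spatial weights
    fit <- lagsarlm(CRIME ~ INC + HOVAL,      # spatial lag model by maximum likelihood
                    data = COL.OLD, listw = lw)
    summary(fit)                              # coefficients plus the spatial parameter rho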
3110 Analysis of Spatial Data spatialsegregation Segregation Measures for Multitype Spatial Point Patterns Summaries for measuring segregation/mingling in multitype spatial point patterns with graph-based neighbourhood description. Included indices: Mingling, Shannon, Simpson (also the non-spatial versions). Included functionals: Mingling, Shannon, Simpson, ISAR, MCI. Included neighbourhoods: geometric, k-nearest neighbours, Gabriel, Delaunay. Dixon’s test is also provided.
3111 Analysis of Spatial Data SpatialTools Tools for Spatial Data Analysis Tools for spatial data analysis. Emphasis on kriging. Provides functions for prediction and simulation. Intended to be relatively straightforward, fast, and flexible.
3112 Analysis of Spatial Data spatstat (core) Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests Comprehensive open-source toolbox for analysing Spatial Point Patterns. Focused mainly on two-dimensional point patterns, including multitype/marked points, in any spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, point patterns on a linear network, and patterns of other geometrical objects. Supports spatial covariate data such as pixel images. Contains over 2000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the functions ppm(), kppm(), slrm(), dppm() similar to glm(). Types of models include Poisson, Gibbs and Cox point processes, Neyman-Scott cluster processes, and determinantal point processes. Models may involve dependence on covariates, inter-point interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. A model can be fitted to a list of point patterns (replicated point pattern data) using the function mppm(). The model can include random effects and fixed effects depending on the experimental design, in addition to all the features listed above. Fitted point process models can be simulated automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.
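A minimal sketch of the ppm() workflow named above, using the ‘cells’ point pattern shipped with ‘spatstat’:

    library(spatstat)
    X   <- cells                    # built-in point pattern on the unit square
    fit <- ppm(X ~ x + y)           # Poisson model, log-intensity linear in the coordinates
    summary(fit)
    plot(simulate(fit))             # one simulated realisation of the fitted model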
3113 Analysis of Spatial Data spatsurv Bayesian Spatial Survival Analysis with Parametric Proportional Hazards Models Bayesian inference for parametric proportional hazards spatial survival models; flexible spatial survival models.
3114 Analysis of Spatial Data spBayes Univariate and Multivariate Spatial-Temporal Modeling Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) <doi:10.18637/jss.v063.i13> and Finley, Banerjee, and Cook (2014) <doi:10.1111/2041-210X.12189>.
3115 Analysis of Spatial Data spBayesSurv Bayesian Modeling and Analysis of Spatially Correlated Survival Data Provides several Bayesian survival models for spatial/non-spatial survival data: proportional hazards (PH), accelerated failure time (AFT), proportional odds (PO), and accelerated hazards (AH), a super model that includes PH, AFT, PO and AH as special cases, Bayesian nonparametric nonproportional hazards (LDDPM), generalized accelerated failure time (GAFT), and spatially smoothed Polya tree density estimation. The spatial dependence is modeled via frailties under PH, AFT, PO, AH and GAFT, and via copulas under LDDPM and PH. Model choice is carried out via the logarithm of the pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC).
3116 Analysis of Spatial Data spcosa Spatial Coverage Sampling and Random Sampling from Compact Geographical Strata Spatial coverage sampling and random sampling from compact geographical strata created by k-means. See Walvoort et al. (2010) <doi:10.1016/j.cageo.2010.04.005> for details.
3117 Analysis of Spatial Data spdep (core) Spatial Dependence: Weighting Schemes, Statistics and Models A collection of functions to create spatial weights matrix objects from polygon ‘contiguities’, from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial ‘autocorrelation’, including global ‘Morans I’ and ‘Gearys C’ proposed by ‘Cliff’ and ‘Ord’ (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), ‘Hubert/Mantel’ general cross product statistic, Empirical Bayes estimates and ‘Assuncao/Reis’ (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, ‘Getis/Ord’ G (‘Getis’ and ‘Ord’ 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, ‘APLE’ (‘Li et al.’) <doi:10.1111/j.1538-4632.2007.00708.x>, local ‘Moran’s I’ (‘Anselin’ 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and ‘Getis/Ord’ G (‘Ord’ and ‘Getis’ 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, ‘saddlepoint’ approximations (‘Tiefelsdorf’ 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local ‘Moran’s I’ (‘Bivand et al.’ 2009) <doi:10.1016/j.csda.2008.07.021> and ‘LOSH’ local indicators of spatial heteroscedasticity (‘Ord’ and ‘Getis’) <doi:10.1007/s00168-011-0492-y>. The implementation of most of the measures is described in ‘Bivand’ and ‘Wong’ (2018) <doi:10.1007/s11749-018-0599-x>. ‘spdep’ >= 1.1-1 corresponds to ‘spatialreg’ >= 1.1-1, in which the model fitting functions are deprecated and pass through to ‘spatialreg’, but will mask those in ‘spatialreg’. From versions 1.2-1, the functions will be made defunct in ‘spdep’. For now ‘spatialreg’ only has functions from ‘spdep’, where they are shown as deprecated. ‘spatialreg’ only loads the namespace of ‘spdep’; if you attach ‘spdep’, the same functions in the other package will be masked. Some feed through adequately, others do not.
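A minimal sketch of the weights-then-test workflow in ‘spdep’, again using the Columbus data shipped with the package:

    library(spdep)
    data(oldcol, package = "spdep")        # COL.OLD data frame, COL.nb neighbour list
    lw <- nb2listw(COL.nb, style = "W")    # contiguity-based spatial weights
    moran.test(COL.OLD$CRIME, lw)          # global Moran's I
    head(localmoran(COL.OLD$CRIME, lw))    # local Moran's I for the first few units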
3118 Analysis of Spatial Data sperrorest Perform Spatial Error Estimation and Variable Importance in Parallel Implements spatial error estimation and permutation-based variable importance measures for predictive models using spatial cross-validation and spatial block bootstrap.
3119 Analysis of Spatial Data spgrass6 Interface Between GRASS 6+ Geographical Information System and R Interpreted interface between GRASS 6+ geographical information system and R, based on starting R from within the GRASS environment, or running free-standing R in a temporary GRASS location; the package provides facilities for using all GRASS commands from the R command line. This package may not be used for GRASS 7, for which rgrass7 should be used.
3120 Analysis of Spatial Data spgwr Geographically Weighted Regression Functions for computing geographically weighted regressions are provided, based on work by Chris Brunsdon, Martin Charlton and Stewart Fotheringham.
3121 Analysis of Spatial Data sphet Estimation of Spatial Autoregressive Models with and without Heteroscedasticity Generalized Method of Moment estimation of Cliff-Ord-type spatial autoregressive models with and without heteroscedasticity.
3122 Analysis of Spatial Data spind Spatial Methods and Indices Functions for spatial methods based on generalized estimating equations (GEE) and wavelet-revised methods (WRM), functions for scaling by wavelet multiresolution regression (WMRR), conducting multi-model inference, and stepwise model selection. Further, contains functions for spatially corrected model accuracy measures.
3123 Analysis of Spatial Data splancs (core) Spatial and Space-Time Point Pattern Analysis The Splancs package was written as an enhancement to S-Plus for display and analysis of spatial point pattern data; it has been ported to R and is in “maintenance mode”.
3124 Analysis of Spatial Data splm Econometric Models for Spatial Panel Data ML and GM estimation and diagnostic testing of econometric models for spatial panel data.
3125 Analysis of Spatial Data spm Spatial Predictive Modeling Introduces novel, accurate hybrid methods that combine geostatistical and machine learning approaches for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided: one for assessing the predictive errors and accuracy of the method based on cross-validation, and one for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://www.ga.gov.au/metadata-gateway/metadata/record/74030>.
3126 Analysis of Spatial Data spmoran Moran’s Eigenvector-Based Spatial Regression Models Functions for estimating Moran’s eigenvector-based spatial regression models. For details see Murakami (2018) <arXiv:1703.04467>.
3127 Analysis of Spatial Data spsann Optimization of Sample Configurations using Spatial Simulated Annealing Methods to optimize sample configurations using spatial simulated annealing. Multiple objective functions are implemented for various purposes, such as variogram estimation, spatial trend estimation and spatial interpolation. A general purpose spatial simulated annealing function enables the user to define his/her own objective function. Solutions for augmenting existing sample configurations and solving multi-objective optimization problems are available as well.
3128 Analysis of Spatial Data spselect Selecting Spatial Scale of Covariates in Regression Models Fits spatial scale (SS) forward stepwise regression, SS incremental forward stagewise regression, SS least angle regression (LARS), and SS lasso models. All area-level covariates are considered at all available scales to enter a model, but the SS algorithms are constrained to select each area-level covariate at a single spatial scale.
3129 Analysis of Spatial Data spsurvey Spatial Survey Design and Analysis These functions provide procedures for selecting sites for spatial surveys using spatially balanced algorithms applied to discrete points, linear networks, or polygons. The probability survey designs available include independent random samples, stratified random samples, and unequal probability random samples (categorical or probability proportional to size). Design-based estimation based on the results from surveys is available for estimating totals, means, quantiles, CDFs, and linear models. The analyses rely on package survey for most results. Variance estimation options include a local neighborhood variance estimator that is appropriate for spatially-balanced survey designs. A reference for the survey design portion of the package is: D. L. Stevens, Jr. and A. R. Olsen (2004), “Spatially-balanced sampling of natural resources.”, Journal of the American Statistical Association 99(465): 262-278, <doi:10.1198/016214504000000250>. Additional helpful references for this package are A. R. Olsen, T. M. Kincaid, and Q. Payton (2012) and T. M. Kincaid and A. R. Olsen (2012), both of which are chapters in the book “Design and Analysis of Long-Term Ecological Monitoring Studies” (R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.), Cambridge University Press, New York, <Online ISBN:9781139022422>).
3130 Analysis of Spatial Data spTimer Spatio-Temporal Bayesian Modelling Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) <doi:10.18637/jss.v063.i15>.
3131 Analysis of Spatial Data SSN Spatial Modeling on Stream Networks Spatial statistical modeling and prediction for data on stream networks, including models based on in-stream distance (Ver Hoef, J.M. and Peterson, E.E., 2010. <doi:10.1198/jasa.2009.ap08248>.) Models are created using moving average constructions. Spatial linear models, including explanatory variables, can be fit with (restricted) maximum likelihood. Mapping and other graphical functions are included.
3132 Analysis of Spatial Data starma Modelling Space Time AutoRegressive Moving Average (STARMA) Processes Statistical functions to identify, estimate and diagnose a Space-Time AutoRegressive Moving Average (STARMA) model.
3133 Analysis of Spatial Data stars Spatiotemporal Arrays, Raster and Vector Data Cubes Reading, manipulating, writing and plotting spatiotemporal arrays (raster and vector data cubes) in ‘R’, using ‘GDAL’ bindings provided by ‘sf’, and ‘NetCDF’ bindings by ‘ncmeta’ and ‘RNetCDF’.
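A minimal sketch of reading and plotting a raster data cube with ‘stars’, using the example GeoTIFF shipped with the package:

    library(stars)
    tif <- system.file("tif/L7_ETMs.tif", package = "stars")  # example Landsat scene
    x   <- read_stars(tif)     # a stars array with dimensions x, y, band
    x                          # print the dimension and attribute summary
    plot(x)                    # plot each band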
3134 Analysis of Spatial Data statebins U.S. State Cartogram Heatmaps in R; an Alternative to Choropleth Maps for USA States Cartogram heatmaps are an alternative to choropleth maps for USA States and are based on work by the Washington Post graphics department in their report on “The states most threatened by trade”. “State bins” preserve as much of the geographic placement of the states as possible but have the look and feel of a traditional heatmap. Functions are provided that allow for use of a binned, discrete scale, a continuous scale or manually specified colors depending on what is needed for the underlying data.
3135 Analysis of Spatial Data Stem Spatio-temporal models in R Estimation of the parameters of a spatio-temporal model using the EM algorithm, estimation of the parameter standard errors using a spatio-temporal parametric bootstrap, spatial mapping.
3136 Analysis of Spatial Data stplanr Sustainable Transport Planning Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. Enables common transport planning tasks including: downloading and cleaning transport datasets; creating geographic “desire lines” from origin-destination (OD) data; route assignment, locally and via interfaces to routing services such as <http://cyclestreets.net/>; calculation of route segment attributes such as bearing and aggregate flow; and ‘travel watershed’ analysis. See Lovelace and Ellison (2018) <doi:10.32614/RJ-2018-053>.
3137 Analysis of Spatial Data taRifx Collection of Utility and Convenience Functions A collection of various utility and convenience functions.
3138 Analysis of Spatial Data tgp Bayesian Treed Gaussian Process Models Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions.
3139 Analysis of Spatial Data tidycensus Load US Census Boundary and Attribute Data as ‘tidyverse’ and ‘sf’-Ready Data Frames An integrated R interface to the decennial US Census and American Community Survey APIs and the US Census Bureau’s geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for many geographies.
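A minimal sketch of a typical ‘tidycensus’ call; a (free) Census API key is required, "YOUR_KEY_HERE" is a placeholder, and the variable code shown is the ACS median household income table:

    library(tidycensus)
    # census_api_key("YOUR_KEY_HERE")               # supply your key; see ?census_api_key
    tx_income <- get_acs(geography = "county",
                         variables = "B19013_001",  # median household income
                         state = "TX",
                         geometry = TRUE)           # attach sf feature geometry
    head(tx_income)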
3140 Analysis of Spatial Data tigris Load Census TIGER/Line Shapefiles Download TIGER/Line shapefiles from the United States Census Bureau (<https://www.census.gov/geo/maps-data/data/tiger-line.html>) and load into R as ‘SpatialDataFrame’ or ‘sf’ objects.
3141 Analysis of Spatial Data tmap Thematic Maps Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.
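A minimal sketch of tmap's layered syntax, using the ‘World’ dataset shipped with the package:

    library(tmap)
    data("World")                 # example sf object with country polygons
    tm_shape(World) +             # declare the spatial object to draw
      tm_polygons("HPI")          # choropleth fill by the HPI variable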
3142 Analysis of Spatial Data trip Tools for the Analysis of Animal Track Data Functions for accessing and manipulating spatial data for animal tracking, with straightforward coercion from and to other formats. Filter for speed and create time spent maps from animal track data. There are coercion methods to convert between ‘trip’ and ‘ltraj’ from ‘adehabitatLT’, and between ‘trip’ and ‘psp’ and ‘ppp’ from ‘spatstat’. Trip objects can be created from raw or grouped data frames, and from types in the ‘sp’, ‘sf’, ‘amt’, ‘trackeR’, ‘mousetrap’, and other packages.
3143 Analysis of Spatial Data tripack Triangulation of Irregularly Spaced Data A constrained two-dimensional Delaunay triangulation package providing both triangulation and generation of Voronoi mosaics of irregularly spaced data.
3144 Analysis of Spatial Data tripEstimation Metropolis Sampler and Supporting Functions for Estimating Animal Movement from Archival Tags and Satellite Fixes Data handling and estimation functions for animal movement estimation from archival or satellite tags. Helper functions are included for making image summaries binned by time interval from Markov Chain Monte Carlo simulations.
3145 Analysis of Spatial Data UScensus2000cdp US Census 2000 Designated Places Shapefiles and Additional Demographic Data US Census 2000 Designated Places shapefiles and additional demographic data from the SF1 100 percent files. This data set contains polygon files in lat/lon coordinates and the corresponding demographic data for a number of different variables.
3146 Analysis of Spatial Data UScensus2000tract US Census 2000 Tract Level Shapefiles and Additional Demographic Data US 2000 Census Tract shapefiles and additional demographic data from the SF1 100 percent files. This data set contains polygon files in lat/lon coordinates and the corresponding demographic data for a number of different variables.
3147 Analysis of Spatial Data vardiag Variogram Diagnostics Interactive variogram diagnostics.
3148 Analysis of Spatial Data vec2dtransf 2D Cartesian Coordinate Transformation Applies affine and similarity transformations to vector spatial data (sp objects). Transformations can be defined from control points or directly from parameters. If redundant control points are provided, least squares is applied, allowing residuals and the RMSE to be obtained.
3149 Analysis of Spatial Data vegan Community Ecology Package Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
3150 Analysis of Spatial Data viridis Default Color Maps from ‘matplotlib’ Implementation of the ‘viridis’ (the default), ‘magma’, ‘plasma’, ‘inferno’, and ‘cividis’ color maps for ‘R’. ‘viridis’, ‘magma’, ‘plasma’, and ‘inferno’ are ported from ‘matplotlib’ <http://matplotlib.org/>, a popular plotting library for ‘python’. ‘cividis’ was developed by Jamie R. Nunez and Sean M. Colby. These color maps are designed in such a way that they will analytically be perfectly perceptually uniform, both in regular form and also when converted to black-and-white. They are also designed to be perceived by readers with the most common form of color blindness (all color maps in this package) and color vision deficiency (‘cividis’ only).
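A minimal sketch of using the palettes; viridis() simply returns a vector of hex colours:

    library(viridis)
    viridis(5)                           # five colours from the default map
    image(volcano, col = viridis(64))    # apply the palette to a base-graphics heatmap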
3151 Analysis of Spatial Data Watersheds Spatial Watershed Aggregation and Spatial Drainage Network Analysis Methods for watershed aggregation and spatial drainage network analysis.
3152 Analysis of Spatial Data wkb Convert Between Spatial Objects and Well-Known Binary Geometry Utility functions to convert between the ‘Spatial’ classes specified by the package ‘sp’, and the well-known binary ‘(WKB)’ representation for geometry specified by the Open Geospatial Consortium. Supports ‘Spatial’ objects of class ‘SpatialPoints’, ‘SpatialPointsDataFrame’, ‘SpatialLines’, ‘SpatialLinesDataFrame’, ‘SpatialPolygons’, and ‘SpatialPolygonsDataFrame’. Supports ‘WKB’ geometry types ‘Point’, ‘LineString’, ‘Polygon’, ‘MultiPoint’, ‘MultiLineString’, and ‘MultiPolygon’. Includes extensions to enable creation of maps with ‘TIBCO Spotfire’.
3153 Handling and Analyzing Spatio-Temporal Data adehabitatLT (core) Analysis of Animal Movements A collection of tools for the analysis of animal movements.
3154 Handling and Analyzing Spatio-Temporal Data animalTrack Animal track reconstruction for high frequency 2-dimensional (2D) or 3-dimensional (3D) movement data 2D and 3D animal tracking data can be used to reconstruct tracks through time/space with correction based on known positions. 3D visualization of animal position and attitude.
3155 Handling and Analyzing Spatio-Temporal Data argosfilter Argos locations filter Functions to filter animal satellite tracking data obtained from Argos. It is especially suited to telemetry studies of marine animals, where Argos locations are predominantly of low quality.
3156 Handling and Analyzing Spatio-Temporal Data BayesianAnimalTracker Bayesian Melding of GPS and DR Path for Animal Tracking Bayesian melding approach to combine the GPS observations and Dead-Reckoned path into an accurate animal track; equivalently, the GPS observations can be used to correct the Dead-Reckoned path. It can take the measurement errors in the GPS observations into account and provide uncertainty statements about the corrected path. The main calculation can be done by the BMAnimalTrack function.
3157 Handling and Analyzing Spatio-Temporal Data BBMM Brownian bridge movement model The model provides an empirical estimate of a movement path using discrete location data obtained at relatively short time intervals.
3158 Handling and Analyzing Spatio-Temporal Data bcpa Behavioral change point analysis of animal movement The Behavioral Change Point Analysis (BCPA) is a method of identifying hidden shifts in the underlying parameters of a time series, developed specifically to be applied to animal movement data that are irregularly sampled. The method is based on: E. Gurarie, R. Andrews and K. Laidre, “A novel method for identifying behavioural changes in animal movement data” (2009), Ecology Letters 12(5): 395-408.
3159 Handling and Analyzing Spatio-Temporal Data CARBayesST Spatio-Temporal Generalised Linear Mixed Models for Areal Unit Data Implements a class of spatio-temporal generalised linear mixed models for areal unit data, with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation. The response variable can be binomial, Gaussian, or Poisson, but for some models only the binomial and Poisson data likelihoods are available. The spatio-temporal autocorrelation is modelled by random effects, which are assigned conditional autoregressive (CAR) style prior distributions. A number of different random effects structures are available, including Bernardinelli et al. (1995) <doi:10.1002/sim.4780142112>, Rushworth et al. (2014) <doi:10.1016/j.sste.2014.05.001> and Lee et al. (2016) <doi:10.1214/16-AOAS941>. Full details are given in the vignette accompanying this package. The creation of this package was supported by the Engineering and Physical Sciences Research Council (EPSRC) grant EP/J017442/1 and the Medical Research Council (MRC) grant MR/L022184/1.
3160 Handling and Analyzing Spatio-Temporal Data crawl Fit Continuous-Time Correlated Random Walk Models to Animal Movement Data Fit continuous-time correlated random walk models with time-indexed covariates to animal telemetry data. The model is fit using the Kalman filter on a state-space version of the continuous-time stochastic movement process.
3161 Handling and Analyzing Spatio-Temporal Data cshapes The CShapes Dataset and Utilities Package for CShapes, a GIS dataset of country boundaries (1946-today). Includes functions for data extraction and the computation of distance matrices and distance lists.
3162 Handling and Analyzing Spatio-Temporal Data ctmcmove Modeling Animal Movement with Continuous-Time Discrete-Space Markov Chains Software that facilitates taking movement data in xyt format and pairing it with raster covariates within a continuous-time Markov chain (CTMC) framework. As described in Hanks et al. (2015) <doi:10.1214/14-AOAS803>, this allows flexible modeling of movement in response to covariates (or covariate gradients) with model fitting possible within a Poisson GLM framework.
3163 Handling and Analyzing Spatio-Temporal Data ctmm Continuous-Time Movement Modeling Functions for identifying, fitting, and applying continuous-space, continuous-time stochastic movement models to animal tracking data. The package is described in Calabrese et al (2016) <doi:10.1111/2041-210X.12559> and its methods are based on those introduced in Fleming & Calabrese et al (2014) <doi:10.1086/675504>, Fleming et al (2014) <doi:10.1111/2041-210X.12176>, Fleming et al (2015) <doi:10.1890/14-2010.1>, Fleming et al (2016) <doi:10.1890/15-1607>, Peron & Fleming et al (2016) <doi:10.1186/s40462-016-0084-7>, Fleming & Calabrese (2016) <doi:10.1111/2041-210X.12673>, Peron et al (2017) <doi:10.1002/ecm.1260>, Fleming et al (2017) <doi:10.1016/j.ecoinf.2017.04.008>, Fleming et al (2018) <doi:10.1002/eap.1704>, and Winner & Noonan et al (2018) <doi:10.1111/2041-210X.13027>.
3164 Handling and Analyzing Spatio-Temporal Data diveMove Dive Analysis and Calibration Utilities to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data. Miscellaneous functions for handling location data are also provided.
3165 Handling and Analyzing Spatio-Temporal Data fishmove Prediction of Fish Movement Parameters Functions to predict fish movement parameters and to plot leptokurtic fish dispersal kernels (see Radinger and Wolter, 2014: Patterns and predictors of fish dispersal in rivers. Fish and Fisheries 15:456-473).
3166 Handling and Analyzing Spatio-Temporal Data FLightR A Package for Reconstructing Animal Paths from Solar Geolocation Loggers Spatio-temporal locations of an animal are computed from annotated data with a hidden Markov model via a particle filter algorithm. The package is relatively robust to varying degrees of shading. The hidden Markov model is described in Movement Ecology (Rakhimberdiev et al., 2015) <doi:10.1186/s40462-015-0062-5>, a general package description appears in Methods in Ecology and Evolution (Rakhimberdiev et al., 2017) <doi:10.1111/2041-210X.12765>, and the package's accuracy is assessed in the Journal of Avian Biology (Rakhimberdiev et al., 2016) <doi:10.1111/jav.00891>.
3167 Handling and Analyzing Spatio-Temporal Data gapfill Fill Missing Values in Satellite Data Tools to fill missing values in satellite data and to develop new gap-fill algorithms. The methods are tailored to data (images) observed at equally-spaced points in time. The package is illustrated with MODIS NDVI data.
3168 Handling and Analyzing Spatio-Temporal Data GeoLight Analysis of Light Based Geolocator Data Provides basic functions for global positioning based on light intensity measurements over time. Positioning process includes the determination of sun events, a discrimination of residency and movement periods, the calibration of period-specific data and, finally, the calculation of positions.
3169 Handling and Analyzing Spatio-Temporal Data googleVis R Interface to Google Charts R interface to Google’s chart tools, allowing users to create interactive charts based on data frames. Charts are displayed locally via the R HTTP help server. A modern browser with an Internet connection is required and for some charts a Flash player. The data remains local and is not uploaded to Google.
3170 Handling and Analyzing Spatio-Temporal Data gstat (core) Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation Variogram modelling; simple, ordinary and universal point or block (co)kriging; spatio-temporal kriging; sequential Gaussian or indicator (co)simulation; variogram and variogram map plotting utility functions; supports sf and stars.
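A minimal sketch of the variogram-then-krige workflow in ‘gstat’, using the meuse data shipped with ‘sp’ (the spherical variogram starting values follow the package's standard example):

    library(sp); library(gstat)
    data(meuse);      coordinates(meuse)      <- ~ x + y   # observation points
    data(meuse.grid); coordinates(meuse.grid) <- ~ x + y   # prediction locations
    gridded(meuse.grid) <- TRUE
    v  <- variogram(log(zinc) ~ 1, meuse)                  # empirical variogram
    vm <- fit.variogram(v, vgm(1, "Sph", 900, 1))          # fit a spherical model
    kr <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = vm)  # ordinary kriging
    spplot(kr["var1.pred"])                                # map the predictions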
3171 Handling and Analyzing Spatio-Temporal Data IDE Integro-Difference Equation Spatio-Temporal Models The Integro-Difference Equation model is a linear, dynamical model used to model phenomena that evolve in space and in time; see, for example, Cressie and Wikle (2011, ISBN:978-0-471-69274-4) or Dewar et al. (2009) <doi:10.1109/TSP.2008.2005091>. At the heart of the model is the kernel, which dictates how the process evolves from one time point to the next. Both process and parameter reduction are used to facilitate computation, and spatially-varying kernels are allowed. Data used to estimate the parameters are assumed to be readings of the process corrupted by Gaussian measurement error. Parameters are fitted by maximum likelihood, and estimation is carried out using an evolution algorithm.
3172 Handling and Analyzing Spatio-Temporal Data lgcp Log-Gaussian Cox Process Spatial and spatio-temporal modelling of point patterns using the log-Gaussian Cox process. Bayesian inference for spatial, spatiotemporal, multivariate and aggregated point processes using Markov chain Monte Carlo.
3173 Handling and Analyzing Spatio-Temporal Data lme4 Linear Mixed-Effects Models using ‘Eigen’ and S4 Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.
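A minimal sketch of fitting a linear mixed model with ‘lme4’, using the sleepstudy data shipped with the package:

    library(lme4)
    # random intercept and slope for Days within each Subject
    fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(fm)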
3174 Handling and Analyzing Spatio-Temporal Data M3 Reading M3 files This package contains functions to read in and manipulate air quality model output from Models3-formatted files. This format is used by the Community Multiscale Air Quality (CMAQ) model.
3175 Handling and Analyzing Spatio-Temporal Data mkde 2D and 3D movement-based kernel density estimates (MKDEs) Provides functions to compute and visualize movement-based kernel density estimates (MKDEs) for animal utilization distributions in 2 or 3 spatial dimensions.
3176 Handling and Analyzing Spatio-Temporal Data move Visualizing and Analyzing Animal Track Data Contains functions to access movement data stored on ‘movebank.org’ as well as tools to visualize and statistically analyze animal movement data, including functions to calculate dynamic Brownian bridge movement models. Move helps address movement ecology questions.
3177 Handling and Analyzing Spatio-Temporal Data moveHMM Animal Movement Modelling using Hidden Markov Models Provides tools for animal movement modelling using hidden Markov models. These include processing of tracking data, fitting hidden Markov models to movement data, visualization of data and fitted model, decoding of the state process…
3178 Handling and Analyzing Spatio-Temporal Data mvtsplot Multivariate Time Series Plot A function for plotting multivariate time series data.
3179 Handling and Analyzing Spatio-Temporal Data ncdf4 Interface to Unidata netCDF (Version 4 or Earlier) Format Data Files Provides a high-level R interface to data files written using Unidata’s netCDF library (version 4 or earlier), which are binary data files that are portable across platforms and include metadata information in addition to the data sets. Using this package, netCDF files (either version 4 or “classic” version 3) can be opened and data sets read in easily. It is also easy to create new netCDF dimensions, variables, and files, in either version 3 or 4 format, and manipulate existing netCDF files. This package replaces the former ncdf package, which only worked with netcdf version 3 files. For various reasons the names of the functions have had to be changed from the names in the ncdf package. The old ncdf package is still available at the URL given below, if you need to have backward compatibility. It should be possible to have both the ncdf and ncdf4 packages installed simultaneously without a problem. However, the ncdf package does not provide an interface for netcdf version 4 files.
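A minimal sketch of the read workflow in ‘ncdf4’; the file and variable names here ("ozone.nc", "O3", "lat") are hypothetical placeholders:

    library(ncdf4)
    nc  <- nc_open("ozone.nc")       # open an existing netCDF file (hypothetical name)
    print(nc)                        # list its dimensions and variables
    o3  <- ncvar_get(nc, "O3")       # read a variable into an R array (assumed name)
    lat <- ncvar_get(nc, "lat")      # read a coordinate variable (assumed name)
    nc_close(nc)                     # always close the file handle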
3180 Handling and Analyzing Spatio-Temporal Data nlme Linear and Nonlinear Mixed Effects Models Fit and compare Gaussian linear and nonlinear mixed-effects models.
3181 Handling and Analyzing Spatio-Temporal Data openair Tools for the Analysis of Air Pollution Data Tools to analyse, interpret and understand air pollution data. Data are typically hourly time series and both monitoring data and dispersion model output can be analysed. Many functions can also be applied to other data, including meteorological and traffic data.
3182 Handling and Analyzing Spatio-Temporal Data pastecs Package for Analysis of Space-Time Ecological Series Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.
3183 Handling and Analyzing Spatio-Temporal Data pbdNCDF4 Programming with Big Data Interface to Parallel Unidata NetCDF4 Format Data Files This package adds collective parallel read and write capability to the R package ncdf4 version 1.8. Typical use is as a parallel NetCDF4 file reader in SPMD style programming. Each R process reads and writes its own data in a synchronized collective mode, resulting in faster parallel performance. Performance improvement is conditional on a parallel file system.
3184 Handling and Analyzing Spatio-Temporal Data plm Linear Models for Panel Data A set of estimators and tests for panel data econometrics.
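A minimal sketch of a fixed-effects ("within") panel model with ‘plm’, using the Grunfeld data shipped with the package:

    library(plm)
    data("Grunfeld", package = "plm")   # classic firm-by-year investment panel
    fe <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "within")
    summary(fe)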
3185 Handling and Analyzing Spatio-Temporal Data plotKML Visualization of Spatial and Spatio-Temporal Objects in Google Earth Writes sp-class, spacetime-class, raster-class and similar spatial and spatio-temporal objects to KML following some basic cartographic rules.
3186 Handling and Analyzing Spatio-Temporal Data RandomFields (core) Simulation and Analysis of Random Fields Methods for the inference on and the simulation of Gaussian fields are provided, as well as methods for the simulation of extreme value random fields.
3187 Handling and Analyzing Spatio-Temporal Data raster (core) Geographic Data Analysis and Modeling Reading, writing, manipulating, analyzing and modeling of gridded spatial data. The package implements basic and high-level functions. Processing of very large files is supported.
3188 Handling and Analyzing Spatio-Temporal Data rasterVis Visualization Methods for Raster Data Methods for enhanced visualization and interaction with raster data. It implements visualization methods for quantitative data and categorical data, both for univariate and multivariate rasters. It also provides methods to display spatiotemporal rasters, and vector fields. See the website for examples.
3189 Handling and Analyzing Spatio-Temporal Data rgl 3D Visualization Using OpenGL Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.
3190 Handling and Analyzing Spatio-Temporal Data rmatio Read and Write ‘Matlab’ Files Read and write ‘Matlab’ MAT files from R. The ‘rmatio’ package supports reading MAT version 4, MAT version 5 and MAT compressed version 5. The ‘rmatio’ package can write version 5 MAT files and version 5 files with variable compression.
3191 Handling and Analyzing Spatio-Temporal Data RNetCDF Interface to NetCDF Datasets An interface to the NetCDF file format designed by Unidata for efficient storage of array-oriented scientific data and descriptions. The R interface is closely based on the C API of the NetCDF library, and it includes calendar conversions from the Unidata UDUNITS library. The current implementation supports all operations on NetCDF datasets in classic and 64-bit offset file formats, and NetCDF4-classic format is supported for reading and modification of existing files.
3192 Handling and Analyzing Spatio-Temporal Data rsatscan Tools, Classes, and Methods for Interfacing with SaTScan Stand-Alone Software SaTScan(TM) (http://www.satscan.org) is software for finding regions in Time, Space, or Time-Space that have excess risk, based on scan statistics, and uses Monte Carlo hypothesis testing to generate P-values for these regions. The rsatscan package provides functions for writing R data frames in SaTScan-readable formats, for setting SaTScan parameters, for running SaTScan in the OS, and for reading the files that SaTScan creates.
3193 Handling and Analyzing Spatio-Temporal Data sf Simple Features for R Support for simple features, a standardized way to encode spatial vector data. Binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations.
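A minimal sketch of reading, reprojecting and plotting vector data with ‘sf’, using the North Carolina shapefile shipped with the package:

    library(sf)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # read via GDAL
    nc_proj <- st_transform(nc, 32119)   # reproject via PROJ (NC State Plane, metres)
    plot(st_geometry(nc_proj))           # draw the county boundaries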
3194 Handling and Analyzing Spatio-Temporal Data SimilarityMeasures Trajectory Similarity Measures Functions to compute four different trajectory similarity measures. The similarity measures included are: longest common subsequence (LCSS), Frechet distance, edit distance and dynamic time warping (DTW). Each of these similarity measures can be calculated from two n-dimensional trajectories, both in matrix form.
3195 Handling and Analyzing Spatio-Temporal Data smam Statistical Modeling of Animal Movements Animal movement models including moving-resting process with embedded Brownian motion according to Yan et al. (2014) <doi:10.1007/s10144-013-0428-8>, Pozdnyakov et al. (2017) <doi:10.1007/s11009-017-9547-6>, Brownian motion with measurement error according to Pozdnyakov et al. (2014) <doi:10.1890/13-0532.1>, and moving-resting-handling process with embedded Brownian motion, Pozdnyakov et al. (2018) <arXiv:1806.00849>.
3196 Handling and Analyzing Spatio-Temporal Data solaR Radiation and Photovoltaic Systems Calculation methods of solar radiation and performance of photovoltaic systems from daily and intradaily irradiation data sources.
3197 Handling and Analyzing Spatio-Temporal Data sp (core) Classes and Methods for Spatial Data Classes and methods for spatial data; the classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided, e.g. for plotting data as maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc.
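A minimal sketch of sp's central idiom: promoting a plain data frame to a spatial class by declaring which columns hold coordinates:

    library(sp)
    df <- data.frame(x = runif(20), y = runif(20), z = rnorm(20))
    coordinates(df) <- ~ x + y    # df is now a SpatialPointsDataFrame
    class(df)
    bbox(df)                      # retrieve the bounding box
    spplot(df, "z")               # plot z as a map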
3198 Handling and Analyzing Spatio-Temporal Data spacetime (core) Classes and Methods for Spatio-Temporal Data Classes and methods for spatio-temporal data, including space-time regular lattices, sparse lattices, irregular data, and trajectories; utility functions for plotting data as map sequences (lattice or animation) or multiple time series; methods for spatial and temporal selection and subsetting, as well as for spatial/temporal/spatio-temporal matching or aggregation, retrieving coordinates, print, summary, etc.
3199 Handling and Analyzing Spatio-Temporal Data spate Spatio-Temporal Modeling of Large Data Using a Spectral SPDE Approach Functionality for spatio-temporal modeling of large data sets is provided. A Gaussian process in space and time is defined through a stochastic partial differential equation (SPDE). The SPDE is solved in the spectral space, and after discretizing in time and space, a linear Gaussian state space model is obtained. When doing inference, the main computational difficulty consists in evaluating the likelihood and in sampling from the full conditional of the spectral coefficients, or equivalently, the latent space-time process. In comparison to the traditional approach of using a spatio-temporal covariance function, the spectral SPDE approach is computationally advantageous. This package aims at providing tools for two different modeling approaches. First, the SPDE based spatio-temporal model can be used as a component in a customized hierarchical Bayesian model (HBM). The functions of the package then provide parameterizations of the process part of the model as well as computationally efficient algorithms needed for doing inference with the HBM. Alternatively, the adaptive MCMC algorithm implemented in the package can be used as an algorithm for doing inference without any additional modeling. The MCMC algorithm supports data that follow a Gaussian or a censored distribution with point mass at zero. Covariates can be included in the model through a regression term.
3200 Handling and Analyzing Spatio-Temporal Data SpatioTemporal Spatio-Temporal Model Estimation Utilities that estimate, predict and cross-validate the spatio-temporal model developed for MESA Air.
3201 Handling and Analyzing Spatio-Temporal Data spatstat Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests Comprehensive open-source toolbox for analysing Spatial Point Patterns. Focused mainly on two-dimensional point patterns, including multitype/marked points, in any spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, point patterns on a linear network, and patterns of other geometrical objects. Supports spatial covariate data such as pixel images. Contains over 2000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the functions ppm(), kppm(), slrm(), dppm() similar to glm(). Types of models include Poisson, Gibbs and Cox point processes, Neyman-Scott cluster processes, and determinantal point processes. Models may involve dependence on covariates, inter-point interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. A model can be fitted to a list of point patterns (replicated point pattern data) using the function mppm(). The model can include random effects and fixed effects depending on the experimental design, in addition to all the features listed above. Fitted point process models can be simulated automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.
3202 Handling and Analyzing Spatio-Temporal Data spBayes Univariate and Multivariate Spatial-Temporal Modeling Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) <doi:10.18637/jss.v063.i13> and Finley, Banerjee, and Cook (2014) <doi:10.1111/2041-210X.12189>.
3203 Handling and Analyzing Spatio-Temporal Data sphet Estimation of Spatial Autoregressive Models with and without Heteroscedasticity Generalized Method of Moment estimation of Cliff-Ord-type spatial autoregressive models with and without heteroscedasticity.
3204 Handling and Analyzing Spatio-Temporal Data splancs Spatial and Space-Time Point Pattern Analysis The Splancs package was written as an enhancement to S-Plus for display and analysis of spatial point pattern data; it has been ported to R and is in “maintenance mode”.
3205 Handling and Analyzing Spatio-Temporal Data splm Econometric Models for Spatial Panel Data ML and GM estimation and diagnostic testing of econometric models for spatial panel data.
3206 Handling and Analyzing Spatio-Temporal Data spTimer Spatio-Temporal Bayesian Modelling Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) <doi:10.18637/jss.v063.i15>.
3207 Handling and Analyzing Spatio-Temporal Data stam Spatio-Temporal Analysis and Modelling stam is an evolving package targeting methods for spatio-temporal analysis and modelling, including exploratory spatio-temporal analysis and inferred spatio-temporal modelling.
3208 Handling and Analyzing Spatio-Temporal Data Stem Spatio-temporal models in R Estimation of the parameters of a spatio-temporal model using the EM algorithm, estimation of the parameter standard errors using a spatio-temporal parametric bootstrap, spatial mapping.
3209 Handling and Analyzing Spatio-Temporal Data STMedianPolish Spatio-Temporal Median Polish Analyses spatio-temporal data, decomposing data in n-dimensional arrays and using the median polish technique.
3210 Handling and Analyzing Spatio-Temporal Data surveillance (core) Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hohle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hohle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.
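A minimal sketch of the monitoring side of ‘surveillance’, using the Salmonella Agona counts shipped with the package; the monitored time range chosen here is illustrative only:

    library(surveillance)
    data("salmonella.agona")                 # weekly case counts
    salm <- disProg2sts(salmonella.agona)    # convert to the sts class
    plot(salm)
    # Farrington-type aberration detection over an illustrative range of weeks
    mon <- farringtonFlexible(salm, control = list(range = 282:312))
    plot(mon)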
3211 Handling and Analyzing Spatio-Temporal Data trackeR Infrastructure for Running, Cycling and Swimming Data from GPS-Enabled Tracking Devices Provides infrastructure for handling running, cycling and swimming data from GPS-enabled tracking devices within R. The package provides methods to extract, clean and organise workout and competition data into session-based and unit-aware data objects of class ‘trackeRdata’ (S3 class). The information can then be visualised, summarised, and analysed through flexible and extensible methods. Frick and Kosmidis (2017) <doi:10.18637/jss.v082.i07>, which is updated and maintained as one of the vignettes, provides detailed descriptions of the package and its methods, and real-data demonstrations of the package functionality.
3212 Handling and Analyzing Spatio-Temporal Data TrackReconstruction Reconstruct animal tracks from magnetometer, accelerometer, depth and optional speed data Reconstructs animal tracks from magnetometer, accelerometer, depth and optional speed data. Designed primarily using data from Wildlife Computers Daily Diary tags deployed on northern fur seals.
3213 Handling and Analyzing Spatio-Temporal Data trip (core) Tools for the Analysis of Animal Track Data Functions for accessing and manipulating spatial data for animal tracking, with straightforward coercion from and to other formats. Filter for speed and create time spent maps from animal track data. There are coercion methods to convert between ‘trip’ and ‘ltraj’ from ‘adehabitatLT’, and between ‘trip’ and ‘psp’ and ‘ppp’ from ‘spatstat’. Trip objects can be created from raw or grouped data frames, and from types in the ‘sp’, ‘sf’, ‘amt’, ‘trackeR’, ‘mousetrap’, and other packages.
3214 Handling and Analyzing Spatio-Temporal Data tripEstimation Metropolis Sampler and Supporting Functions for Estimating Animal Movement from Archival Tags and Satellite Fixes Data handling and estimation functions for animal movement estimation from archival or satellite tags. Helper functions are included for making image summaries binned by time interval from Markov Chain Monte Carlo simulations.
3215 Handling and Analyzing Spatio-Temporal Data VTrack A Collection of Tools for the Analysis of Remote Acoustic Telemetry Data Designed to facilitate the assimilation, analysis and synthesis of animal location and movement data collected by the VEMCO suite of acoustic transmitters and receivers. As well as database and geographic information capabilities the principal feature of VTrack is the qualification and identification of ecologically relevant events from the acoustic detection and sensor data. This procedure condenses the acoustic detection database by orders of magnitude, greatly enhancing the synthesis of acoustic detection data.
3216 Handling and Analyzing Spatio-Temporal Data wildlifeDI Calculate Indices of Dynamic Interaction for Wildlife Tracking Data Dynamic interaction refers to spatial-temporal associations in the movements of two (or more) animals. This package provides tools for calculating a suite of indices used for quantifying dynamic interaction with wildlife telemetry data. For more information on each of the methods employed see the references within. The package (as of version 0.3) also has new tools for automating contact analysis in large tracking datasets. The package draws heavily on the classes and methods developed in the ‘adehabitat’ packages.
3217 Handling and Analyzing Spatio-Temporal Data xts (core) eXtensible Time Series Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
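A minimal sketch of xts's time-indexed subsetting, which works uniformly across the time classes it wraps:

    library(xts)
    x <- xts(rnorm(10), order.by = as.Date("2005-01-01") + 0:9)  # daily series
    head(x)
    x["2005-01-03/2005-01-06"]    # ISO-8601 style date-range subsetting
    merge(x, lag(x))              # align the series with its own lag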
3218 Survival Analysis AdapEnetClass A Class of Adaptive Elastic Net Methods for Censored Data Provides new approaches to variable selection for the accelerated failure time (AFT) model.
3219 Survival Analysis addhazard Fit Additive Hazards Models for Survival Analysis Contains tools to fit the additive hazards model to data from a cohort, random sampling, two-phase Bernoulli sampling and two-phase finite population sampling, as well as a calibration tool to incorporate phase I auxiliary information into the two-phase data model fitting. This package provides regression parameter estimates and their model-based and robust standard errors. It also offers tools to predict individual-specific hazards.
3220 Survival Analysis AER Applied Econometrics with R Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)
3221 Survival Analysis ahaz Regularization for semiparametric additive hazards regression Computationally efficient procedures for regularized estimation with the semiparametric additive hazards regression model.
3222 Survival Analysis AHR Estimation and Testing of Average Hazard Ratios Methods for estimation of multivariate average hazard ratios as defined by Kalbfleisch and Prentice. The underlying survival functions of the event of interest in each group can be estimated using either the (weighted) Kaplan-Meier estimator or the Aalen-Johansen estimator for the transition probabilities in Markov multi-state models. Right-censored and left-truncated data are supported. Moreover, the difference in restricted mean survival can be estimated.
3223 Survival Analysis AIM AIM: adaptive index model R functions for adaptively constructing index models for continuous, binary and survival outcomes. Implementation requires loading the R package “survival”.
3224 Survival Analysis APtools Average Positive Predictive Values (AP) for Binary Outcomes and Censored Event Times We provide tools to estimate two prediction accuracy metrics, the average positive predictive values (AP) as well as the well-known AUC (the area under the receiver operator characteristic curve) for risk scores. The outcome of interest is either binary or censored event time. Note that for censored event time, our functions’ estimates, the AP and the AUC, are time-dependent for pre-specified time interval(s). A function that compares the APs of two risk scores/markers is also included. Optional outputs include positive predictive values and true positive fractions at the specified marker cut-off values, and a plot of the time-dependent AP versus time (available for event time data).
3225 Survival Analysis asaur Data Sets for “Applied Survival Analysis Using R” Data sets referred to in the text “Applied Survival Analysis Using R” by Dirk F. Moore, Springer, 2016, ISBN: 978-3-319-31243-9, <doi:10.1007/978-3-319-31245-3>.
3226 Survival Analysis asbio A Collection of Statistical Tools for Biologists Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.
3227 Survival Analysis aster Aster Models Aster models are exponential family regression models for life history analysis. They are like generalized linear models except that elements of the response vector can have different families (e.g., some Bernoulli, some Poisson, some zero-truncated Poisson, some normal) and can be dependent, the dependence indicated by a graphical structure. Discrete time survival analysis, zero-inflated Poisson regression, and generalized linear models that are exponential family (e.g., logistic regression and Poisson regression with log link) are special cases. Main use is for data in which there is survival over discrete time periods and there is additional data about what happens conditional on survival (e.g., number of offspring). Uses the exponential family canonical parameterization (aster transform of usual parameterization). There are also random effects versions of these models.
3228 Survival Analysis aster2 Aster Models Aster models are exponential family regression models for life history analysis. They are like generalized linear models except that elements of the response vector can have different families (e.g., some Bernoulli, some Poisson, some zero-truncated Poisson, some normal) and can be dependent, the dependence indicated by a graphical structure. Discrete time survival analysis, zero-inflated Poisson regression, and generalized linear models that are exponential family (e.g., logistic regression and Poisson regression with log link) are special cases. Main use is for data in which there is survival over discrete time periods and there is additional data about what happens conditional on survival (e.g., number of offspring). Uses the exponential family canonical parameterization (aster transform of usual parameterization). Unlike the aster package, this package does dependence groups (nodes of the graph need not be conditionally independent given their predecessor node), including multinomial and two-parameter normal as families. Thus this package also generalizes mark-capture-recapture analysis.
3229 Survival Analysis BaSTA Age-Specific Survival Analysis from Incomplete Capture-Recapture/Recovery Data Estimates survival and mortality with covariates from capture-recapture/recovery data in a Bayesian framework when many individuals are of unknown age. It includes tools for data checking, model diagnostics and outputs such as life-tables and plots.
3230 Survival Analysis BayesPiecewiseICAR Hierarchical Bayesian Model for a Hazard Function Fits a piecewise exponential hazard to survival data using a Hierarchical Bayesian model with an Intrinsic Conditional Autoregressive formulation for the spatial dependency in the hazard rates for each piece. This function uses Metropolis-Hastings-Green MCMC to allow the number of split points to vary. This function outputs graphics that display the histogram of the number of split points and the trace plots of the hierarchical parameters. The function outputs a list that contains the posterior samples for the number of split points, the location of the split points, and the log hazard rates corresponding to these splits. Additionally, this outputs the posterior samples of the two hierarchical parameters, Mu and Sigma^2.
3231 Survival Analysis bayesSurv Bayesian Survival Regression with Flexible Error and Random Effects Distributions Contains Bayesian implementations of Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. Those can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored.
3232 Survival Analysis BayHaz R Functions for Bayesian Hazard Rate Estimation A suite of R functions for Bayesian estimation of smooth hazard rates via Compound Poisson Process (CPP) and Bayesian Penalized Spline (BPS) priors.
3233 Survival Analysis BGPhazard Markov Beta and Gamma Processes for Modeling Hazard Rates Computes the hazard rate estimate as described by Nieto-Barajas and Walker (2002) and Nieto-Barajas (2003).
3234 Survival Analysis Biograph Explore Life Histories Transition rates are computed from transitions and exposures. Useful graphics and life-course indicators are computed. The package structures the data for multistate statistical and demographic modeling of life histories.
3235 Survival Analysis BMA Bayesian Model Averaging Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).
3236 Survival Analysis bnnSurvival Bagged k-Nearest Neighbors Survival Prediction Implements a bootstrap aggregated (bagged) version of the k-nearest neighbors survival probability prediction method (Lowsky et al. 2013). In addition to the bootstrapping of training samples, the features can be subsampled in each baselearner to break the correlation between them. The Rcpp package is used to speed up the computation.
3237 Survival Analysis boot Bootstrap Functions (Originally by Angelo Canty for S) Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
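
A minimal sketch of the boot() calling convention, on simulated data; the statistic(data, indices) signature and boot.ci() are as documented, everything else is invented for illustration:

```r
library(boot)

# Bootstrap the sample median: boot() resamples indices and passes
# them to the user-supplied statistic(data, indices) function
set.seed(1)
dat  <- rexp(50, rate = 0.2)
stat <- function(d, i) median(d[i])

b <- boot(data = dat, statistic = stat, R = 999)
boot.ci(b, type = c("perc", "bca"))  # percentile and BCa intervals
```
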
3238 Survival Analysis bpcp Beta Product Confidence Procedure for Right Censored Data Calculates nonparametric pointwise confidence intervals for the survival distribution for right censored data. Has two-sample tests for dissimilarity (e.g., difference, ratio or odds ratio) in survival at a fixed time. Especially important for small sample sizes or heavily censored data. Includes mid-p options.
3239 Survival Analysis bshazard Nonparametric Smoothing of the Hazard Function The function estimates the hazard function nonparametrically from a survival object (possibly adjusted for covariates). The smoothed estimate is based on B-splines from the perspective of generalized linear mixed models. Left-truncated and right-censored data are allowed.
3240 Survival Analysis bujar Buckley-James Regression for Survival Data with High-Dimensional Covariates Buckley-James regression for right-censored survival data with high-dimensional covariates. Implementations for survival data include boosting with componentwise linear least squares, componentwise smoothing splines, regression trees and MARS. Other high-dimensional tools include penalized regression for survival data. See Wang and Wang (2010) <doi:10.2202/1544-6115.1550>.
3241 Survival Analysis casebase Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression Implements the case-base sampling approach of Hanley and Miettinen (2009) <doi:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <doi:10.1111/sjos.12125>, and Saarela (2015) <doi:10.1007/s10985-015-9352-x>, for fitting flexible hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. From the fitted hazard function, cumulative incidence, risk functions of time, treatment and profile can be derived. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots.
3242 Survival Analysis censReg Censored Regression (Tobit) Models Maximum Likelihood estimation of censored regression (Tobit) models with cross-sectional and panel data.
3243 Survival Analysis CFC Cause-Specific Framework for Competing-Risk Analysis Numerical integration of cause-specific survival curves to arrive at cause-specific cumulative incidence functions, with three usage modes: 1) Convenient API for parametric survival regression followed by competing-risk analysis, 2) API for CFC, accepting user-specified survival functions in R, and 3) Same as 2, but accepting survival functions in C++.
3244 Survival Analysis clinfun Clinical Trial Design and Data Analysis Functions Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.
3245 Survival Analysis cmprsk (core) Subdistribution Analysis of Competing Risks Estimation, testing and regression modeling of subdistribution functions in competing risks, as described in Gray (1988), A class of K-sample tests for comparing the cumulative incidence of a competing risk, Ann. Stat. 16:1141-1154, and Fine JP and Gray RJ (1999), A proportional hazards model for the subdistribution of a competing risk, JASA, 94:496-509.
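
To make the two core entry points concrete, a hedged sketch on simulated competing-risks data; cuminc() performs Gray's K-sample comparison of cumulative incidence and crr() fits the Fine-Gray subdistribution model:

```r
library(cmprsk)

# Simulated data: fstatus 1/2 = competing event types, 0 = censored
set.seed(1)
ftime   <- rexp(200)
fstatus <- sample(0:2, 200, replace = TRUE)
grp     <- sample(c("A", "B"), 200, replace = TRUE)
x       <- matrix(rnorm(200), ncol = 1, dimnames = list(NULL, "x1"))

ci <- cuminc(ftime, fstatus, group = grp)  # includes Gray's test
fg <- crr(ftime, fstatus, cov1 = x, failcode = 1, cencode = 0)
summary(fg)
```
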
3246 Survival Analysis cmprskQR Analysis of Competing Risks Using Quantile Regressions Estimation, testing and regression modeling of subdistribution functions in competing risks using quantile regressions, as described in Peng and Fine (2009) <doi:10.1198/jasa.2009.tm08228>.
3247 Survival Analysis coarseDataTools A Collection of Functions to Help with Analysis of Coarsely Observed Data Functions to analyze coarse data. Specifically, it contains functions to (1) fit parametric accelerated failure time models to interval-censored survival time data, and (2) estimate the case-fatality ratio in scenarios with under-reporting. This package’s development was motivated by applications to infectious disease: in particular, problems with estimating the incubation period and the case fatality ratio of a given disease. Sample data files are included in the package.
3248 Survival Analysis coin Conditional Inference Procedures in a Permutation Test Framework Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems.
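
For the censored case in particular, a small sketch of coin's formula interface; logrank_test() is a documented entry point (in recent coin versions), and the ovarian data come from the survival package:

```r
library(coin)
library(survival)

# Two-sample logrank test in coin's conditional-inference framework
data(ovarian, package = "survival")
ovarian$rx <- factor(ovarian$rx)
logrank_test(Surv(futime, fustat) ~ rx, data = ovarian)
```
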
3249 Survival Analysis compareC Compare Two Correlated C Indices with Right-censored Survival Outcome Proposed by Harrell, the C index, or concordance C, is considered an overall measure of discrimination in survival analysis between a survival outcome that is possibly right censored and a predictive-score variable, which can represent a measured biomarker or a composite-score output from an algorithm that combines multiple biomarkers. This package statistically compares two C indices with a right-censored survival outcome, which commonly arise from a paired design and thus result in two correlated C indices.
3250 Survival Analysis compeir Event-specific incidence rates for competing risks data The package computes event-specific incidence rates for competing risks data, computes rate ratios, event-specific incidence proportions and cumulative incidence functions from these, and plots these in a comprehensive multi-state type graphic.
3251 Survival Analysis compound.Cox Univariate Feature Selection and Compound Covariate for Predicting Survival Univariate feature selection and compound covariate methods under the Cox model with high-dimensional features (e.g., gene expressions). Available are survival data for non-small-cell lung cancer patients with gene expressions (Chen et al 2007 New Engl J Med) <doi:10.1056/NEJMoa060096>, statistical methods in Emura et al (2012 PLoS ONE) <doi:10.1371/journal.pone.0047627>, Emura & Chen (2016 Stat Methods Med Res) <doi:10.1177/0962280214533378>, and Emura et al. (2019) <doi:10.1016/j.cmpb.2018.10.020>. Algorithms for generating correlated gene expressions are also available.
3252 Survival Analysis concreg Concordance Regression Implements concordance regression which can be used to estimate generalized odds of concordance. Can be used for non- and semi-parametric survival analysis with non-proportional hazards, for binary and for continuous outcome data.
3253 Survival Analysis condGEE Parameter estimation in conditional GEE for recurrent event gap times Solves for the mean parameters, the variance parameter, and their asymptotic variance in a conditional GEE for recurrent event gap times, as described by Clement and Strawderman (2009) in the journal Biostatistics. Makes a parametric assumption for the length of the censored gap time.
3254 Survival Analysis condSURV Estimation of the Conditional Survival Function for Ordered Multivariate Failure Time Data Implements newly developed methods for the estimation of the conditional survival function.
3255 Survival Analysis controlTest Quantile Comparison for Two-Sample Right-Censored Survival Data Nonparametric two-sample procedure for comparing survival quantiles.
3256 Survival Analysis CoxBoost Cox models by likelihood based boosting for a single survival endpoint or competing risks This package provides routines for fitting Cox models by likelihood-based boosting for a single endpoint or in the presence of competing risks.
3257 Survival Analysis coxinterval Cox-Type Models for Interval-Censored Data Fits Cox-type models based on interval-censored data from a survival or illness-death process.
3258 Survival Analysis coxme Mixed Effects Cox Models Cox proportional hazards models containing Gaussian random effects, also known as frailty models.
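
A minimal sketch of a shared-frailty fit; the (1 | center) term adds a Gaussian random intercept per centre, and the eortc example data set ships with coxme (per its vignette):

```r
library(coxme)
library(survival)

# Cox model with a Gaussian random effect (frailty) for each centre
data(eortc, package = "coxme")
fit <- coxme(Surv(y, uncens) ~ trt + (1 | center), data = eortc)
summary(fit)
```
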
3259 Survival Analysis coxphf Cox Regression with Firth’s Penalized Likelihood Implements Firth’s penalized maximum likelihood bias reduction method for Cox regression which has been shown to provide a solution in case of monotone likelihood (nonconvergence of likelihood function). The program fits profile penalized likelihood confidence intervals which were proved to outperform Wald confidence intervals.
3260 Survival Analysis coxphw Weighted Estimation in Cox Regression Implements weighted estimation in Cox regression as proposed by Schemper, Wakounig and Heinze (Statistics in Medicine, 2009, <doi:10.1002/sim.3623>) and as described in Dunkler, Ploner, Schemper and Heinze (Journal of Statistical Software, 2018, <doi:10.18637/jss.v084.i02>). Weighted Cox regression provides unbiased average hazard ratio estimates also in case of non-proportional hazards. The approximated generalized concordance probability, an effect size measure for clear-cut decisions, can be obtained. The package provides options to estimate time-dependent effects conveniently by including interactions of covariates with arbitrary functions of time, with or without making use of the weighting option.
3261 Survival Analysis CoxRidge Cox Models with Dynamic Ridge Penalties A package for fitting Cox models with penalized ridge-type partial likelihood. The package includes functions for fitting simple Cox models with all covariates controlled by a ridge penalty. The weight of the penalty is optimised by using a REML-type algorithm. Models with time varying effects of the covariates can also be fitted. Some of the covariates may be allowed to be fixed and thus not controlled by the penalty. There are three different penalty functions, ridge, dynamic and weighted dynamic. Time varying effects can be fitted without the need of an expanded dataset.
3262 Survival Analysis coxrobust Robust Estimation in Cox Model Robustly fits a proportional hazards regression model.
3263 Survival Analysis coxsei Fitting a CoxSEI Model It fits a CoxSEI (Cox type Self-Exciting Intensity) model to right-censored counting process data.
3264 Survival Analysis CPE Concordance Probability Estimates in Survival Analysis Functions to calculate concordance probability estimates in survival analysis.
3265 Survival Analysis Cprob The Conditional Probability Function of a Competing Event Estimates the conditional probability function of a competing event, and fits, using the temporal process regression or the pseudo-value approach, a proportional-odds model to the conditional probability function (or other models by specifying another link function). See <doi:10.1111/j.1467-9876.2010.00729.x>.
3266 Survival Analysis CR Power Calculation for Weighted Log-Rank Tests in Cure Rate Models This package contains R-functions to perform power calculation in a group sequential clinical trial with censored survival data and possibly unequal patient allocation between treatment and control groups. The functions can also be used to determine the study duration in a clinical trial with censored survival data as the sum of the accrual duration, which determines the sample size in a traditional sense, and the follow-up duration, which more or less controls the number of events to be observed. This package also contains R functions and methods to display the computed results.
3267 Survival Analysis crrp Penalized Variable Selection in Competing Risks Regression In competing risks regression, the proportional subdistribution hazards (PSH) model is popular for its direct assessment of covariate effects on the cumulative incidence function. This package allows for penalized variable selection for the PSH model. Penalties include LASSO, SCAD, MCP, and their group versions.
3268 Survival Analysis crrSC Competing risks regression for Stratified and Clustered data Extension of cmprsk to Stratified and Clustered data. Goodness of fit test for Fine-Gray model.
3269 Survival Analysis crrstep Stepwise Covariate Selection for the Fine & Gray Competing Risks Regression Model Performs forward and backwards stepwise regression for the Proportional subdistribution hazards model in competing risks (Fine & Gray 1999). Procedure uses AIC, BIC and BICcr as selection criteria. BICcr has a penalty of k = log(n), where n is the number of primary events.
3270 Survival Analysis crskdiag Diagnostics for Fine and Gray Model Provides the implementation of analytical and graphical approaches for checking the assumptions of the Fine and Gray model.
3271 Survival Analysis currentSurvival Estimation of CCI and CLFS Functions The currentSurvival package contains functions for the estimation of the current cumulative incidence (CCI) and the current leukaemia-free survival (CLFS). The CCI is the probability that a patient is alive and in any disease remission (e.g. complete cytogenetic remission in chronic myeloid leukaemia) after initiating his or her therapy (e.g. tyrosine kinase therapy for chronic myeloid leukaemia). The CLFS is the probability that a patient is alive and in any disease remission after achieving the first disease remission.
3272 Survival Analysis Cyclops Cyclic Coordinate Descent for Logistic, Poisson and Survival Analysis This model fitting tool incorporates cyclic coordinate descent and majorization-minimization approaches to fit a variety of regression models found in large-scale observational healthcare data. Implementations focus on computational optimization and fine-scale parallelization to yield efficient inference in massive datasets. Please see: Suchard, Simpson, Zorych, Ryan and Madigan (2013) <doi:10.1145/2414416.2414791>.
3273 Survival Analysis DAAG Data Analysis and Graphics Data and Functions Various data sets used in examples and exercises in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) “Data Analysis and Graphics Using R”.
3274 Survival Analysis dblcens Compute the NPMLE of distribution from doubly censored data Uses the EM algorithm to compute, for doubly censored data (as described in Chang and Yang (1987), Ann. Stat. 1536-47), the NPMLE of the CDF and of the two censoring distributions. You can also specify a constraint; the function then returns the constrained NPMLE and the -2 log empirical likelihood ratio, which can be used to test the hypothesis about the constraint and to find confidence intervals for a probability or quantile via the empirical likelihood ratio theorem. The influence function of hat F may also be calculated (but may be slow).
3275 Survival Analysis discSurv Discrete Time Survival Analysis Provides data transformations, estimation utilities, predictive evaluation measures and simulation functions for discrete time survival analysis.
3276 Survival Analysis DPpackage Bayesian Nonparametric Modeling in R Functions to perform inference via simulation from the posterior distributions for Bayesian nonparametric and semiparametric models. Although the name of the package was motivated by the Dirichlet Process prior, the package considers and will consider other priors on functional spaces. So far, DPpackage includes models considering Dirichlet Processes, Dependent Dirichlet Processes, Dependent Poisson-Dirichlet Processes, Hierarchical Dirichlet Processes, Polya Trees, Linear Dependent Tailfree Processes, Mixtures of Triangular distributions, Random Bernstein polynomials priors and Dependent Bernstein Polynomials. The package also includes models considering Penalized B-Splines. Includes semiparametric models for marginal and conditional density estimation, ROC curve analysis, interval censored data, binary regression models, generalized linear mixed models, IRT type models, and generalized additive models. Also contains functions to compute Pseudo-Bayes factors for model comparison, and to elicit the precision parameter of the Dirichlet Process. To maximize computational efficiency, the actual sampling for each model is done in compiled FORTRAN. The functions return objects which can be subsequently analyzed with functions provided in the ‘coda’ package.
3277 Survival Analysis DStree Recursive Partitioning for Discrete-Time Survival Trees Building discrete-time survival trees and bagged trees based on the functionalities of the rpart package. Splitting criterion maximizes the likelihood of a covariate-free logistic discrete time hazard model.
3278 Survival Analysis DTDA Doubly truncated data analysis Implements different algorithms for analyzing randomly truncated data, both one-sided and two-sided (i.e., doubly) truncated. Two real data sets are included.
3279 Survival Analysis dynamichazard Dynamic Hazard Models using State Space Models Contains functions that let you fit dynamic hazard models using state space models. The first implemented model is described in Fahrmeir (1992) <doi:10.1080/01621459.1992.10475232> and Fahrmeir (1994) <doi:10.1093/biomet/81.2.317>. Extensions hereof are available where the Extended Kalman filter is replaced by an unscented Kalman filter and other options including particle filters. The implemented particle filters support more general state space models.
3280 Survival Analysis dynfrail Fitting Dynamic Frailty Models with the EM Algorithm Fits semiparametric dynamic frailty models according to the methodology of Putter and van Houwelingen (2015) <doi:10.1093/biostatistics/kxv002>. Intermediate models, where the frailty is piecewise constant on prespecified intervals, are also supported. The frailty process is taken to have a specific auto-correlation structure, and the supported distributions include gamma, inverse Gaussian, power variance family (PVF) and positive stable.
3281 Survival Analysis dynpred Companion Package to “Dynamic Prediction in Clinical Survival Analysis” The dynpred package contains functions for dynamic prediction in survival analysis.
3282 Survival Analysis dynsurv Dynamic Models for Survival Data Functions fitting time-varying coefficient models for interval censored and right censored survival data. Three major approaches are implemented: 1) Bayesian Cox model with time-independent, time-varying or dynamic coefficients for right censored and interval censored data; 2) Spline based time-varying coefficient Cox model for right censored data; 3) Transformation model with time-varying coefficients for right censored data using estimating equations.
3283 Survival Analysis eha (core) Event History Analysis Sampling of risk sets in Cox regression, selections in the Lexis diagram, bootstrapping. Parametric proportional hazards fitting with left truncation and right censoring for common families of distributions, piecewise constant hazards, and discrete models. Parametric accelerated failure time models for left truncated and right censored data.
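
A hedged sketch of a parametric proportional-hazards fit with delayed entry; Surv(enter, exit, event) encodes left truncation, and oldmort is a data set shipped with eha:

```r
library(eha)

# Weibull proportional hazards model with left-truncated,
# right-censored survival times (delayed entry at 'enter')
data(oldmort, package = "eha")
fit <- phreg(Surv(enter, exit, event) ~ sex, data = oldmort,
             dist = "weibull")
summary(fit)
```
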
3284 Survival Analysis ELYP Empirical Likelihood Analysis for the Cox Model and Yang-Prentice (2005) Model Empirical likelihood ratio tests for the Yang and Prentice (short/long term hazards ratio) models. Empirical likelihood tests within a Cox model, for parameters defined via both baseline hazard function and regression parameters.
3285 Survival Analysis emplik Empirical Likelihood Ratio for Censored/Truncated Data Empirical likelihood ratio tests for means/quantiles/hazards from possibly censored and/or truncated data. Now does regression too. This version contains some C code.
3286 Survival Analysis emplik2 Empirical Likelihood Ratio Test for Two Samples with Censored Data Calculates the p-value for a mean-type hypothesis (or multiple mean-type hypotheses) based on two samples with censored data.
3287 Survival Analysis Epi A Package for Statistical Analysis in Epidemiology Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data, in particular representation, manipulation and simulation of multistate data - the Lexis suite of functions, which includes interfaces to ‘mstate’, ‘etm’ and ‘cmprsk’ packages. Also contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.
3288 Survival Analysis epiR Tools for the Analysis of Epidemiological Data Tools for the analysis of epidemiological data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, and computing confidence intervals around incidence risk and incidence rate estimates. Miscellaneous functions for use in meta-analysis, diagnostic test interpretation, and sample size calculations.
3289 Survival Analysis etm Empirical Transition Matrix The etm (empirical transition matrix) package permits estimation of the matrix of transition probabilities for any time-inhomogeneous multistate model with finite state space using the Aalen-Johansen estimator. Functions for data preparation and for displaying are also included (Allignol et al., 2011 <doi:10.18637/jss.v038.i04>). Functionals of the Aalen-Johansen estimator, e.g., excess length-of-stay in an intermediate state, can also be computed (Allignol et al. 2011 <doi:10.1007/s00180-010-0200-x>).
3290 Survival Analysis exactRankTests Exact Distributions for Rank and Permutation Tests Computes exact conditional p-values and quantiles using an implementation of the Shift-Algorithm by Streitberg & Roehmel.
3291 Survival Analysis FamEvent Family Age-at-Onset Data Simulation and Penetrance Estimation Simulates age-at-onset traits associated with a segregating major gene in family data obtained from population-based, clinic-based, or multi-stage designs. Appropriate ascertainment correction is utilized to estimate age-dependent penetrance functions either parametrically from the fitted model or nonparametrically from the data. The Expectation and Maximization algorithm can infer missing genotypes and carrier probabilities estimated from family’s genotype and phenotype information or from a fitted model. Plot functions include pedigrees of simulated families and predicted penetrance curves based on specified parameter values.
3292 Survival Analysis fastcox Lasso and Elastic-Net Penalized Cox’s Regression in High Dimensions Models using the Cocktail Algorithm We implement a cocktail algorithm, a good mixture of coordinate descent, the majorization-minimization principle and the strong rule, for computing the solution paths of the elastic net penalized Cox’s proportional hazards model. The package is an implementation of Yang, Y. and Zou, H. (2013) DOI: <doi:10.4310/SII.2013.v6.n2.a1>.
3293 Survival Analysis fastpseudo Fast Pseudo Observations Computes pseudo-observations for survival analysis on right-censored data based on restricted mean survival time.
3294 Survival Analysis FHtest Tests for Right and Interval-Censored Survival Data Based on the Fleming-Harrington Class Functions to compare two or more survival curves with: a) The Fleming-Harrington test for right-censored data based on permutations and on counting processes. b) An extension of the Fleming-Harrington test for interval-censored data based on a permutation distribution and on a score vector distribution.
3295 Survival Analysis fitdistrplus Help to Fit of a Parametric Distribution to Non-Censored or Censored Data Extends the fitdistr() function (of the MASS package) with several functions to help the fit of a parametric distribution to non-censored or censored data. Censored data may contain left censored, right censored and interval censored values, with several lower and upper bounds. In addition to maximum likelihood estimation (MLE), the package provides moment matching (MME), quantile matching (QME) and maximum goodness-of-fit estimation (MGE) methods (available only for non-censored data). Weighted versions of MLE, MME and QME are available. See e.g. Casella & Berger (2002). Statistical inference. Pacific Grove.
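
A minimal sketch of the censored-data interface; fitdistcens() expects a data frame with left and right columns, and the toy values below are invented:

```r
library(fitdistrplus)

# Censoring encoding: left == right -> exact observation,
# right = NA -> right-censored, left = NA -> left-censored,
# left < right -> interval-censored
d <- data.frame(
  left  = c(1.2, 0.8,  NA, 2.0, 1.5, 0.5, 3.1,  NA),
  right = c(1.2, 1.4, 0.5,  NA, 1.5, 0.9,  NA, 0.7))

fit <- fitdistcens(d, "weibull")  # Weibull MLE from mixed censoring
summary(fit)
```
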
3296 Survival Analysis flexPM Flexible Parametric Models for Censored and Truncated Data Estimation of flexible parametric models for survival data.
3297 Survival Analysis flexrsurv Flexible Relative Survival Analysis Package for parametric relative survival analyses. It allows modelling of non-linear and non-proportional effects using splines (B-spline and truncated power basis). It also includes the non-proportional and non-linear effects of Remontet, L. et al. (2007) <doi:10.1002/sim.2656> and Mahboubi, A. et al. (2011) <doi:10.1002/sim.4208>.
3298 Survival Analysis flexsurv Flexible Parametric Survival and Multi-State Models Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models.
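
A short sketch of the two main fitting functions, run on the ovarian data from the survival package:

```r
library(flexsurv)
library(survival)

# Generalized gamma parametric fit and a one-knot Royston-Parmar
# spline model on the same data
fit1 <- flexsurvreg(Surv(futime, fustat) ~ rx, data = ovarian,
                    dist = "gengamma")
fit2 <- flexsurvspline(Surv(futime, fustat) ~ rx, data = ovarian, k = 1)
plot(fit2)  # fitted survival curves against the Kaplan-Meier estimate
```
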
3299 Survival Analysis frailtyEM Fitting Frailty Models with the EM Algorithm Contains functions for fitting shared frailty models with a semi-parametric baseline hazard with the Expectation-Maximization algorithm. Supported data formats include clustered failures with left truncation and recurrent events in gap-time or Andersen-Gill format. Several frailty distributions, such as the gamma, positive stable and the Power Variance Family, are supported.
3300 Survival Analysis frailtypack General Frailty Models: Shared, Joint and Nested Frailty Models with Prediction; Evaluation of Failure-Time Surrogate Endpoints The following classes of frailty models can be fit with this R package, using either penalized likelihood estimation on the hazard function or parametric estimation: 1) A shared frailty model (with gamma or log-normal frailty distribution) and Cox proportional hazard model. Clustered and recurrent survival times can be studied. 2) Additive frailty models for proportional hazard models with two correlated random effects (intercept random effect with random slope). 3) Nested frailty models for hierarchically clustered data (with 2 levels of clustering) by including two iid gamma random effects. 4) Joint frailty models in the context of the joint modelling for recurrent events with terminal event for clustered data or not. A joint frailty model for two semi-competing risks and clustered data is also proposed. 5) Joint general frailty models in the context of the joint modelling for recurrent events with terminal event data with two independent frailty terms. 6) Joint Nested frailty models in the context of the joint modelling for recurrent events with terminal event, for hierarchically clustered data (with two levels of clustering) by including two iid gamma random effects. 7) Multivariate joint frailty models for two types of recurrent events and a terminal event. 8) Joint models for longitudinal data and a terminal event. 9) Trivariate joint models for longitudinal data, recurrent events and a terminal event. 10) Joint frailty models for the validation of surrogate endpoints in multiple randomized clinical trials with failure-time endpoints. Prediction values are available (for a terminal event or for a new recurrent event). Left-truncated (not for Joint model), right-censored data, interval-censored data (only for Cox proportional hazard and shared frailty model) and strata are allowed. In each model, the random effects have the gamma or normal distribution. Now, you can also consider time-varying covariates effects in Cox, shared and joint frailty models (1-5). The package includes concordance measures for Cox proportional hazards models and for shared frailty models.
3301 Survival Analysis gamboostMSM Estimating multistate models using gamboost() Provides features to use the function gamboost() from the package mboost for the estimation of multistate models.
3302 Survival Analysis gamlss.cens Fitting an Interval Response Variable Using ‘gamlss.family’ Distributions This is an add-on package to GAMLSS. The purpose of this package is to allow users to fit interval response variables in GAMLSS models. The main function gen.cens() generates a censored version of an existing GAMLSS family distribution.
3303 Survival Analysis gbm Generalized Boosted Regression Models An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway.
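
For the survival use case specifically, a hedged sketch of boosting the Cox partial likelihood (distribution = "coxph") on the veteran data from the survival package; the tuning values are illustrative only:

```r
library(gbm)
library(survival)

# Gradient boosting with the Cox partial likelihood as the loss
data(veteran, package = "survival")
fit <- gbm(Surv(time, status) ~ trt + karno + age, data = veteran,
           distribution = "coxph", n.trees = 500,
           interaction.depth = 2, shrinkage = 0.01, cv.folds = 3)

best <- gbm.perf(fit, method = "cv")  # tree count minimising CV loss
```
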
3304 Survival Analysis gcerisk Generalized Competing Event Model Generalized competing event model based on Cox PH model and Fine-Gray model. This function is designed to develop optimized risk-stratification methods for competing risks data, such as described in: 1. Carmona R, Gulaya S, Murphy JD, Rose BS, Wu J, Noticewala S,McHale MT, Yashar CM, Vaida F, and Mell LK (2014) <doi:10.1016/j.ijrobp.2014.03.047>. 2. Carmona R, Zakeri K, Green G, Hwang L, Gulaya S, Xu B, Verma R, Williamson CW, Triplett DP, Rose BS, Shen H, Vaida F, Murphy JD, and Mell LK (2016) <doi:10.1200/JCO.2015.65.0739>. 3. Lunn, Mary, and Don McNeil (1995) <doi:10.2307/2532940>.
3305 Survival Analysis gems Generalized Multistate Simulation Model Simulate and analyze multistate models with general hazard functions. gems provides functionality for the preparation of hazard functions and parameters, simulation from a general multistate model and predicting future events. The multistate model is not required to be a Markov model and may take the history of previous events into account. In the basic version, it allows simulation from transition-specific hazard functions whose parameters are multivariate normally distributed.
3306 Survival Analysis genSurv Generating Multi-State Survival Data Generation of survival data with one (binary) time-dependent covariate. Generation of survival data arising from a progressive illness-death model.
3307 Survival Analysis glmnet Lasso and Elastic-Net Regularized Generalized Linear Models Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model. Two recent additions are the multiple-response Gaussian, and the grouped multinomial regression. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the paper linked to via the URL below.
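
A minimal sketch of the penalized Cox path on simulated data; recent glmnet versions accept a Surv() response for family = "cox", and cv.glmnet() selects the penalty by cross-validated partial likelihood:

```r
library(glmnet)
library(survival)

# Elastic-net Cox regression over a lambda path (simulated data)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- Surv(rexp(n), rbinom(n, 1, 0.7))

cv <- cv.glmnet(x, y, family = "cox")
coef(cv, s = "lambda.min")  # sparse coefficients at the chosen lambda
```
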
3308 Survival Analysis glmpath L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model A path-following algorithm for L1 regularized generalized linear models and Cox proportional hazards model.
3309 Survival Analysis globalboosttest Testing the additional predictive value of high-dimensional data ‘globalboosttest’ implements a permutation-based testing procedure to globally test the (additional) predictive value of a large set of predictors given that a small set of predictors is already available. Currently, ‘globalboosttest’ supports binary outcomes (via logistic regression) and survival outcomes (via Cox regression). It is based on boosting regression as implemented in the package ‘mboost’.
3310 Survival Analysis glrt Generalized Logrank Tests for Interval-censored Failure Time Data Functions to conduct four generalized logrank tests and a score test under a proportional hazards model.
3311 Survival Analysis gof Model-diagnostics based on cumulative residuals Implementation of model-checking techniques for generalized linear models and linear structural equation models based on cumulative residuals.
3312 Survival Analysis GORCure Fit Generalized Odds Rate Mixture Cure Model with Interval Censored Data The Generalized Odds Rate Mixture Cure (GORMC) model is a flexible model for fitting survival data with a cure fraction, including the Proportional Hazards Mixture Cure (PHMC) model and the Proportional Odds Mixture Cure Model as special cases. This package fits the GORMC model to interval-censored data.
3313 Survival Analysis gss General Smoothing Splines A comprehensive package for structural multivariate function estimation using smoothing splines.
3314 Survival Analysis GSSE Genotype-Specific Survival Estimation We propose a fully efficient sieve maximum likelihood method to estimate genotype-specific distribution of time-to-event outcomes under a nonparametric model. We can handle missing genotypes in pedigrees. We estimate the time-dependent hazard ratio between two genetic mutation groups using B-splines, while applying nonparametric maximum likelihood estimation to the reference baseline hazard function. The estimators are calculated via an expectation-maximization algorithm.
3315 Survival Analysis gte Generalized Turnbull’s Estimator Generalized Turnbull’s estimator proposed by Dehghan and Duchesne (2011).
3316 Survival Analysis hdnom Benchmarking and Visualization Toolkit for Penalized Cox Models Creates nomogram visualizations for penalized Cox regression models, with the support of reproducible survival model building, validation, calibration, and comparison for high-dimensional data.
3317 Survival Analysis ICBayes Bayesian Semiparametric Models for Interval-Censored Data Contains functions to fit Bayesian semiparametric regression survival models (proportional hazards model, proportional odds model, and probit model) to interval-censored time-to-event data.
3318 Survival Analysis ICE Iterated Conditional Expectation Kernel Estimators for Interval-Censored Data
3319 Survival Analysis icenReg Regression Models for Interval Censored Data Regression models for interval censored data. Currently supports Cox-PH, proportional odds, and accelerated failure time models. Allows for semi and fully parametric models (parametric only for accelerated failure time models) and Bayesian parametric models. Includes functions for easy visual diagnostics of model fits and imputation of censored data.
3320 Survival Analysis ICGOR Fit Generalized Odds Rate Hazards Model with Interval Censored Data The Generalized Odds Rate Hazards (GORH) model is a flexible model for fitting survival data, including the Proportional Hazards (PH) model and the Proportional Odds (PO) model as special cases. This package fits the GORH model to interval-censored data.
3321 Survival Analysis icRSF A Modified Random Survival Forest Algorithm Implements a modification to the Random Survival Forests algorithm for obtaining variable importance in high dimensional datasets. The proposed algorithm is appropriate for settings in which a silent event is observed through sequentially administered, error-prone self-reports or laboratory based diagnostic tests. The modified algorithm incorporates a formal likelihood framework that accommodates sequentially administered, error-prone self-reports or laboratory based diagnostic tests. The original Random Survival Forests algorithm is modified by the introduction of a new splitting criterion based on a likelihood ratio test statistic.
3322 Survival Analysis ICsurv A package for semiparametric regression analysis of interval-censored data Currently using the proportional hazards (PH) model. More methods under other semiparametric regression models will be included in later versions.
3323 Survival Analysis IDPSurvival Imprecise Dirichlet Process for Survival Analysis Functions to perform robust nonparametric survival analysis with right censored data using a prior near-ignorant Dirichlet Process. Mangili, F., Benavoli, A., de Campos, C.P., Zaffalon, M. (2015) <doi:10.1002/bimj.201500062>.
3324 Survival Analysis imputeYn Imputing the Last Largest Censored Observation(s) Under Weighted Least Squares The method yields less biased and more efficient estimates for AFT models.
3325 Survival Analysis InformativeCensoring Multiple Imputation for Informative Censoring Multiple Imputation for Informative Censoring. This package implements two methods. Gamma Imputation from Jackson et al. (2014) <doi:10.1002/sim.6274> and Risk Score Imputation from Hsu et al. (2009) <doi:10.1002/sim.3480>.
3326 Survival Analysis intccr Semiparametric Competing Risks Regression under Interval Censoring Semiparametric regression models on the cumulative incidence function with interval-censored competing risks data as described in Bakoyannis, Yu, & Yiannoutsos (2017) <doi:10.1002/sim.7350>. The main function fits the proportional subdistribution hazards model (Fine-Gray model), the proportional odds model, and other models that belong to the class of semiparametric generalized odds rate transformation models.
3327 Survival Analysis intercure Cure Rate Estimators for Interval Censored Data Implementations of semiparametric cure rate estimators for interval censored data in R. The algorithms are based on the promotion time and frailty models, all for interval censoring. For the frailty model, there is also an implementation for clustered data.
3328 Survival Analysis interval Weighted Logrank Tests and NPMLE for interval censored data Functions to fit nonparametric survival curves, plot them, and perform logrank or Wilcoxon type tests.
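
A hedged sketch of the NPMLE fit; icfit() takes a Surv(..., type = "interval2") response, and the toy (L, R] intervals below are invented (R = NA marks right censoring):

```r
library(interval)
library(survival)

# NPMLE of the survival curve from interval-censored observations
d <- data.frame(L = c(0, 2, 3, 5, 1),
                R = c(3, 4, NA, 7, 2))
fit <- icfit(Surv(L, R, type = "interval2") ~ 1, data = d)
plot(fit)
# Group comparisons would use ictest() with the same formula interface
```
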
3329 Survival Analysis invGauss Threshold regression that fits the (randomized drift) inverse Gaussian distribution to survival data invGauss fits the (randomized drift) inverse Gaussian distribution to survival data. The model is described in Aalen OO, Borgan O, Gjessing HK. Survival and Event History Analysis. A Process Point of View. Springer, 2008. It is based on describing time to event as the barrier hitting time of a Wiener process, where drift towards the barrier has been randomized with a Gaussian distribution. The model allows covariates to influence starting values of the Wiener process and/or average drift towards a barrier, with a user-defined choice of link functions.
3330 Survival Analysis ipdmeta Tools for subgroup analyses with multiple trial data using aggregate statistics This package provides functions to estimate an IPD linear mixed effects model for a continuous outcome and any categorical covariate from study summary statistics. There are also functions for estimating the power of a treatment-covariate interaction test in an individual patient data meta-analysis from aggregate data.
3331 Survival Analysis ipred Improved Predictors Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
3332 Survival Analysis isoph Isotonic Proportional Hazards Model Nonparametric estimation of an isotonic covariate effect for proportional hazards model.
3333 Survival Analysis jackknifeKME Jackknife Estimates of Kaplan-Meier Estimators or Integrals Computing the original and modified jackknife estimates of Kaplan-Meier estimators.
3334 Survival Analysis JM Joint Modeling of Longitudinal and Survival Data Shared parameter models for the joint modeling of longitudinal and time-to-event data.
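
A minimal sketch of the two-stage-then-joint workflow, following the pattern in the package documentation (the aids and aids.id data sets ship with JM):

```r
library(JM)  # loads nlme and survival

# Step 1: mixed model for the longitudinal biomarker trajectory
data(aids, package = "JM")
data(aids.id, package = "JM")  # one row per patient
lmeFit <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)

# Step 2: Cox model for the event time (x = TRUE keeps the design matrix)
coxFit <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)

# Step 3: joint fit linking both through shared random effects
jm <- jointModel(lmeFit, coxFit, timeVar = "obstime")
summary(jm)
```
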
3335 Survival Analysis JMbayes Joint Modeling of Longitudinal and Time-to-Event Data under a Bayesian Approach Shared parameter models for the joint modeling of longitudinal and time-to-event data using MCMC; Dimitris Rizopoulos (2016) <doi:10.18637/jss.v072.i07>.
3336 Survival Analysis joineR Joint Modelling of Repeated Measurements and Time-to-Event Data Analysis of repeated measurements and time-to-event data via random effects joint models. Fits the joint models proposed by Henderson and colleagues <doi:10.1093/biostatistics/1.4.465> (single event time) and by Williamson and colleagues (2008) <doi:10.1002/sim.3451> (competing risks events time) to a single continuous repeated measure. The time-to-event data is modelled using a (cause-specific) Cox proportional hazards regression model with time-varying covariates. The longitudinal outcome is modelled using a linear mixed effects model. The association is captured by a latent Gaussian process. The model is estimated using an Expectation Maximization algorithm. Some plotting functions and the variogram are also included. This project is funded by the Medical Research Council (Grant numbers G0400615 and MR/M013227/1).
3337 Survival Analysis joineRML Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project is funded by the Medical Research Council (Grant number MR/M013227/1).
3338 Survival Analysis joint.Cox Joint Frailty-Copula Models for Tumour Progression and Death in Meta-Analysis Perform likelihood estimation and dynamic prediction under joint frailty-copula models for tumour progression and death in meta-analysis. A penalized likelihood is employed for estimating model parameters, where the baseline hazard functions are approximated by smoothing splines. The methods are applicable for meta-analytic data combining several studies. The methods can analyze data having information on both terminal event time (e.g., time-to-death) and non-terminal event time (e.g., time-to-tumour progression). See Emura et al. (2017) <doi:10.1177/0962280215604510> for likelihood estimation, and Emura et al. (2018) <doi:10.1177/0962280216688032> for dynamic prediction. Survival data from ovarian cancer patients are also available.
3339 Survival Analysis JointModel Semiparametric Joint Models for Longitudinal and Counting Processes Joint fit of a semiparametric regression model for longitudinal responses and a semiparametric transformation model for time-to-event data.
3340 Survival Analysis kaps K-Adaptive Partitioning for Survival data Provides routines to conduct the K-adaptive partitioning (kaps) algorithm for survival data. The function kaps() implements the algorithm.
3341 Survival Analysis km.ci Confidence intervals for the Kaplan-Meier estimator Computes various confidence intervals for the Kaplan-Meier estimator, namely: Peto's CI, Rothman CI, CIs based on Greenwood's variance, the Thomas and Grunkemeier CI, and the simultaneous confidence bands by Nair and by Hall and Wellner.
3342 Survival Analysis kmconfband Kaplan-Meier Simultaneous Confidence Band for the Survivor Function Computes and plots an exact nonparametric band for any user-specified level of confidence from a single-sample survfit object.
3343 Survival Analysis kmi Kaplan-Meier Multiple Imputation for the Analysis of Cumulative Incidence Functions in the Competing Risks Setting Performs a Kaplan-Meier multiple imputation to recover the missing potential censoring information from competing risks events, so that standard right-censored methods could be applied to the imputed data sets to perform analyses of the cumulative incidence functions (Allignol and Beyersmann, 2010 <doi:10.1093/biostatistics/kxq018>).
3344 Survival Analysis KMsurv Data sets from Klein and Moeschberger (1997), Survival Analysis Data sets and functions for Klein and Moeschberger (1997), “Survival Analysis, Techniques for Censored and Truncated Data”, Springer.
3345 Survival Analysis landest Landmark Estimation of Survival and Treatment Effect Provides functions to estimate survival and a treatment effect using a landmark estimation approach.
3346 Survival Analysis lava.tobit Latent Variable Models with Censored and Binary Outcomes Lava plugin allowing combinations of left and right censored and binary outcomes.
3347 Survival Analysis lbiassurv Length-biased correction to survival curve estimation The package offers various length-bias corrections to survival curve estimation.
3348 Survival Analysis LearnBayes Functions for Learning Bayesian Inference A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
3349 Survival Analysis LexisPlotR Plot Lexis Diagrams for Demographic Purposes Functions to plot Lexis Diagrams for Demographic purposes.
3350 Survival Analysis lmec Linear Mixed-Effects Models with Censored Responses This package includes a function to fit a linear mixed-effects model in the formulation described in Laird and Ware (1982) but allowing for censored normal responses. In this version, the within-group errors are assumed independent and identically distributed.
3351 Survival Analysis locfit Local Regression, Likelihood and Density Estimation Local regression, likelihood and density estimation.
3352 Survival Analysis logconcens Maximum likelihood estimation of a log-concave density based on censored data Based on right or interval censored data, compute the maximum likelihood estimator of a (sub)probability density under the assumption that it is log-concave. For further information see Duembgen, Rufibach, and Schuhmacher (2011, preprint).
3353 Survival Analysis LogicReg Logic Regression Routines for fitting Logic Regression models.
3354 Survival Analysis LogrankA Logrank Test for Aggregated Survival Data LogrankA provides a logrank test across unlimited groups with the possibility to input aggregated survival data.
3355 Survival Analysis logspline Routines for Logspline Density Estimation Contains routines for logspline density estimation. The function oldlogspline() uses the same algorithm as the logspline package version 1.0.x; i.e. the Kooperberg and Stone (1992) algorithm (with an improved interface). The recommended routine logspline() uses an algorithm from Stone et al (1997) <doi:10.1214/aos/1031594728>.
3356 Survival Analysis lpc Lassoed Principal Components for Testing Significance of Features Implements the LPC method of Witten & Tibshirani (Annals of Applied Statistics, 2008) for identification of significant genes in a microarray experiment.
3357 Survival Analysis lsmeans Least-Squares Means Obtain least-squares means for linear, generalized linear, and mixed models. Compute contrasts or linear functions of least-squares means, and comparisons of slopes. Plots and compact letter displays. Least-squares means were proposed in Harvey, W (1960) “Least-squares analysis of data with unequal subclass numbers”, Tech Report ARS-20-8, USDA National Agricultural Library, and discussed further in Searle, Speed, and Milliken (1980) “Population marginal means in the linear model: An alternative to least squares means”, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>. NOTE: lsmeans now relies primarily on code in the ‘emmeans’ package. ‘lsmeans’ will be archived in the near future.
3358 Survival Analysis LTRCtrees Survival Trees to Fit Left-Truncated and Right-Censored and Interval-Censored Survival Data Recursive partition algorithms designed for fitting survival trees with left-truncated and right-censored (LTRC) data, as well as interval-censored data. The LTRC trees can also be used to fit survival trees with time-varying covariates.
3359 Survival Analysis MAMSE Calculation of Minimum Averaged Mean Squared Error (MAMSE) Weights Calculates the nonparametric adaptive MAMSE weights for univariate, right-censored or multivariate data. The MAMSE weights can be used in a weighted likelihood or to define a mixture of empirical distribution functions. The package includes functions for the MAMSE weighted Kaplan-Meier estimate and for MAMSE weighted ROC curves.
3360 Survival Analysis maxstat Maximally Selected Rank Statistics Maximally selected rank statistics with several p-value approximations.
3361 Survival Analysis mboost Model-Based Boosting Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.
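
In the survival setting, a hedged sketch of component-wise boosting with the Cox partial likelihood as loss (family = CoxPH()), on the veteran data from the survival package; mstop = 200 is an illustrative value:

```r
library(mboost)
library(survival)

# Component-wise linear base-learners; boosting performs intrinsic
# variable selection via the stopping iteration mstop
data(veteran, package = "survival")
fit <- glmboost(Surv(time, status) ~ karno + age + diagtime,
                data = veteran, family = CoxPH(),
                control = boost_control(mstop = 200))
coef(fit)  # shrunken coefficients of the selected covariates
```
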
3362 Survival Analysis MCMCglmm MCMC Generalised Linear Mixed Models MCMC Generalised Linear Mixed Models.
3363 Survival Analysis MCMCpack Markov Chain Monte Carlo (MCMC) Package Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return ‘coda’ mcmc objects that can then be summarized using the ‘coda’ package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
3364 Survival Analysis mets Analysis of Multivariate Event Times Implementation of various statistical models for multivariate event history data <doi:10.1007/s10985-013-9244-x>. Including multivariate cumulative incidence models <doi:10.1002/sim.6016>, and bivariate random effects probit models (Liability models) <doi:10.1016/j.csda.2015.01.014>. Also contains two-stage binomial modelling that can do pairwise odds-ratio dependence modelling based on marginal logistic regression models. This is an alternative to the alternating logistic regression approach (ALR).
3365 Survival Analysis mexhaz Mixed Effect Excess Hazard Models Fit flexible (excess) hazard regression models with the possibility of including non-proportional effects of covariables and of adding a random effect at the cluster level (corresponding to a shared frailty).
3366 Survival Analysis mfp Multivariable Fractional Polynomials Fractional polynomials are used to represent curvature in regression models. A key reference is Royston and Altman, 1994.
3367 Survival Analysis miCoPTCM Promotion Time Cure Model with Mis-Measured Covariates Fits semiparametric promotion time cure models, either accounting for measurement error in the covariates (using a corrected score approach or the SIMEX algorithm) or ignoring it, using a backfitting approach to maximize the likelihood.
3368 Survival Analysis MicSim Performing Continuous-Time Microsimulation This entry-level toolkit allows performing continuous-time microsimulation for a wide range of demographic applications. Individual life-courses are specified by a continuous-time multi-state model.
3369 Survival Analysis MIICD Multiple Imputation for Interval Censored Data Implements multiple imputation for proportional hazards regression with interval-censored data, or proportional sub-distribution hazards regression for interval-censored competing risks data. The main functions allow estimation of the survival function, the cumulative incidence function, Cox and Fine & Gray regression coefficients, and the associated variance-covariance matrix. ‘MIICD’ functions call ‘Surv’, ‘survfit’ and ‘coxph’ from the ‘survival’ package, ‘crprep’ from the ‘mstate’ package, and ‘mvrnorm’ from the ‘MASS’ package.
3370 Survival Analysis mixAK Multivariate Normal Mixture Models and Mixtures of Generalized Linear Mixed Models Including Model Based Clustering Contains a mixture of statistical methods including the MCMC methods to analyze normal mixtures. Additionally, model based clustering methods are implemented to perform classification based on (multivariate) longitudinal (or otherwise correlated) data. The basis for such clustering is a mixture of multivariate generalized linear mixed models.
3371 Survival Analysis mixPHM Mixtures of Proportional Hazard Models Fits multiple variable mixtures of various parametric proportional hazard models using the EM-Algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing values and censored values are allowed. Independence is assumed over the single variables.
3372 Survival Analysis MLEcens Computation of the MLE for bivariate (interval) censored data This package contains functions to compute the nonparametric maximum likelihood estimator (MLE) for the bivariate distribution of (X,Y), when realizations of (X,Y) cannot be observed directly. To be more precise, we consider the situation where we observe a set of rectangles that are known to contain the unobservable realizations of (X,Y). We compute the MLE based on such a set of rectangles. The methods can also be used for univariate censored data (see data set ‘cosmesis’), and for censored data with competing risks (see data set ‘menopause’). We also provide functions to visualize the observed data and the MLE.
3373 Survival Analysis MRsurv A multiplicative-regression model for relative survival This package contains functions, data and examples to compute a multiplicative-regression model for relative survival.
3374 Survival Analysis msm Multi-State Markov and Hidden Markov Models in Continuous Time Functions for fitting continuous-time Markov and hidden Markov multi-state models to longitudinal data. Designed for processes observed at arbitrary times in continuous time (panel data) but some other observation schemes are supported. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates, which may be constant or piecewise-constant in time.
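A minimal sketch of fitting a continuous-time multi-state model to panel data, following the pattern of the ‘msm’ manual with the bundled ‘cav’ data; the initial intensity matrix below is a made-up starting guess:

```r
library(msm)
# 'cav' (heart-transplant monitoring, states 1-3 plus death = 4) ships with 'msm'
Q <- rbind(c(0,    0.25, 0,    0.25),
           c(0.17, 0,    0.17, 0.17),
           c(0,    0.25, 0,    0.25),
           c(0,    0,    0,    0))    # starting values for transition intensities
fit <- msm(state ~ years, subject = PTNUM, data = cav, qmatrix = Q)
pmatrix.msm(fit, t = 5)               # estimated 5-year transition probabilities
```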
3375 Survival Analysis msmtools Building Augmented Data to Run Multi-State Models with ‘msm’ Package A fast and general method for restructuring classical longitudinal data into an augmented format, which facilitates the modelling of longitudinal data under a multi-state framework using the ‘msm’ package.
3376 Survival Analysis msSurv Nonparametric Estimation for Multistate Models Nonparametric estimation for right censored, left truncated time to event data in multistate models.
3377 Survival Analysis MST Multivariate Survival Trees Constructs trees for multivariate survival data using marginal and frailty models. Grows, prunes, and selects the best-sized tree.
3378 Survival Analysis mstate (core) Data Preparation, Estimation and Prediction in Multi-State Models Contains functions for data preparation, descriptives, hazard estimation and prediction with Aalen-Johansen or simulation in competing risks and multi-state models, see Putter, Fiocco, Geskus (2007) <doi:10.1002/sim.2712>.
3379 Survival Analysis muhaz (core) Hazard Function Estimation in Survival Analysis Produces a smooth estimate of the hazard function for censored data.
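A minimal sketch (an illustrative example, not from the CRAN description; it assumes the ‘lung’ data from the ‘survival’ package, with its 1/2 status recoded to 0/1):

```r
library(muhaz)
data(lung, package = "survival")
fit <- muhaz(lung$time, lung$status - 1)   # kernel-smoothed hazard estimate
plot(fit, xlab = "Days", ylab = "Hazard")  # smooth hazard curve over follow-up
```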
3380 Survival Analysis multcomp Simultaneous Inference in General Parametric Models Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models. The package includes demos reproducing analyses presented in the book “Multiple Comparisons Using R” (Bretz, Hothorn, Westfall, 2010, CRC Press).
3381 Survival Analysis multipleNCC Weighted Cox-Regression for Nested Case-Control Data Fit Cox proportional hazard models with a weighted partial likelihood. It handles one or multiple endpoints, additional matching and makes it possible to reuse controls for other endpoints.
3382 Survival Analysis mvna Nelson-Aalen Estimator of the Cumulative Hazard in Multistate Models Computes the Nelson-Aalen estimator of the cumulative transition hazard for arbitrary Markov multistate models <ISBN:978-0-387-68560-1>.
3383 Survival Analysis NADA Nondetects and Data Analysis for Environmental Data Contains methods described by Dennis Helsel in his book “Nondetects And Data Analysis: Statistics for Censored Environmental Data”.
3384 Survival Analysis NestedCohort Survival Analysis for Cohorts with Missing Covariate Information Estimate hazard ratios, survival curves and attributable risks for cohorts with missing covariates, using Cox models or Kaplan-Meier estimators for strata. This handles studies nested within cohorts, such as case-cohort studies with stratified sampling. See http://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf
3385 Survival Analysis NPHMC Sample Size Calculation for the Proportional Hazards Mixture Cure Model An R package for calculating the sample size of a survival trial with or without cure fractions.
3386 Survival Analysis NPMLEcmprsk Type-Specific Failure Rate and Hazard Rate on Competing Risks Data Given a failure type, the function computes covariate-specific probability of failure over time and covariate-specific conditional hazard rate based on possibly right-censored competing risk data. Specifically, it computes the non-parametric maximum-likelihood estimates of these quantities and their asymptotic variances in a semi-parametric mixture model for competing-risks data, as described in Chang et al. (2007a).
3387 Survival Analysis npsurv Nonparametric Survival Analysis Contains functions for non-parametric survival analysis of exact and interval-censored observations.
3388 Survival Analysis OrdFacReg Least Squares, Logistic, and Cox-Regression with Ordered Predictors In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable, at the same time adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of a precise knowledge of the response function, using monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. This package implements an active set algorithm that efficiently computes such estimators.
3389 Survival Analysis OutlierDC Outlier Detection using quantile regression for Censored Data This package provides three algorithms to detect outlying observations for censored survival data.
3390 Survival Analysis p3state.msm Analyzing survival data Analyzing survival data from an illness-death model.
3391 Survival Analysis paf Attributable Fraction Function for Censored Survival Data Calculate the unadjusted/adjusted attributable fraction function of a set of covariates for a censored survival outcome from a Cox model, using the method proposed by Chen, Lin and Zeng (Biometrika 97, 713-726, 2010).
3392 Survival Analysis pamr Pam: Prediction Analysis for Microarrays Some functions for sample classification in microarrays.
3393 Survival Analysis parfm Parametric Frailty Models Fits Parametric Frailty Models by maximum marginal likelihood. Possible baseline hazards: exponential, Weibull, inverse Weibull (Frechet), Gompertz, lognormal, log-skew-normal, and loglogistic. Possible Frailty distributions: gamma, positive stable, inverse Gaussian and lognormal.
3395 Survival Analysis party A Laboratory for Recursive Partytioning A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman’s random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.
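A minimal sketch of ctree() with a censored response (an illustrative example assuming the ‘lung’ data from the ‘survival’ package; rows with missing values are dropped for simplicity):

```r
library(party)
library(survival)
ct <- ctree(Surv(time, status) ~ age + sex + ph.ecog, data = na.omit(lung))
plot(ct)  # terminal nodes display Kaplan-Meier curves
```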
3396 Survival Analysis pch Piecewise Constant Hazards Models for Censored and Truncated Data Using piecewise constant hazards models is a very flexible approach for the analysis of survival data. The time line is divided into sub-intervals; for each interval, a different hazard is estimated using Poisson regression.
3397 Survival Analysis pec Prediction Error Curves for Risk Prediction Models in Survival Analysis Validation of risk predictions obtained from survival models and competing risk models based on censored data using inverse weighting and cross-validation.
3398 Survival Analysis penalized L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model Fitting possibly high dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.
3399 Survival Analysis PenCoxFrail Regularization in Cox Frailty Models A regularization approach for Cox Frailty Models by penalization methods is provided.
3400 Survival Analysis penMSM Estimating Regularized Multi-state Models Using L1 Penalties Structured fusion Lasso penalized estimation of multi-state models with the penalty applied to absolute effects and absolute effect differences (i.e., effects on transition-type specific hazard rates).
3401 Survival Analysis peperr Parallelised Estimation of Prediction Error Designed for prediction error estimation through resampling techniques, possibly accelerated by parallel execution on a compute cluster. Newly developed model fitting routines can be easily incorporated.
3402 Survival Analysis PermAlgo Permutational Algorithm to Simulate Survival Data This version of the permutational algorithm generates a dataset in which event and censoring times are conditional on a user-specified list of covariates, some or all of which are time-dependent.
3403 Survival Analysis PHeval Evaluation of the Proportional Hazards Assumption with a Standardized Score Process Provides tools for the evaluation of the goodness of fit and the predictive capacity of the proportional hazards model.
3404 Survival Analysis plac A Pairwise Likelihood Augmented Cox Estimator for Left-Truncated Data A semi-parametric estimation method for the Cox model with left-truncated data using augmented information from the marginal of truncation times.
3405 Survival Analysis polspline Polynomial Spline Routines Routines for the polynomial spline fitting routines hazard regression, hazard estimation with flexible tails, logspline, lspec, polyclass, and polymars, by C. Kooperberg and co-authors.
3406 Survival Analysis popEpi Functions for Epidemiological Analysis using Population Data Enables computation of epidemiological statistics, including those where counts or mortality rates of the reference population are used. Currently supported: excess hazard models, rates, mean survival times, relative survival, and standardized incidence and mortality ratios (SIRs/SMRs), all of which can be easily adjusted for by covariates such as age. Fast splitting and aggregation of ‘Lexis’ objects (from package ‘Epi’) and other computations achieved using ‘data.table’.
3407 Survival Analysis powerSurvEpi Power and Sample Size Calculation for Survival Analysis of Epidemiological Studies Functions to calculate power and sample size for testing main effect or interaction effect in the survival analysis of epidemiological studies (non-randomized studies), taking into account the correlation between the covariate of interest and other covariates. Some calculations also take into account the competing risks and stratified analysis. This package also includes a set of functions to calculate power and sample size for testing main effect in the survival analysis of randomized clinical trials.
3408 Survival Analysis PReMiuM Dirichlet Process Bayesian Clustering, Profile Regression Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal, survival and categorical response, as well as Normal and discrete covariates. It also allows for fixed effects in the response model, where a spatial CAR (conditional autoregressive) term can also be included. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. The main reference for the package is Liverani, Hastie, Azizi, Papathomas and Richardson (2015) <doi:10.18637/jss.v064.i07>.
3409 Survival Analysis prodlim Product-Limit Estimation for Censored Event History Analysis Fast and user friendly implementation of nonparametric estimators for censored event history (survival) analysis. Kaplan-Meier and Aalen-Johansen method.
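A minimal sketch (assuming the ‘lung’ data from the ‘survival’ package; the event indicator is recoded to 0/1 before passing it to Hist()):

```r
library(prodlim)
data(lung, package = "survival")
lung$event <- as.integer(lung$status == 2)           # 1 = death, 0 = censored
km <- prodlim(Hist(time, event) ~ sex, data = lung)  # Kaplan-Meier by sex
plot(km)                                             # curves with confidence limits
```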
3410 Survival Analysis psbcGroup Penalized Parametric and Semiparametric Bayesian Survival Models with Shrinkage and Grouping Priors Algorithms for fitting penalized parametric and semiparametric Bayesian survival models with shrinkage and grouping priors.
3411 Survival Analysis pseudo Computes Pseudo-Observations for Modeling Various functions for computing pseudo-observations for censored data regression. Computes pseudo-observations for modeling: competing risks based on the cumulative incidence function, the survival function based on the restricted mean, and the survival function based on the Kaplan-Meier estimator; see Klein et al. (2008) <doi:10.1016/j.cmpb.2007.11.017>.
3412 Survival Analysis quantreg Quantile Regression Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also included.
3413 Survival Analysis randomForestSRC Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) Fast OpenMP parallel computing of Breiman’s random forests for survival, competing risks, regression and classification based on Ishwaran and Kogalur’s popular random survival forests (RSF) package. Handles missing data and now includes multivariate, unsupervised forests, quantile regression and solutions for class imbalanced data. New fast interface using subsampling and confidence regions for variable importance.
3414 Survival Analysis ranger A Fast Implementation of Random Forests A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class ‘gwaa.data’ (R package ‘GenABEL’) and ‘dgCMatrix’ (R package ‘Matrix’) can be directly analyzed.
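A minimal survival-forest sketch (an illustrative example assuming the ‘lung’ data from the ‘survival’ package; rows with missing values are dropped for simplicity):

```r
library(ranger)
library(survival)
d <- na.omit(lung)
rf <- ranger(Surv(time, status) ~ ., data = d, num.trees = 500, seed = 1)
rf$prediction.error  # out-of-bag error (1 - C-index for survival forests)
```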
3415 Survival Analysis rankhazard Rank-Hazard Plots Rank-hazard plots (Karvanen and Harrell, 2009 <doi:10.1002/sim.3591>) visualize the relative importance of covariates in a proportional hazards model. The key idea is to rank the covariate values and plot the relative hazard as a function of ranks scaled to the interval [0,1]. The relative hazard is plotted with respect to the reference hazard, which can be e.g. the hazard related to the median of the covariate.
3416 Survival Analysis reda Recurrent Event Data Analysis Functions for (1) simulating survival and recurrent event data from a stochastic process point of view, (2) exploring and modeling recurrent event data through the mean cumulative function (MCF), also called the Nelson-Aalen estimator of the cumulative hazard rate function, and through a gamma frailty model with a spline rate function, and (3) comparing two-sample recurrent event responses with pseudo-score tests.
3417 Survival Analysis relsurv Relative Survival Contains functions for analysing relative survival data, including nonparametric estimators of net (marginal relative) survival, relative survival ratio, crude mortality, methods for fitting and checking additive and multiplicative regression models, transformation approach, methods for dealing with population mortality tables.
3418 Survival Analysis rhosp Side Effect Risks in Hospital : Simulation and Estimation Evaluating the risk that a patient experiences a side effect during hospitalization is the main purpose of this package. Several methods (parametric, non-parametric and De Vielder estimation) to estimate the risk constant (R) are implemented. There are also functions to simulate the different models of this problem in order to assess the previous estimators. It is necessary to read at least the first six pages of the report to understand the topic.
3419 Survival Analysis riskRegression Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.
3420 Survival Analysis risksetROC Riskset ROC curve estimation from censored survival data Compute time-dependent incident/dynamic accuracy measures (ROC curve, AUC, integrated AUC) from censored survival data under the proportional or non-proportional hazards assumption of Heagerty & Zheng (Biometrics, Vol 61 No 1, 2005, pp. 92-105).
3421 Survival Analysis rms (core) Regression Modeling Strategies Regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. ‘rms’ is a collection of functions that assist with and streamline modeling. It also contains functions for binary and ordinal logistic regression models, ordinal models for continuous Y with a variety of distribution families, and the Buckley-James multiple regression model for right-censored responses, and implements penalized maximum likelihood estimation for logistic and ordinary linear models. ‘rms’ works with almost any regression model, but it was especially written to work with binary or ordinal regression models, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.
3422 Survival Analysis RobustAFT Truncated Maximum Likelihood Fit and Robust Accelerated Failure Time Regression for Gaussian and Log-Weibull Case R functions for the computation of the truncated maximum likelihood and the robust accelerated failure time regression for the Gaussian and log-Weibull cases.
3423 Survival Analysis ROCt Time-Dependent ROC Curve Estimators and Expected Utility Functions Contains functions to estimate diagnostic and prognostic capacities of continuous markers. More precisely, one function concerns the estimation of the time-dependent ROC (ROCt) curve, as proposed by Heagerty et al. (2000) <doi:10.1111/j.0006-341X.2000.00337.x>. One function concerns the adaptation of the ROCt theory for studying the capacity of a marker to predict the excess mortality of a specific population compared to the general population. This last part is based on additive relative survival models and the work of Pohar-Perme et al. (2012) <doi:10.1111/j.1541-0420.2011.01640.x>. We also propose two functions for cut-off estimation in medical decision making by maximizing the time-dependent expected utility function. Finally, we propose confounder-adjusted estimators of ROC and ROCt curves using the Inverse Probability Weighting (IPW) approach. For the confounder-adjusted ROC curve (without censoring), we also propose an implementation of the estimator based on placement values proposed by Pepe and Cai (2004) <doi:10.1111/j.0006-341X.2004.00200.x>.
3424 Survival Analysis rpart Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
3425 Survival Analysis rstpm2 Generalized Survival Models R implementation of generalized survival models (GSMs) and smooth accelerated failure time (AFT) models. For the GSMs, g(S(t|x)) = eta(t,x) for a link function g, survival S at time t with covariates x, and a linear predictor eta(t,x). The main assumption is that the time effect(s) are smooth. For fully parametric models with natural splines, this re-implements Stata’s ‘stpm2’ function, the flexible parametric survival models developed by Royston and colleagues. We have extended the parametric models to include any smooth parametric smoothers for time, and have also extended the model to include any smooth penalized smoothers from the ‘mgcv’ package, using penalized likelihood. These models include left truncation, right censoring, interval censoring, gamma frailties and normal random effects. For the smooth AFTs, S(t|x) = S_0(t*eta(t,x)), where the baseline survival function S_0(t) = exp(-exp(eta_0(t))) is modelled using natural splines for eta_0, and the time-dependent cumulative acceleration factor is eta(t,x) = int_0^t exp(eta_1(u,x)) du for a log acceleration factor eta_1(u,x).
3426 Survival Analysis saws Small-Sample Adjustments for Wald tests Using Sandwich Estimators Tests coefficients using a sandwich estimator of variance, with adjustments for small samples. Regression types supported are GEE, linear regression, and conditional logistic regression.
3427 Survival Analysis SemiCompRisks Hierarchical Models for Parametric and Semi-Parametric Analyses of Semi-Competing Risks Data Hierarchical multistate models are considered to perform the analysis of independent/clustered semi-competing risks data. The package allows the user to choose the specification of model components from a range of options, giving substantial flexibility, including: accelerated failure time or proportional hazards regression models; parametric or non-parametric specifications for baseline survival functions and the cluster-specific random effects distribution; a Markov or semi-Markov specification for the terminal event following the non-terminal event. While estimation is mainly performed within the Bayesian paradigm, the package also provides a maximum likelihood estimation approach for several parametric models. The package also includes functions for univariate survival analysis as complementary analysis tools.
3428 Survival Analysis SemiMarkov Multi-States Semi-Markov Models Functions for fitting multi-state semi-Markov models to longitudinal data. A parametric maximum likelihood estimation method adapted to deal with Exponential, Weibull and Exponentiated Weibull distributions is considered. Right-censoring can be taken into account and both constant and time-varying covariates can be included using a Cox proportional model. Reference: A. Krol and P. Saint-Pierre (2015) <doi:10.18637/jss.v066.i06>.
3429 Survival Analysis simexaft simexaft Implementation of the Simulation-Extrapolation (SIMEX) algorithm for the accelerated failure time (AFT) model with covariates subject to measurement error.
3430 Survival Analysis SimHaz Simulated Survival and Hazard Analysis for Time-Dependent Exposure Generate power for the Cox proportional hazards model by simulating survival events data with time dependent exposure status for subjects. A dichotomous exposure variable is considered with a single transition from unexposed to exposed status during the subject’s time on study.
3431 Survival Analysis simMSM Simulation of Event Histories for Multi-State Models Simulation of event histories with possibly non-linear baseline hazard rate functions, non-linear (time-varying) covariate effect functions, and dependencies on the past of the history. Random generation of event histories is performed using inversion sampling on the cumulative all-cause hazard rate functions.
3432 Survival Analysis simPH Tools for Simulating and Plotting Quantities of Interest Estimated from Cox Proportional Hazards Models Simulates and plots quantities of interest (relative hazards, first differences, and hazard ratios) for linear coefficients, multiplicative interactions, polynomials, penalised splines, and non-proportional hazards, as well as stratified survival curves from Cox Proportional Hazard models. It also simulates and plots marginal effects for multiplicative interactions.
3433 Survival Analysis SimSCRPiecewise ‘Simulates Univariate and Semi-Competing Risks Data Given Covariates and Piecewise Exponential Baseline Hazards’ Contains two functions for simulating survival data from piecewise exponential hazards with a proportional hazards adjustment for covariates. The first function SimUNIVPiecewise simulates univariate survival data based on a piecewise exponential hazard, a covariate matrix and a true regression vector. The second function SimSCRPiecewise simulates semi-competing risks data based on three piecewise exponential hazards, three true regression vectors and three matrices of patient covariates (which can be different or the same). This simulates from the semi-Markov model of Lee et al (2015) given patient covariates, regression parameters, patient frailties and baseline hazard functions.
3434 Survival Analysis simsurv Simulate Survival Data Simulate survival times from standard parametric survival distributions (exponential, Weibull, Gompertz), 2-component mixture distributions, or a user-defined hazard, log hazard, cumulative hazard, or log cumulative hazard function. Baseline covariates can be included under a proportional hazards assumption. Time dependent effects (i.e. non-proportional hazards) can be included by interacting covariates with linear time or a user-defined function of time. Clustered event times are also accommodated. The 2-component mixture distributions can allow for a variety of flexible baseline hazard functions reflecting those seen in practice. If the user wishes to provide a user-defined hazard or log hazard function then this is possible, and the resulting cumulative hazard function does not need to have a closed-form solution. Note that this package is modelled on the ‘survsim’ package available in the ‘Stata’ software (see Crowther and Lambert (2012) <http://www.stata-journal.com/sjpdf.html?articlenum=st0275> or Crowther and Lambert (2013) <doi:10.1002/sim.5823>).
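A minimal sketch of simulating Weibull event times with a treatment effect (all parameter values below are made up for illustration):

```r
library(simsurv)
set.seed(1)
covs <- data.frame(id = 1:200, trt = rbinom(200, 1, 0.5))
dat <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 1.5,
               betas = c(trt = -0.5),  # log hazard ratio for treatment
               x = covs, maxt = 5)     # administrative censoring at t = 5
head(dat)  # columns: id, eventtime, status
```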
3435 Survival Analysis smcure Fit Semiparametric Mixture Cure Models An R package for estimating semiparametric PH and AFT mixture cure models.
3436 Survival Analysis SmoothHazard Estimation of Smooth Hazard Models for Interval-Censored Data with Applications to Survival and Illness-Death Models Estimation of two-state (survival) models and irreversible illness-death models with possibly interval-censored, left-truncated and right-censored data. Proportional intensities regression models can be specified to allow for covariate effects separately for each transition. We use either a parametric approach with Weibull baseline intensities or a semi-parametric approach with M-splines approximation of baseline intensities in order to obtain smooth estimates of the hazard functions. Parameter estimates are obtained by maximum likelihood in the parametric approach and by penalized maximum likelihood in the semi-parametric approach.
3437 Survival Analysis smoothHR Smooth Hazard Ratio Curves Taking a Reference Value Provides flexible hazard ratio curves allowing non-linear relationships between continuous predictors and survival. To better understand the effects that each continuous covariate has on the outcome, results are expressed in terms of hazard ratio curves, taking a specific covariate value as reference. Confidence bands for these curves are also derived.
3438 Survival Analysis smoothSurv Survival Regression with Smoothed Error Distribution Contains, as a main contribution, a function to fit a regression model with possibly right, left or interval censored observations and with the error distribution expressed as a mixture of G-splines. Core part of the computation is done in compiled C++ written using the Scythe Statistical Library Version 0.3.
3439 Survival Analysis SMPracticals Practicals for Use with Davison (2003) Statistical Models Contains the datasets and a few functions for use with the practicals outlined in Appendix A of the book Statistical Models (Davison, 2003, Cambridge University Press). The practicals themselves can be found at <http://statwww.epfl.ch/davison/SM/>.
3440 Survival Analysis spatstat Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests Comprehensive open-source toolbox for analysing Spatial Point Patterns. Focused mainly on two-dimensional point patterns, including multitype/marked points, in any spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, point patterns on a linear network, and patterns of other geometrical objects. Supports spatial covariate data such as pixel images. Contains over 2000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the functions ppm(), kppm(), slrm(), dppm() similar to glm(). Types of models include Poisson, Gibbs and Cox point processes, Neyman-Scott cluster processes, and determinantal point processes. Models may involve dependence on covariates, inter-point interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. A model can be fitted to a list of point patterns (replicated point pattern data) using the function mppm(). The model can include random effects and fixed effects depending on the experimental design, in addition to all the features listed above. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.
3441 Survival Analysis spatsurv Bayesian Spatial Survival Analysis with Parametric Proportional Hazards Models Bayesian inference for parametric proportional hazards spatial survival models; flexible spatial survival models.
3442 Survival Analysis spBayesSurv Bayesian Modeling and Analysis of Spatially Correlated Survival Data Provides several Bayesian survival models for spatial/non-spatial survival data: proportional hazards (PH), accelerated failure time (AFT), proportional odds (PO), and accelerated hazards (AH), a super model that includes PH, AFT, PO and AH as special cases, Bayesian nonparametric nonproportional hazards (LDDPM), generalized accelerated failure time (GAFT), and spatially smoothed Polya tree density estimation. The spatial dependence is modeled via frailties under PH, AFT, PO, AH and GAFT, and via copulas under LDDPM and PH. Model choice is carried out via the logarithm of the pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC).
3443 Survival Analysis SSRMST Sample Size Calculation using Restricted Mean Survival Time Calculates the power and sample size based on the difference in Restricted Mean Survival Time.
3444 Survival Analysis superpc Supervised principal components Supervised principal components for regression and survival analysis. Especially useful for high-dimensional data, including microarray data.
3445 Survival Analysis surv2sampleComp Inference for Model-Free Between-Group Parameters for Censored Survival Data Performs inference of several model-free group contrast measures, which include difference/ratio of cumulative incidence rates at given time points, quantiles, and restricted mean survival times (RMST). Two kinds of covariate adjustment procedures (i.e., regression and augmentation) for inference of the metrics based on RMST are also included.
3446 Survival Analysis survAUC Estimators of prediction accuracy for time-to-event data The package provides a variety of functions to estimate time-dependent true/false positive rates and AUC curves from a set of censored survival data.
3447 Survival Analysis survC1 C-statistics for risk prediction models with censored survival data Performs inference for the C-statistic of risk prediction models with censored survival data, using the method proposed by Uno et al. (2011). Inference for the difference in C between two competing prediction models is also implemented.
3448 Survival Analysis SurvCorr Correlation of Bivariate Survival Times Estimates correlation coefficients with associated confidence limits for bivariate, partially censored survival times. Uses the iterative multiple imputation approach proposed by Schemper, Kaider, Wakounig and Heinze, Statistics in Medicine 2013. Provides a scatterplot function to visualize the bivariate distribution, either on the original time scale or as copula.
3449 Survival Analysis survexp.fr Relative survival, AER and SMR based on French death rates Relative survival, AER and SMR based on French death rates
3450 Survival Analysis survey Analysis of Complex Survey Samples Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase subsampling designs. Graphics. PPS sampling without replacement. Principal components, factor analysis.
3451 Survival Analysis Survgini The Gini concentration test for survival data The Gini concentration test for survival data is a nonparametric test based on the Gini index for testing the equality of two survival distributions from the point of view of concentration. The package compares different nonparametric tests (asymptotic Gini test, permutation Gini test, log-rank test, Gray-Tsiatis test and Wilcoxon test) and computes their p-values.
3452 Survival Analysis survIDINRI IDI and NRI for comparing competing risk prediction models with censored survival data Performs inference for a class of measures to compare competing risk prediction models with censored survival data. The class includes the integrated discrimination improvement index (IDI) and category-less net reclassification index (NRI).
3453 Survival Analysis survival (core) Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
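Since most packages in this task view build on these core routines, a minimal sketch of the basic workflow (using the ‘lung’ data bundled with the package):

```r
library(survival)
fit <- survfit(Surv(time, status) ~ sex, data = lung)      # Kaplan-Meier curves by sex
summary(fit, times = c(180, 365))                          # survival estimates at chosen times
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)  # Cox proportional hazards model
summary(cox)
```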
3454 Survival Analysis survivalMPL Penalised Maximum Likelihood for Survival Analysis Models Estimate the regression coefficients and the baseline hazard of proportional hazard Cox models using maximum penalised likelihood. A ‘non-parametric’ smooth estimate of the baseline hazard function is provided.
3455 Survival Analysis survivalROC Time-dependent ROC curve estimation from censored survival data Compute time-dependent ROC curves from censored survival data using the Kaplan-Meier (KM) or Nearest Neighbor Estimation (NNE) method of Heagerty, Lumley & Pepe (Biometrics, Vol 56 No 2, 2000, pp. 337-344).
3456 Survival Analysis survJamda Survival Prediction by Joint Analysis of Microarray Gene Expression Data Microarray gene expression data can be analyzed individually or jointly using merging methods or meta-analysis to predict patients’ survival and risk assessment.
3457 Survival Analysis SurvLong Analysis of Proportional Hazards Model with Sparse Longitudinal Covariates Kernel weighting methods for estimation of proportional hazards models with intermittently observed longitudinal covariates.
3458 Survival Analysis survminer Drawing Survival Curves using ‘ggplot2’ Contains the function ‘ggsurvplot()’ for easily drawing beautiful, ready-to-publish survival curves with the ‘number at risk’ table and censoring count plot. Other functions are also available to plot adjusted curves for a ‘Cox’ model and to visually examine ‘Cox’ model assumptions.
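A minimal ggsurvplot() sketch (an illustrative example assuming a survfit object built from the ‘lung’ data in the ‘survival’ package):

```r
library(survival)
library(survminer)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
ggsurvplot(fit, risk.table = TRUE, pval = TRUE)  # KM plot with number-at-risk table
```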
3459 Survival Analysis survMisc Miscellaneous Functions for Survival Data A collection of functions to help in the analysis of right-censored survival data. These extend the methods available in package:survival.
3460 Survival Analysis survPresmooth Presmoothed Estimation in Survival Analysis Presmoothed estimators of survival, density, cumulative and non-cumulative hazard functions with right-censored survival data.
3461 Survival Analysis SurvRegCensCov Weibull Regression for a Right-Censored Endpoint with Interval-Censored Covariate The main function of this package allows estimation of a Weibull regression for a right-censored endpoint, one interval-censored covariate, and an arbitrary number of non-censored covariates. Additional functions allow switching between different parametrizations of Weibull regression used by different R functions, inference for the mean difference of two arbitrarily censored Normal samples, and estimation of canonical parameters from censored samples for several distributional assumptions.
3462 Survival Analysis survRM2 Comparing Restricted Mean Survival Time Performs two-sample comparisons using the restricted mean survival time (RMST) as a summary measure of the survival time distribution. Three kinds of between-group contrast metrics (i.e., the difference in RMST, the ratio of RMST and the ratio of the restricted mean time lost (RMTL)) are computed. It performs an ANCOVA-type covariate adjustment as well as unadjusted analyses for those measures.
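A minimal two-sample RMST comparison sketch (assuming the ‘lung’ data from the ‘survival’ package; status and arm are recoded to 0/1, and the truncation time tau = 365 days is a made-up choice):

```r
library(survRM2)
data(lung, package = "survival")
d <- na.omit(lung[, c("time", "status", "sex")])
rmst2(time = d$time, status = d$status - 1,  # 1 = death, 0 = censored
      arm = d$sex - 1, tau = 365)            # RMST difference/ratio up to day 365
```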
3463 Survival Analysis survsim Simulation of Simple and Complex Survival Data Simulation of simple and complex survival data including recurrent and multiple events and competing risks.
3464 Survival Analysis survSNP Power Calculations for SNP Studies with Censored Outcomes Conduct asymptotic and empirical power and sample size calculations for Single-Nucleotide Polymorphism (SNP) association studies with right censored time to event outcomes.
3465 Survival Analysis SvyNom Nomograms for Right-Censored Outcomes from Survey Designs Builds, evaluates and validates a nomogram with survey data and right-censored outcomes.
3466 Survival Analysis TBSSurvival Survival Analysis using a Transform-Both-Sides Model Functions to perform the reliability/survival analysis using a parametric Transform-both-sides (TBS) model.
3467 Survival Analysis tdROC Nonparametric Estimation of Time-Dependent ROC Curve from Right Censored Survival Data Compute time-dependent ROC curve from censored survival data using nonparametric weight adjustments.
3468 Survival Analysis thregI Threshold Regression for Interval-Censored Data with a Cure Rate Option Fit a threshold regression model for interval-censored data based on the first-hitting-time of a boundary by the sample path of a Wiener diffusion process. The threshold regression methodology is well suited to applications involving survival and time-to-event data.
3469 Survival Analysis timereg (core) Flexible Regression Models for Survival Data Programs for Martinussen and Scheike (2006), ‘Dynamic Regression Models for Survival Data’, Springer Verlag. Plus more recent developments. Additive survival model, semiparametric proportional odds model, fast cumulative residuals, excess risk models and more. Flexible competing risks regression including GOF-tests. Two-stage frailty modelling. PLS for the additive risk model. Lasso in the ‘ahaz’ package.
3470 Survival Analysis timeROC Time-Dependent ROC Curve and AUC for Censored Survival Data Estimation of time-dependent ROC curve and area under time dependent ROC curve (AUC) in the presence of censored data, with or without competing risks. Confidence intervals of AUCs and tests for comparing AUCs of two rival markers measured on the same subjects can be computed, using the iid-representation of the AUC estimator. Plot functions for time-dependent ROC curves and AUC curves are provided. Time-dependent Positive Predictive Values (PPV) and Negative Predictive Values (NPV) can also be computed.
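A minimal sketch of a time-dependent ROC analysis (an illustrative example assuming the ‘lung’ data, with age as a stand-in marker and made-up evaluation horizons):

```r
library(timeROC)
data(lung, package = "survival")
d <- na.omit(lung[, c("time", "status", "age")])
roc <- timeROC(T = d$time, delta = d$status - 1, marker = d$age,
               cause = 1, times = c(180, 365))
roc$AUC  # time-dependent AUC at each horizon
```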
3471 Survival Analysis tlmec Linear Student-t Mixed-Effects Models with Censored Data Fit a linear mixed effects model for censored data with Student-t or normal distributions. The errors are assumed independent and identically distributed.
3472 Survival Analysis TP.idm Estimation of Transition Probabilities for the Illness-Death Model Estimation of transition probabilities for the illness-death model. Both the Aalen-Johansen estimator for a Markov model and a novel non-Markovian estimator by de Una-Alvarez and Meira-Machado (2015) <doi:10.1111/biom.12288>, see also Balboa and de Una-Alvarez (2018) <doi:10.18637/jss.v083.i10>, are included.
3473 Survival Analysis TPmsm Estimation of Transition Probabilities in Multistate Models Estimation of transition probabilities for the illness-death model and/or the three-state progressive model.
3474 Survival Analysis tpr Temporal Process Regression Regression models for temporal process responses with time-varying coefficients.
3475 Survival Analysis TraMineR Trajectory Miner: a Toolbox for Exploring and Rendering Sequences Toolbox for the manipulation, description and rendering of sequences, and more generally the mining of sequence data in the field of social sciences. Although the toolbox is primarily intended for analyzing state or event sequences that describe life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and simple functions for extracting the most frequent subsequences and identifying the most discriminating ones among them. A user’s guide can be found on the TraMineR web page.
3476 Survival Analysis TransModel Fit Linear Transformation Models for Right Censored Data A unified estimation procedure for the analysis of right censored data using linear transformation models.
3477 Survival Analysis TSHRC Two Stage Hazard Rate Comparison A two-stage procedure for comparing hazard rate functions, which may or may not cross each other.
3478 Survival Analysis uniah Unimodal Additive Hazards Model Nonparametric estimation of a unimodal or U-shape covariate effect under additive hazards model.
3479 Survival Analysis VGAM Vector Generalized Linear and Additive Models An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book “Vector Generalized Linear and Additive Models: With an Implementation in R” (Yee, 2015) <doi:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (150+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)―these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.
3480 Survival Analysis vitality Fitting Routines for the Vitality Family of Mortality Models Provides fitting routines for four versions of the Vitality family of mortality models.
3481 Survival Analysis YPmodel The Short-Term and Long-Term Hazard Ratio Model for Survival Data Inference procedures accommodate a flexible range of hazard ratio patterns with a two-sample semi-parametric model. This model contains the proportional hazards model and the proportional odds model as sub-models, and accommodates non-proportional hazards situations to the extreme of having crossing hazards and crossing survivor functions. Overall, this package has four major functions: 1) the parameter estimation, namely short-term and long-term hazard ratio parameters; 2) 95 percent and 90 percent point-wise confidence intervals and simultaneous confidence bands for the hazard ratio function; 3) p-value of the adaptive weighted log-rank test; 4) p-values of two lack-of-fit tests for the model. See the included “read_me_first.pdf” for brief instructions. In this version (1.1), there is no need to sort the data before applying this package.
3482 Teaching Statistics ACSWR A Companion Package for the Book “A Course in Statistics with R” Companion functions and datasets for a book designed to meet the requirements of master’s students: Tattar, P.N., Suresh, R., and Manjunath, B.G., “A Course in Statistics with R”, J. Wiley, ISBN 978-1-119-15272-9.
3483 Teaching Statistics AER Applied Econometrics with R Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)
3484 Teaching Statistics animation (core) A Gallery of Animations in Statistics and Utilities to Create Animations Provides functions for animations in statistics, covering topics in probability theory, mathematical statistics, multivariate statistics, non-parametric statistics, sampling survey, linear models, time series, computational statistics, data mining and machine learning. These functions may be helpful in teaching statistics and data analysis. Also provided in this package are a series of functions to save animations to various formats, e.g. Flash, ‘GIF’, HTML pages, ‘PDF’ and videos. ‘PDF’ animations can be inserted into ‘Sweave’ / ‘knitr’ easily.
3485 Teaching Statistics AtelieR A GTK GUI for teaching basic concepts in statistical inference, and doing elementary bayesian tests A collection of statistical simulation and computation tools with a GTK GUI, to help teach statistical concepts and compute probabilities. Two domains are covered: I. Understanding (Central-Limit Theorem and the Normal Distribution, Distribution of a sample mean, Distribution of a sample variance, Probability calculator for common distributions), and II. Elementary Bayesian Statistics (bayesian inference on proportions, contingency tables, means and variances, with informative and noninformative priors).
3486 Teaching Statistics BaM Functions and Datasets for Books by Jeff Gill Functions and datasets for Jeff Gill: “Bayesian Methods: A Social and Behavioral Sciences Approach”. First, Second, and Third Edition. Published by Chapman and Hall/CRC (2002, 2007, 2014).
3487 Teaching Statistics BayesDA (core) Functions and Datasets for the book “Bayesian Data Analysis” Functions for Bayesian data analysis, with datasets from the book “Bayesian Data Analysis” (second edition) by Gelman, Carlin, Stern and Rubin. Not all datasets are included yet; the collection will hopefully be completed soon.
3488 Teaching Statistics Bolstad Functions for Elementary Bayesian Inference A set of R functions and data sets for the book Introduction to Bayesian Statistics, Bolstad, W.M. (2017), John Wiley & Sons ISBN 978-1-118-09156-2.
3489 Teaching Statistics distrTeach Extensions of Package ‘distr’ for Teaching Stochastics/Statistics in Secondary School Provides flexible examples of LLN and CLT for teaching purposes in secondary school.
3490 Teaching Statistics ElemStatLearn Data Sets, Functions and Examples from the Book: “The Elements of Statistical Learning, Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman Useful when reading the above-mentioned book, which is referred to in the documentation as ‘the book’.
3491 Teaching Statistics exams (core) Automatic Generation of Exams in R Automatic generation of exams based on exercises in Markdown or LaTeX format, possibly including R code for dynamic generation of exercise elements. Exercise types include single-choice and multiple-choice questions, arithmetic problems, string questions, and combinations thereof (cloze). Output formats include standalone files (PDF, HTML, Docx, ODT, …), Moodle XML, QTI 1.2 (for OLAT/OpenOLAT), QTI 2.1, Blackboard, ARSnova, and TCExam. In addition to fully customizable PDF exams, a standardized PDF format (NOPS) is provided that can be printed, scanned, and automatically evaluated.
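A minimal sketch of the exam-generation workflow; ‘myquestion.Rmd’ is a hypothetical local exercise file in the package’s Markdown exercise format, so the calls are left commented:

```r
library(exams)
# 'myquestion.Rmd' is a hypothetical exercise file (not shipped with the package)
# exams2pdf("myquestion.Rmd", n = 3)      # three randomized PDF versions
# exams2moodle("myquestion.Rmd", n = 10)  # Moodle XML export of ten random draws
```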
3492 Teaching Statistics faraway Functions and Datasets for Books by Julian Faraway Books are “Practical Regression and ANOVA in R” on CRAN, “Linear Models with R” published 1st Ed. August 2004, 2nd Ed. July 2014 by CRC press, ISBN 9781439887332, and “Extending the Linear Model with R” published by CRC press in 1st Ed. December 2005 and 2nd Ed. March 2016, ISBN 9781584884248.
3493 Teaching Statistics gganimate A Grammar of Animated Graphics The grammar of graphics as implemented in the ‘ggplot2’ package has been successful in providing a powerful API for creating static visualisations. In order to extend the API to animated graphics, this package provides a completely new set of grammar, fully compatible with ‘ggplot2’, for specifying transitions and animations in a flexible and extensible way.
3494 Teaching Statistics ggplot2 Create Elegant Data Visualisations Using the Grammar of Graphics A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
3495 Teaching Statistics HSAUR3 A Handbook of Statistical Analyses Using R (3rd Edition) Functions, data sets, analyses and examples from the third edition of the book “A Handbook of Statistical Analyses Using R” (Torsten Hothorn and Brian S. Everitt, Chapman & Hall/CRC, 2014). The first chapter of the book, which is entitled “An Introduction to R”, is completely included in this package; for all other chapters, a vignette containing all data analyses is available. In addition, Sweave source code for slides of selected chapters is included in this package (see HSAUR3/inst/slides). The publisher’s web page is ‘<http://www.crcpress.com/product/isbn/9781482204582>’.
3496 Teaching Statistics infer Tidy Statistical Inference The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.
3497 Teaching Statistics ISwR Introductory Statistics with R Data sets and scripts for text examples and exercises in P. Dalgaard (2008), ‘Introductory Statistics with R’, 2nd ed., Springer Verlag, ISBN 978-0387790534.
3498 Teaching Statistics LearnBayes Functions for Learning Bayesian Inference A collection of functions helpful in learning the basic tenets of Bayesian statistical inference. It contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.
3499 Teaching Statistics learnstats An Interactive Environment for Learning Statistics Allows students to use R as an interactive educational environment for statistical concepts, ranging from p-values to confidence intervals to stability in time series.
3500 Teaching Statistics MASS (core) Support Functions and Datasets for Venables and Ripley’s MASS Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).
3501 Teaching Statistics moderndive Tidyverse-Friendly Introductory Linear Regression Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in ModernDive: An Introduction to Statistical and Data Sciences via R available at <http://moderndive.com/> and DataCamp’s Modeling with Data in the Tidyverse available at <https://www.datacamp.com/courses/modeling-with-data-in-the-tidyverse>.
3502 Teaching Statistics mosaic (core) Project MOSAIC Statistics and Mathematics Teaching Utilities Data sets and utilities from Project MOSAIC (<http://mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.
3503 Teaching Statistics MPV Data Sets from Montgomery, Peck and Vining Most of this package consists of data sets from the textbook Introduction to Linear Regression Analysis (3rd ed), by Montgomery, Peck and Vining. Some additional data sets and functions are included.
3504 Teaching Statistics openintro Data Sets and Supplemental Functions from ‘OpenIntro’ Textbooks Supplemental functions and data for ‘OpenIntro’ resources, which include open-source textbooks and resources for introductory statistics at <http://www.openintro.org>. The package contains data sets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.
3505 Teaching Statistics ProfessR Grades Setting and Exam Maker Programs to determine student grades and create examinations from question banks. The programs create numerous multiple-choice exams, randomly shuffled, as different versions of the same question list.
3506 Teaching Statistics Rcmdr (core) R Commander A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
3507 Teaching Statistics RndTexExams Build and Grade Multiple Choice Exams with Randomized Content Using as input a ‘LaTeX’ file with a multiple choice exam, this package will produce several versions with randomized contents of the same exam. Functions for grading are also available.
3508 Teaching Statistics shiny Web Application Framework for R Makes it incredibly easy to build interactive web applications with R. Automatic “reactive” binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.
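A minimal sketch of the reactive input/output binding described above: one slider drives one plot, and the plot re-renders whenever the input changes.

    library(shiny)
    ui <- fluidPage(
      sliderInput("n", "Sample size", min = 10, max = 500, value = 100),
      plotOutput("hist")
    )
    server <- function(input, output) {
      # Reactive binding: re-runs whenever input$n changes
      output$hist <- renderPlot(hist(rnorm(input$n)))
    }
    shinyApp(ui, server)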
3509 Teaching Statistics Sleuth2 Data Sets from Ramsey and Schafer’s “Statistical Sleuth (2nd Ed)” Data sets from Ramsey, F.L. and Schafer, D.W. (2002), “The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed)”, Duxbury.
3510 Teaching Statistics Sleuth3 Data Sets from Ramsey and Schafer’s “Statistical Sleuth (3rd Ed)” Data sets from Ramsey, F.L. and Schafer, D.W. (2013), “The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed)”, Cengage Learning.
3511 Teaching Statistics smovie Some Movies to Illustrate Concepts in Statistics Provides movies to help students to understand statistical concepts. The ‘rpanel’ package <https://cran.r-project.org/package=rpanel> is used to create interactive plots that move to illustrate key statistical ideas and methods. There are movies to: visualise probability distributions (including user-supplied ones); illustrate sampling distributions of the sample mean (central limit theorem), the sample maximum (extremal types theorem) and (the Fisher transformation of the) Pearson product moment correlation coefficient; examine the influence of an individual observation in simple linear regression; illustrate key concepts in statistical hypothesis testing. Also provided are dpqr functions for the distribution of the Fisher transformation of the correlation coefficient under sampling from a bivariate normal distribution.
3512 Teaching Statistics SMPracticals Practicals for Use with Davison (2003) Statistical Models Contains the datasets and a few functions for use with the practicals outlined in Appendix A of the book Statistical Models (Davison, 2003, Cambridge University Press). The practicals themselves can be found at <http://statwww.epfl.ch/davison/SM/>.
3513 Teaching Statistics swirl Learn R, in R Use the R console as an interactive learning environment. Users receive immediate feedback as they are guided through self-paced lessons in data science and R programming.
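Getting started takes a single call; a minimal sketch:

    # install.packages("swirl")   # once
    library(swirl)
    swirl()   # opens the interactive, self-paced lesson menu in the console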
3514 Teaching Statistics swirlify A Toolbox for Writing ‘swirl’ Courses A set of tools for writing and sharing interactive courses to be used with swirl.
3515 Teaching Statistics TeachBayes Teaching Bayesian Inference Several functions for communicating Bayesian thinking including Bayes rule for deciding among spinners, visualizations for Bayesian inference for one proportion and for one mean, and comparison of two proportions using a discrete prior.
3516 Teaching Statistics TeachingDemos Demonstrations for Teaching and Learning Demonstration functions that can be used in a classroom to demonstrate statistical concepts, or on your own to better understand the concepts or the programming.
3517 Teaching Statistics TexExamRandomizer Personalizes and Randomizes Exams Written in ‘LaTeX’ Randomizing exams with ‘LaTeX’. If you can compile your main document with ‘LaTeX’, the program should be able to compile the randomized versions without much extra effort when creating the document.
3518 Teaching Statistics vcd Visualizing Categorical Data Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).
3519 Time Series Analysis acp Autoregressive Conditional Poisson Analysis of count data exhibiting autoregressive properties, using the Autoregressive Conditional Poisson model (ACP(p,q)) proposed by Heinen (2003).
3520 Time Series Analysis AER Applied Econometrics with R Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)
3521 Time Series Analysis anomalize Tidy Anomaly Detection The ‘anomalize’ package enables a “tidy” workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it’s quite simple to decompose time series, detect anomalies, and create bands separating the “normal” data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function, and methods include seasonal decomposition of time series by Loess (“stl”) and seasonal decomposition by piecewise medians (“twitter”). The anomalize() function implements two methods for anomaly detection of residuals: the interquartile range (“iqr”) and generalized extreme studentized deviation (“gesd”). These methods are based on those used in the ‘forecast’ package and the Twitter ‘AnomalyDetection’ package. Refer to the associated functions for specific references for these methods.
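The three main functions are designed to be piped in the order given above; a sketch assuming the tidyverse_cran_downloads data bundled with the package:

    library(anomalize)
    library(dplyr)   # for the pipe
    tidyverse_cran_downloads %>%
      time_decompose(count, method = "stl") %>%  # trend/seasonal/remainder
      anomalize(remainder, method = "iqr") %>%   # flag anomalous residuals
      time_recompose()                           # bands around "normal" data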
3522 Time Series Analysis ARCensReg Fitting Univariate Censored Linear Regression Model with Autoregressive Errors Fits a univariate left- or right-censored linear regression model with autoregressive errors under the normal distribution. It provides estimates and standard errors of the parameters and prediction of future observations, and it supports missing values in the dependent variable. It also performs influence diagnostics through local influence for three possible perturbation schemes.
3523 Time Series Analysis ArDec Time series autoregressive-based decomposition Package ArDec implements autoregressive-based decomposition of a time series based on the constructive approach in West (1997). Particular cases include the extraction of trend and seasonal components.
3524 Time Series Analysis arfima Fractional ARIMA (and Other Long Memory) Time Series Modeling Simulates, fits, and predicts long-memory and anti-persistent time series, possibly mixed with ARMA, regression, transfer-function components. Exact methods (MLE, forecasting, simulation) are used. Bug reports should be done via GitHub (at <https://github.com/JQVeenstra/arfima>), where the development version of this package lives; it can be installed using devtools.
3525 Time Series Analysis ASSA Applied Singular Spectrum Analysis (ASSA) Functions to model and decompose time series into principal components using singular spectrum analysis (de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>; de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>).
3526 Time Series Analysis astsa Applied Statistical Time Series Analysis Included are data sets and scripts to accompany Time Series Analysis and Its Applications: With R Examples (4th ed), by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2017. <doi:10.1007/978-3-319-52452-8>.
3527 Time Series Analysis autovarCore Automated Vector Autoregression Models and Networks Automatically find the best vector autoregression models and networks for a given time series data set. ‘AutovarCore’ evaluates eight kinds of models: models with and without log transforming the data, lag 1 and lag 2 models, and models with and without weekday dummy variables. For each of these 8 model configurations, ‘AutovarCore’ evaluates all possible combinations for including outlier dummies (at 2.5x the standard deviation of the residuals) and retains the best model. Model evaluation includes the Eigenvalue stability test and a configurable set of residual tests. These eight models are further reduced to four models because ‘AutovarCore’ determines whether adding weekday dummies improves the model fit.
3528 Time Series Analysis BAYSTAR On Bayesian analysis of Threshold autoregressive model (BAYSTAR) Provides functionality for Bayesian estimation of threshold autoregressive models.
3529 Time Series Analysis bentcableAR Bent-Cable Regression for Independent Data or Autoregressive Time Series Included are two main interfaces for fitting and diagnosing bent-cable regressions for autoregressive time-series data or independent data (time series or otherwise): ‘bentcable.ar()’ and ‘bentcable.dev.plot()’. Some components in the package can also be used as stand-alone functions. The bent cable (linear-quadratic-linear) generalizes the broken stick (linear-linear), which is also handled by this package. Version 0.2 corrects a glitch in the computation of confidence intervals for the CTP. References that were updated from Versions 0.2.1 and 0.2.2 appear in Version 0.2.3 and up. Version 0.3.0 improves robustness of the error-message producing mechanism. It is the author’s intention to distribute any future updates via GitHub.
3530 Time Series Analysis BETS Brazilian Economic Time Series It provides access to and information about the most important Brazilian economic time series - from the Getulio Vargas Foundation <http://portal.fgv.br/en>, the Central Bank of Brazil <http://www.bcb.gov.br> and the Brazilian Institute of Geography and Statistics <http://www.ibge.gov.br>. It also presents tools for managing, analysing (e.g. generating dynamic reports with a complete analysis of a series) and exporting these time series.
3531 Time Series Analysis bfast Breaks For Additive Season and Trend (BFAST) BFAST integrates the decomposition of time series into trend, seasonal, and remainder components with methods for detecting and characterizing abrupt changes within the trend and seasonal components. BFAST can be used to analyze different types of satellite image time series and can be applied to other disciplines dealing with seasonal or non-seasonal time series, such as hydrology, climatology, and econometrics. The algorithm can be extended to label detected changes with information on the parameters of the fitted piecewise linear models. BFAST monitoring functionality is added based on a paper that has been submitted to Remote Sensing of Environment. BFAST monitor provides functionality to detect disturbance in near real-time based on BFAST-type models. The BFAST approach is flexible and handles missing data without interpolation. Furthermore, different models can now be used to fit the time series data and detect structural changes (breaks).
3532 Time Series Analysis bigtime Sparse Estimation of Large Time Series Models Estimation of large Vector AutoRegressive (VAR), Vector AutoRegressive with Exogenous Variables X (VARX) and Vector AutoRegressive Moving Average (VARMA) Models with Structured Lasso Penalties, see Nicholson, Bien and Matteson (2017) <arXiv:1412.5250v2> and Wilms, Basu, Bien and Matteson (2017) <arXiv:1707.09208>.
3533 Time Series Analysis BigVAR Dimension Reduction Methods for Multivariate Time Series Estimates VAR and VARX models with structured Lasso Penalties.
3534 Time Series Analysis biwavelet Conduct Univariate and Bivariate Wavelet Analyses This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gilbert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) analyses.
3535 Time Series Analysis BNPTSclust A Bayesian Nonparametric Algorithm for Time Series Clustering Performs the algorithm for time series clustering described in Nieto-Barajas and Contreras-Cristan (2014).
3536 Time Series Analysis boot Bootstrap Functions (Originally by Angelo Canty for S) Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
3537 Time Series Analysis BootPR Bootstrap Prediction Intervals and Bias-Corrected Forecasting Bias-Corrected Forecasting and Bootstrap Prediction Intervals for Autoregressive Time Series
3538 Time Series Analysis brainwaver Basic wavelet analysis of multivariate time series with a visualisation and parametrisation using graph theory This package computes the correlation matrix for each scale of a wavelet decomposition, namely the one performed by the R package waveslim (Whitcher, 2000). A hypothesis test is applied to each entry of one matrix in order to construct an adjacency matrix of a graph. The graph obtained is finally analysed using small-world theory (Watts and Strogatz, 1998) and using the computation of efficiency (Latora, 2001), tested using simulated attacks. The brainwaver project is complementary to the camba project for brain-data preprocessing. A collection of scripts (with a makefile) is available for download along with the brainwaver package; see the information on the webpage mentioned below.
3539 Time Series Analysis bspec Bayesian Spectral Inference Bayesian inference on the (discrete) power spectrum of time series.
3540 Time Series Analysis bssm Bayesian Inference of Non-Linear and Non-Gaussian State Space Models Efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo and parallel importance sampling type weighted Markov chain Monte Carlo (Vihola, Helske, and Franks, 2017, <arXiv:1609.02541>). Gaussian, Poisson, binomial, or negative binomial observation densities and basic stochastic volatility models with Gaussian state dynamics, as well as general non-linear Gaussian models and discretised diffusion models are supported.
3541 Time Series Analysis bsts Bayesian Structural Time Series Time series regression using dynamic linear models fit using MCMC. See Scott and Varian (2014) <doi:10.1504/IJMMNO.2014.059942>, among many other sources.
3542 Time Series Analysis CADFtest A Package to Perform Covariate Augmented Dickey-Fuller Unit Root Tests Hansen’s (1995) Covariate-Augmented Dickey-Fuller (CADF) test. The only required argument is y, the Tx1 time series to be tested. If no stationary covariate X is passed to the procedure, then an ordinary ADF test is performed. The p-values of the test are computed using the procedure illustrated in Lupi (2009).
3543 Time Series Analysis carfima Continuous-Time Fractionally Integrated ARMA Process for Irregularly Spaced Long-Memory Time Series Data We provide a toolbox to fit a continuous-time fractionally integrated ARMA process (CARFIMA) on univariate and irregularly spaced time series data via frequentist or Bayesian machinery. A general-order CARFIMA(p, H, q) model for p>q is specified in Tsai and Chan (2005) <doi:10.1111/j.1467-9868.2005.00522.x> and it involves (p+q+2) unknown model parameters, i.e., p AR parameters, q MA parameters, Hurst parameter H, and process uncertainty (standard deviation) sigma. The package produces their maximum likelihood estimates and asymptotic uncertainties using a global optimizer called the differential evolution algorithm. It also produces their posterior distributions via Metropolis within a Gibbs sampler equipped with adaptive Markov chain Monte Carlo for posterior sampling. These fitting procedures, however, may produce numerical errors if p>2. The toolbox also contains a function to simulate discrete time series data from CARFIMA(p, H, q) process given the model parameters and observation times.
3544 Time Series Analysis carx Censored Autoregressive Model with Exogenous Covariates A censored time series class is designed. An estimation procedure is implemented to estimate the Censored AutoRegressive time series with eXogenous covariates (CARX), assuming normality of the innovations. Some other functions that might be useful are also included.
3545 Time Series Analysis cents Censored time series Fits censored time series.
3546 Time Series Analysis changepoint Methods for Changepoint Detection Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean(), cpt.var(), cpt.meanvar() functions should be your first point of call.
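A minimal sketch of the suggested first points of call, here cpt.mean() with the PELT search on a simulated mean-shift series:

    library(changepoint)
    set.seed(1)
    x <- c(rnorm(100, mean = 0), rnorm(100, mean = 2))  # shift at t = 100
    fit <- cpt.mean(x, method = "PELT")
    cpts(fit)  # estimated changepoint location(s)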
3547 Time Series Analysis changepoint.mv Changepoint Analysis for Multivariate Time Series Detects the Most Recent Changepoints (mrc) for panel data consisting of many related univariate time series using the method developed by Bardwell, Fearnhead, Eckley, Smith and Spott (2018) <doi:10.1080/00401706.2018.1438926>.
3548 Time Series Analysis chron Chronological Objects which can Handle Dates and Times Provides chronological objects which can handle dates and times.
3549 Time Series Analysis cointReg Parameter Estimation and Inference in a Cointegrating Regression Cointegration methods are widely used in empirical macroeconomics and empirical finance. It is well known that in a cointegrating regression the ordinary least squares (OLS) estimator of the parameters is super-consistent, i.e. converges at rate equal to the sample size T. When the regressors are endogenous, the limiting distribution of the OLS estimator is contaminated by so-called second order bias terms, see e.g. Phillips and Hansen (1990) <doi:10.2307/2297545>. The presence of these bias terms renders inference difficult. Consequently, several modifications to OLS that lead to zero mean Gaussian mixture limiting distributions have been proposed, which in turn make standard asymptotic inference feasible. These methods include the fully modified OLS (FM-OLS) approach of Phillips and Hansen (1990) <doi:10.2307/2297545>, the dynamic OLS (D-OLS) approach of Phillips and Loretan (1991) <doi:10.2307/2298004>, Saikkonen (1991) <doi:10.1017/S0266466600004217> and Stock and Watson (1993) <doi:10.2307/2951763> and the new estimation approach called integrated modified OLS (IM-OLS) of Vogelsang and Wagner (2014) <doi:10.1016/j.jeconom.2013.10.015>. The latter is based on an augmented partial sum (integration) transformation of the regression model. IM-OLS is similar in spirit to the FM- and D-OLS approaches, with the key difference that it does not require estimation of long run variance matrices and avoids the need to choose tuning parameters (kernels, bandwidths, lags). However, inference does require that a long run variance be scaled out. This package provides functions for the parameter estimation and inference with all three modified OLS approaches. That includes the automatic bandwidth selection approaches of Andrews (1991) <doi:10.2307/2938229> and of Newey and West (1994) <doi:10.2307/2297912> as well as the calculation of the long run variance.
3550 Time Series Analysis CommonTrend Extract and plot common trends from a cointegration system. Calculate P-value for Johansen Statistics Directly extract and plot stochastic common trends from a cointegration system using different approaches, currently including Kasa (1992) and Gonzalo and Granger (1995). The approach proposed by Gonzalo and Granger, also known as Permanent-Transitory Decomposition, is widely used in the macroeconomics and market microstructure literature. Kasa’s approach, on the other hand, has the nice property that it uses only the super-consistent estimator: the cointegration vector ‘beta’. This package also provides functions to calculate P-values from the Johansen statistic according to the approximation method proposed by Doornik (1998). Update: 0.7-1: Fix bugs in the calculation of alpha; add formulas and more explanations. 0.6-1: Rewrite the description file. 0.5-1: Add functions to calculate P-values from the Johansen statistic, and vice versa.
3551 Time Series Analysis costat Time Series Costationarity Determination Contains functions that can determine whether a time series is second-order stationary or not (and hence provide evidence for local stationarity). Given two non-stationary series (i.e. locally stationary series), this package can then discover time-varying linear combinations that are second-order stationary.
3552 Time Series Analysis cts Continuous Time Autoregressive Models Functions to fit continuous time autoregressive models with the Kalman filter (Wang (2013) <doi:10.18637/jss.v053.i05>).
3553 Time Series Analysis dataseries Switzerland’s Data Series in One Place Download and import time series from <http://www.dataseries.org>, a comprehensive and up-to-date collection of open data from Switzerland.
3554 Time Series Analysis dCovTS Distance Covariance and Correlation for Time Series Analysis Computing and plotting the distance covariance and correlation function of a univariate or a multivariate time series. Both biased and unbiased estimators of distance covariance and correlation are provided. Test statistics for testing pairwise independence are also implemented. Some data sets are also included.
3555 Time Series Analysis depmix Dependent Mixture Models Fits (multigroup) mixtures of latent or hidden Markov models on mixed categorical and continuous (time series) data. The Rdonlp2 package can optionally be used for optimization of the log-likelihood and is available from R-forge.
3556 Time Series Analysis depmixS4 Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4 Fits latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models.
3557 Time Series Analysis deseasonalize Optimal deseasonalization for geophysical time series using AR fitting Deseasonalize daily or monthly time series.
3558 Time Series Analysis dLagM Time Series Regression Models with Distributed Lag Models Provides time series regression models with one predictor using finite distributed lag models, polynomial (Almon) distributed lag models, geometric distributed lag models with Koyck transformation, and autoregressive distributed lag models. It also provides functions for computing h-step-ahead forecasts from these models. See Baltagi (2011) <doi:10.1007/978-3-642-20059-5> for more information.
3559 Time Series Analysis dlm Bayesian and Likelihood Analysis of Dynamic Linear Models Provides routines for Maximum likelihood, Kalman filtering and smoothing, and Bayesian analysis of Normal linear State Space models, also known as Dynamic Linear Models.
3560 Time Series Analysis dlnm Distributed Lag Non-Linear Models Collection of functions for distributed lag linear and non-linear models.
3561 Time Series Analysis dsa Seasonal Adjustment of Daily Time Series Seasonal and calendar adjustment of time series with daily frequency using the DSA approach developed by Ollech, Daniel (2018): Seasonal adjustment of daily time series. Bundesbank Discussion Paper 41/2018.
3562 Time Series Analysis dse Dynamic Systems Estimation (Time Series Package) Tools for multivariate, linear, time-invariant, time series models. This includes ARMA and state-space representations, and methods for converting between them. It also includes simulation methods and several estimation functions. The package has functions for looking at model roots, stability, and forecasts at different horizons. The ARMA model representation is general, so that VAR, VARX, ARIMA, ARMAX, ARIMAX can all be considered to be special cases. Kalman filter and smoother estimates can be obtained from the state space model, and state-space model reduction techniques are implemented. An introduction and User’s Guide is available in a vignette.
3563 Time Series Analysis dtw Dynamic Time Warping Algorithms A comprehensive implementation of dynamic time warping (DTW) algorithms in R. DTW computes the optimal (least cumulative distance) alignment between points of two time series. Common DTW variants covered include local (slope) and global (window) constraints, subsequence matches, arbitrary distance definitions, normalizations, minimum variance matching, and so on. Provides cumulative distances, alignments, specialized plot styles, etc.
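A minimal alignment sketch: warp a shorter, noisy sine onto a longer reference and inspect the optimal alignment.

    library(dtw)
    set.seed(42)
    query     <- sin(seq(0, 2 * pi, length.out = 100)) + rnorm(100, sd = 0.1)
    reference <- sin(seq(0, 2 * pi, length.out = 120))
    alignment <- dtw(query, reference, keep.internals = TRUE)
    alignment$distance                  # cumulative alignment cost
    plot(alignment, type = "twoway")    # paired-point alignment plot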
3564 Time Series Analysis dtwclust Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.
3565 Time Series Analysis dygraphs Interface to ‘Dygraphs’ Interactive Time Series Charting Library An R interface to the ‘dygraphs’ JavaScript charting library (a copy of which is included in the package). Provides rich facilities for charting time-series data in R, including highly configurable series- and axis-display and interactive features like zoom/pan and series/point highlighting.
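A ts object can be plotted directly; dyRangeSelector() adds the zoom/pan control mentioned above, and the result is an HTML widget that renders in the viewer or in R Markdown.

    library(dygraphs)
    d <- dygraph(AirPassengers, main = "Monthly airline passengers")
    dyRangeSelector(d)   # interactive range selector below the chart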
3566 Time Series Analysis dyn Time Series Regression Time series regression. The dyn class interfaces ts, irts(), zoo() and zooreg() time series classes to lm(), glm(), loess(), quantreg::rq(), MASS::rlm(), MCMCpack::MCMCregress(), randomForest::randomForest() and other regression functions, allowing those functions to be used with time series, including specifications that may contain lags, diffs and missing values.
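A minimal sketch of the dyn interface: prefixing the fitting function with dyn$ lets the formula contain lags and diffs, which are aligned automatically (a zoo series is assumed here; zoo is loaded with dyn).

    library(dyn)
    set.seed(1)
    y <- zoo::zoo(cumsum(rnorm(100)))
    # Regress the series on its own first lag; dyn$lm() aligns the
    # lagged term with the response before calling lm()
    fit <- dyn$lm(y ~ lag(y, -1))
    summary(fit)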
3567 Time Series Analysis dynlm Dynamic Linear Regression Dynamic linear models and time series regression.
3568 Time Series Analysis earlywarnings Early Warning Signals Toolbox for Detecting Critical Transitions in Timeseries The Early-Warning-Signals Toolbox provides methods for estimating statistical changes in time series that can be used for identifying nearby critical transitions. Based on Dakos et al (2012) Methods for Detecting Early Warnings of Critical Transitions in Time Series Illustrated Using Simulated Ecological Data. PLoS ONE 7(7):e41010.
3569 Time Series Analysis Ecdat Data Sets for Econometrics Data sets for econometrics.
3570 Time Series Analysis ecm Build Error Correction Models Functions for easy building of error correction models (ECM) for time series regression.
3571 Time Series Analysis ecp Non-Parametric Multiple Change-Point Analysis of Multivariate Data Implements various procedures for finding multiple change-points. Two methods make use of dynamic programming and pruning, with no distributional assumptions other than the existence of certain absolute moments in one method. Hierarchical and exact search methods are included. All methods return the set of estimated change-points as well as other summary information.
3572 Time Series Analysis EMD Empirical Mode Decomposition and Hilbert Spectral Analysis For multiscale analysis, this package carries out empirical mode decomposition and Hilbert spectral analysis. For usage of EMD, see Kim and Oh, 2009 (Kim, D and Oh, H.-S. (2009) EMD: A Package for Empirical Mode Decomposition and Hilbert Spectrum, The R Journal, 1, 40-46).
3573 Time Series Analysis ensembleBMA Probabilistic Forecasting using Ensembles and Bayesian Model Averaging Bayesian Model Averaging to create probabilistic forecasts from ensemble forecasts and weather observations.
3574 Time Series Analysis EvalEst Dynamic Systems Estimation - Extensions Provides functions for evaluating (time series) model estimation methods. These facilitate Monte Carlo experiments of repeated simulations and estimations. Also provides methods for looking at the distribution of the results from these experiments, including model roots (which are an equivalence class invariant).
3575 Time Series Analysis events Store and manipulate event data Stores, manipulates, aggregates and otherwise messes with event data from KEDS/TABARI or any other extraction tool with similar output
3576 Time Series Analysis expsmooth Data Sets from “Forecasting with Exponential Smoothing” Data sets from the book “Forecasting with exponential smoothing: the state space approach” by Hyndman, Koehler, Ord and Snyder (Springer, 2008).
3577 Time Series Analysis factorstochvol Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models with interweaving <doi:10.1080/10618600.2017.1322091>. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix <doi:10.1016/j.jeconom.2018.11.007>.
3578 Time Series Analysis fame Interface for FAME Time Series Database Read and write FAME databases.
3579 Time Series Analysis fanplot Visualisation of Sequential Probability Distributions Using Fan Charts Visualise sequential distributions using a range of plotting styles. Sequential distribution data can be input as either simulations or values corresponding to percentiles over time. Plots are added to existing graphic devices using the fan function. Users can choose from four different styles, including fan chart type plots, where a set of coloured polygons, with shadings corresponding to the percentile values, is layered to represent different uncertainty levels.
3580 Time Series Analysis FeedbackTS Analysis of Feedback in Time Series Analysis of fragmented time directionality to investigate feedback in time series. Tools provided by the package allow the analysis of feedback for a single time series and the analysis of feedback for a set of time series collected across a spatial domain.
3581 Time Series Analysis fGarch Rmetrics - Autoregressive Conditional Heteroskedastic Modelling Provides a collection of functions to analyze and model heteroskedastic behavior in financial time series models.
3582 Time Series Analysis FinTS Companion to Tsay (2005) Analysis of Financial Time Series R companion to Tsay (2005) Analysis of Financial Time Series, second edition (Wiley). Includes data sets, functions and script files required to work some of the examples. Version 0.3-x includes R objects for all data files used in the text and script files to recreate most of the analyses in chapters 1-3 and 9 plus parts of chapters 4 and 11.
3583 Time Series Analysis FitAR Subset AR Model Fitting Comprehensive model building function for identification, estimation and diagnostic checking for AR and subset AR models. Two types of subset AR models are supported. One family of subset AR models, denoted by ARp, is formed by taking a subset of the original AR coefficients; in the other, denoted by ARz, subsets of the partial autocorrelations are used. The main advantage of the ARz model is its applicability to very large order models.
3584 Time Series Analysis FitARMA Fit ARMA or ARIMA Using Fast MLE Algorithm Implements fast maximum likelihood algorithm for fitting ARMA time series. Uses S3 methods print, summary, fitted, residuals. Fast exact Gaussian ARMA simulation.
3585 Time Series Analysis FKF Fast Kalman Filter This is a fast and flexible implementation of the Kalman filter, which can deal with NAs. It is entirely written in C and relies fully on linear algebra subroutines contained in BLAS and LAPACK. Due to the speed of the filter, the fitting of high-dimensional linear state space models to large datasets becomes possible. This package also contains a plot function for the visualization of the state vector and graphical diagnostics of the residuals.
3586 Time Series Analysis fma Data Sets from “Forecasting: Methods and Applications” by Makridakis, Wheelwright & Hyndman (1998) All data sets from “Forecasting: methods and applications” by Makridakis, Wheelwright & Hyndman (Wiley, 3rd ed., 1998).
3587 Time Series Analysis fNonlinear Rmetrics - Nonlinear and Chaotic Time Series Modelling Provides a collection of functions for testing various aspects of univariate time series including independence and neglected nonlinearities. Further provides functions to investigate the chaotic behavior of time series processes and to simulate different types of chaotic time series maps.
3588 Time Series Analysis ForeCA Forecastable Component Analysis Implementation of Forecastable Component Analysis (‘ForeCA’), including main algorithms and auxiliary functions (summary, plotting, etc.) to apply ‘ForeCA’ to multivariate time series data. ‘ForeCA’ is a novel dimension reduction (DR) technique for temporally dependent signals. Contrary to other popular DR methods, such as ‘PCA’ or ‘ICA’, ‘ForeCA’ takes time dependency explicitly into account and searches for the most “forecastable” signal. The measure of forecastability is based on the Shannon entropy of the spectral density of the transformed signal.
3589 Time Series Analysis forecast (core) Forecasting Functions for Time Series and Linear Models Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
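The canonical usage pattern is automatic model selection followed by a forecast call, e.g. with the built-in AirPassengers series:

    library(forecast)
    fit <- auto.arima(AirPassengers)   # automatic ARIMA order selection
    plot(forecast(fit, h = 24))        # 24-step-ahead forecast with intervals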
3590 Time Series Analysis ForecastComb Forecast Combination Methods Provides geometric- and regression-based forecast combination methods under a unified user interface for the packages ‘ForecastCombinations’ and ‘GeomComb’. Additionally, updated tools and convenience functions for data pre-processing are available in order to deal with common problems in forecast combination (missingness, collinearity). For method details see Hsiao C, Wan SK (2014). <doi:10.1016/j.jeconom.2013.11.003>, Hansen BE (2007). <doi:10.1111/j.1468-0262.2007.00785.x>, Elliott G, Gargano A, Timmermann A (2013). <doi:10.1016/j.jeconom.2013.04.017>, and Clemen RT (1989). <doi:10.1016/0169-2070(89)90012-5>.
3591 Time Series Analysis forecastHybrid Convenient Functions for Ensemble Time Series Forecasts Convenient functions for ensemble forecasts in R combining approaches from the ‘forecast’ package. Forecasts generated from auto.arima(), ets(), thetaf(), nnetar(), stlm(), tbats(), and snaive() can be combined with equal weights, weights based on in-sample errors (introduced by Bates & Granger (1969) <doi:10.1057/jors.1969.103>), or cross-validated weights. Cross validation for time series data with user-supplied models and forecasting functions is also supported to evaluate model accuracy.
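A hedged sketch of the ensemble workflow, with hybridModel() as the entry point; by default it fits the component models listed above and combines them with equal weights.

    library(forecastHybrid)
    fit <- hybridModel(AirPassengers)   # equal-weighted ensemble by default
    plot(forecast(fit, h = 12))         # combined 12-step-ahead forecast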
3592 Time Series Analysis forecTheta Forecasting Time Series by Theta Models Routines for forecasting univariate time series using Theta Models. Contains several cross-validation routines.
3593 Time Series Analysis fpp Data for “Forecasting: principles and practice” All data sets required for the examples and exercises in the book “Forecasting: principles and practice” by Rob J Hyndman and George Athanasopoulos. All packages required to run the examples are also loaded.
3594 Time Series Analysis fpp2 Data for “Forecasting: Principles and Practice” (2nd Edition) All data sets required for the examples and exercises in the book “Forecasting: principles and practice” by Rob J Hyndman and George Athanasopoulos <https://OTexts.org/fpp2/>. All packages required to run the examples are also loaded.
3595 Time Series Analysis fracdiff Fractionally differenced ARIMA aka ARFIMA(p,d,q) models Maximum likelihood estimation of the parameters of a fractionally differenced ARIMA(p,d,q) model (Haslett and Raftery, Appl.Statistics, 1989).
3596 Time Series Analysis fractal A Fractal Time Series Modeling and Analysis Package Stochastic fractal and deterministic chaotic time series analysis.
3597 Time Series Analysis fractalrock Generate fractal time series with non-normal returns distribution The basic principle driving fractal generation of time series is that data is generated iteratively based on increasing levels of resolution. The initial series is defined by a so-called initiator pattern and then generators are used to replace each segment of the initial pattern. Regular, repeatable patterns can be produced by using the same seed and generators. By using a set of generators, non-repeatable time series can be produced. This technique is the basis of the fractal time series process in this package.
3598 Time Series Analysis freqdom Frequency Domain Based Analysis: Dynamic PCA Implementation of dynamic principal component analysis (DPCA), simulation of VAR and VMA processes and frequency domain tools. These frequency domain methods for dimensionality reduction of multivariate time series were introduced by David Brillinger in his book Time Series (1974). We follow implementation guidelines as described in Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Component <doi:10.1111/rssb.12076>.
3599 Time Series Analysis freqdom.fda Functional Time Series: Dynamic Functional Principal Components Implementations of functional dynamic principal component analysis. Related graphical tools and frequency domain methods. These methods directly use the multivariate dynamic principal components implementation, following the guidelines from Hormann, Kidzinski and Hallin (2016), Dynamic Functional Principal Component <doi:10.1111/rssb.12076>.
3600 Time Series Analysis fts R Interface to ‘tslib’ (a Time Series Library in C++) Fast operations for time series objects.
3601 Time Series Analysis ftsa Functional Time Series Analysis Functions for visualizing, modeling, forecasting and hypothesis testing of functional time series.
3602 Time Series Analysis funtimes Functions for Time Series Analysis Includes non-parametric estimators and tests for time series analysis. The functions test for the presence of possibly non-monotonic trends and for synchronism of trends in multiple time series, using modern bootstrap techniques and robust non-parametric difference-based estimators.
3603 Time Series Analysis GAS Generalized Autoregressive Score Models Simulate, estimate and forecast using univariate and multivariate GAS models as described in Ardia et al. (2019) <doi:10.18637/jss.v088.i06>.
3604 Time Series Analysis gdpc Generalized Dynamic Principal Components Functions to compute the Generalized Dynamic Principal Components introduced in Pena and Yohai (2016) <doi:10.1080/01621459.2015.1072542>.
3605 Time Series Analysis ggseas ‘stats’ for Seasonal Adjustment on the Fly with ‘ggplot2’ Provides ‘ggplot2’ ‘stats’ that estimate seasonally adjusted series and rolling summaries such as rolling average on the fly for time series.
3606 Time Series Analysis ggTimeSeries Time Series Visualisations Using the Grammar of Graphics Provides additional display mediums for time series visualisations, such as calendar heat map, steamgraph, marimekko, etc.
3607 Time Series Analysis glarma Generalized Linear Autoregressive Moving Average Models Functions are provided for estimation, testing, diagnostic checking and forecasting of generalized linear autoregressive moving average (GLARMA) models for discrete valued time series with regression variables. These are a class of observation driven non-linear non-Gaussian state space models. The state vector consists of a linear regression component plus an observation driven component consisting of an autoregressive-moving average (ARMA) filter of past predictive residuals. Currently three distributions (Poisson, negative binomial and binomial) can be used for the response series. Three options (Pearson, score-type and unscaled) for the residuals in the observation driven component are available. Estimation is via maximum likelihood (conditional on initializing values for the ARMA process) optimized using Fisher scoring or Newton Raphson iterative methods. Likelihood ratio and Wald tests for the observation driven component allow testing for serial dependence in generalized linear model settings. Graphical diagnostics including model fits, autocorrelation functions and probability integral transform residuals are included in the package. Several standard data sets are included in the package.
3608 Time Series Analysis GMDH Short Term Forecasting via GMDH-Type Neural Network Algorithms The group method of data handling (GMDH)-type neural network algorithm is a heuristic self-organization method for modelling complex systems. In this package, GMDH-type neural network algorithms are applied to make short term forecasts for a univariate time series.
3609 Time Series Analysis gmvarkit Estimate Gaussian Mixture Vector Autoregressive Model Maximum likelihood estimation of Gaussian Mixture Vector Autoregressive (GMVAR) model, quantile residual tests, graphical diagnostics, forecasting and simulations. Applying general linear constraints to the autoregressive parameters is supported. Leena Kalliovirta, Mika Meitz, Pentti Saikkonen (2016) <doi:10.1016/j.jeconom.2016.02.012>.
3610 Time Series Analysis GNAR Methods for Fitting Network Time Series Models Simulation of, and fitting models for, Generalised Network Autoregressive (GNAR) time series models which take account of network structure. Such models are described in Knight et al. (2016), see <arXiv:1603.03221>.
3611 Time Series Analysis graphicalVAR Graphical VAR for Experience Sampling Data Estimates within and between time point interactions in experience sampling data, using the Graphical vector autoregression model in combination with regularization. See also Epskamp, Waldorp, Mottus & Borsboom (2018) <doi:10.1080/00273171.2018.1454823>.
3612 Time Series Analysis gsarima Two functions for Generalized SARIMA time series simulation Write SARIMA models in (finite) AR representation and simulate generalized multiplicative seasonal autoregressive moving average (time) series with Normal / Gaussian, Poisson or negative binomial distribution.
3613 Time Series Analysis gtop Game-Theoretically OPtimal (GTOP) Reconciliation Method In hierarchical time series (HTS) forecasting, the hierarchical relation between multiple time series is exploited to make better forecasts. This hierarchical relation implies one or more aggregate consistency constraints that the series are known to satisfy. Many existing approaches, such as bottom-up or top-down forecasting, therefore attempt to achieve this goal in a way that guarantees that the forecasts will also be aggregate consistent. This package provides an implementation of the Game-Theoretically OPtimal (GTOP) reconciliation method proposed in van Erven and Cugliari (2015), which is guaranteed to only improve any given set of forecasts. This opens up new possibilities for constructing the forecasts. For example, it is not necessary to assume that bottom-level forecasts are unbiased, and aggregate forecasts may be constructed by regressing both on bottom-level forecasts and on other covariates that may only be available at the aggregate level.
3614 Time Series Analysis HarmonicRegression Harmonic Regression to One or more Time Series Fits the first harmonics in a Fourier expansion to one or more time series. Trend elimination can be performed. Computed values include estimates of amplitudes and phases, as well as confidence intervals and p-values for the null hypothesis of Gaussian noise.
3615 Time Series Analysis hht The Hilbert-Huang Transform: Tools and Methods Builds on the EMD package to provide additional tools for empirical mode decomposition (EMD) and Hilbert spectral analysis. It also implements the ensemble empirical mode decomposition (EEMD) and the complete ensemble empirical mode decomposition (CEEMD) methods to avoid mode mixing and intermittency problems found in EMD analysis. The package comes with several plotting methods that can be used to view intrinsic mode functions, the HHT spectrum, and the Fourier spectrum.
3616 Time Series Analysis hts Hierarchical and Grouped Time Series Provides methods for analysing and forecasting hierarchical and grouped time series. The available forecast methods include bottom-up, top-down, optimal combination reconciliation (Hyndman et al. 2011) <doi:10.1016/j.csda.2011.03.006>, and trace minimization reconciliation (Wickramasuriya et al. 2018) <doi:10.1080/01621459.2018.1448825>.
3617 Time Series Analysis hwwntest Tests of White Noise using Wavelets Provides methods to test whether a time series is consistent with white noise.
3618 Time Series Analysis imputePSF Impute Missing Data in Time Series Data with PSF Based Method Imputes the missing values in time series data with PSF algorithm based approach. The details about PSF algorithm are available at: <https://cran.r-project.org/package=PSF>.
3619 Time Series Analysis imputeTestbench Test Bench for the Comparison of Imputation Methods Provides a test bench for the comparison of missing data imputation methods in uni-variate time series. Imputation methods are compared using different error metrics. Proposed imputation methods and alternative error metrics can be used.
3620 Time Series Analysis imputeTS Time Series Missing Value Imputation Imputation (replacement) of missing values in univariate time series. Offers several imputation functions and missing data plots. Available imputation algorithms include: ‘Mean’, ‘LOCF’, ‘Interpolation’, ‘Moving Average’, ‘Seasonal Decomposition’, ‘Kalman Smoothing on Structural Time Series models’, ‘Kalman Smoothing on ARIMA models’.
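A sketch using the tsAirgap series bundled with the package (AirPassengers with artificially removed values). Function names have changed across releases, so treat the exact spellings here as version-dependent assumptions: na_interpolation() is the newer name, older releases used na.interpolation().

    library(imputeTS)
    imp <- na_interpolation(tsAirgap)   # linear interpolation of the gaps
    # Other algorithms listed above include na_kalman() and na_seadec()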
3621 Time Series Analysis influxdbr R Interface to InfluxDB An R interface to the InfluxDB time series database <https://www.influxdata.com>. This package allows you to fetch and write time series data from/to an InfluxDB server. Additionally, handy wrappers for the Influx Query Language (IQL) to manage and explore a remote database are provided.
3622 Time Series Analysis InspectChangepoint High-Dimensional Changepoint Estimation via Sparse Projection Provides a data-driven projection-based method for estimating changepoints in high-dimensional time series. Multiple changepoints are estimated using a (wild) binary segmentation scheme.
3623 Time Series Analysis itsmr Time Series Analysis Using the Innovations Algorithm Provides functions for modeling and forecasting time series data. Forecasting is based on the innovations algorithm. A description of the innovations algorithm can be found in the textbook “Introduction to Time Series and Forecasting” by Peter J. Brockwell and Richard A. Davis. <http://www.springer.com/us/book/9781475777505>.
3624 Time Series Analysis jmotif Time Series Analysis Toolkit Based on Symbolic Aggregate Discretization, i.e. SAX Implements time series z-normalization, SAX, HOT-SAX, VSM, SAX-VSM, RePair, and RRA algorithms facilitating time series motif (i.e., recurrent pattern), discord (i.e., anomaly), and characteristic pattern discovery along with interpretable time series classification.
3625 Time Series Analysis KFAS Kalman Filter and Smoother for Exponential Family State Space Models State space modelling is an efficient and flexible framework for statistical inference of a broad class of time series and other data. KFAS includes computationally efficient functions for Kalman filtering, smoothing, forecasting, and simulation of multivariate exponential family state space models, with observations from Gaussian, Poisson, binomial, negative binomial, and gamma distributions. See the paper by Helske (2017) <doi:10.18637/jss.v078.i10> for details.
3626 Time Series Analysis KFKSDS Kalman Filter, Smoother and Disturbance Smoother Naive implementation of the Kalman filter, smoother and disturbance smoother for state space models.
3627 Time Series Analysis kza Kolmogorov-Zurbenko Adaptive Filters Time series analysis including break detection, spectral analysis, and KZ Fourier transforms.
3628 Time Series Analysis locits Test of Stationarity and Localized Autocovariance Provides test of second-order stationarity for time series (for dyadic and arbitrary-n length data). Provides localized autocovariance, with confidence intervals, for locally stationary (nonstationary) time series.
3629 Time Series Analysis lomb Lomb-Scargle Periodogram Computes the Lomb-Scargle Periodogram for unevenly sampled time series. Includes a randomization procedure to obtain reliable p-values.
3630 Time Series Analysis LongMemoryTS Long Memory Time Series Long Memory Time Series is a collection of functions for estimation, simulation and testing of long memory processes, spurious long memory processes and fractionally cointegrated systems.
3631 Time Series Analysis LPStimeSeries Learned Pattern Similarity and Representation for Time Series Learned Pattern Similarity (LPS) for time series. Implements a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local auto-patterns. Generates a pattern-based representation of time series along with a similarity measure called Learned Pattern Similarity (LPS). Introduces a generalized autoregressive kernel. This package is based on the ‘randomForest’ package by Andy Liaw.
3632 Time Series Analysis ltsa Linear Time Series Analysis Methods for developing linear time series models. Methods are given for loglikelihood computation, forecasting and simulation.
3633 Time Series Analysis lubridate Make Dealing with Dates a Little Easier Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The ‘lubridate’ package has a consistent and memorable syntax that makes working with dates easy and fun. Parts of the ‘CCTZ’ source code, released under the Apache 2.0 License, are included in this package. See <https://github.com/google/cctz> for more details.
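The parsing, extraction and arithmetic helpers compose naturally; a small sketch:

    library(lubridate)
    d <- ymd("2010-03-15")       # fast, format-free parsing
    month(d)                     # extract a component: 3
    wday(d, label = TRUE)        # day of week as a labelled factor
    d + months(2) + days(10)     # algebra on date-times and time spans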
3634 Time Series Analysis mafs Multiple Automatic Forecast Selection Fits several forecast models available from the forecast package and selects the best one according to an error metric. Its main function is select_forecast().
3635 Time Series Analysis MAPA Multiple Aggregation Prediction Algorithm Functions and wrappers for using the Multiple Aggregation Prediction Algorithm (MAPA) for time series forecasting. MAPA models and forecasts time series at multiple temporal aggregation levels, thus strengthening and attenuating the various time series components for better holistic estimation of its structure. For details see Kourentzes et al. (2014) <doi:10.1016/j.ijforecast.2013.09.006>.
3636 Time Series Analysis mAr Multivariate AutoRegressive analysis R functions for multivariate autoregressive analysis
3637 Time Series Analysis mar1s Multiplicative AR(1) with Seasonal Processes Multiplicative AR(1) with Seasonal is a stochastic process model built on top of AR(1). The package provides the following procedures for MAR(1)S processes: fit, compose, decompose, advanced simulate and predict.
3638 Time Series Analysis MARSS Multivariate Autoregressive State-Space Modeling The MARSS package provides maximum-likelihood parameter estimation for constrained and unconstrained linear multivariate autoregressive state-space (MARSS) models fit to multivariate time-series data. Fitting is primarily via an Expectation-Maximization (EM) algorithm, although fitting via the BFGS algorithm (using the optim function) is also provided. MARSS models are a class of dynamic linear models (DLMs) and vector autoregressive (VAR) models. Functions are provided for parametric and innovations bootstrapping, Kalman filtering and smoothing, bootstrap model selection criteria (AICb), confidence intervals via the Hessian approximation and via bootstrapping, and calculation of auxiliary residuals for detecting outliers and shocks. The user guide shows examples of using MARSS for parameter estimation for a variety of applications, model selection, dynamic factor analysis, outlier and shock detection, and addition of covariates. Type RShowDoc(“UserGuide”, package=“MARSS”) at the R command line to open the MARSS user guide. Online workshops (lectures and computer labs) are available at <https://nwfsc-timeseries.github.io/>. See the NEWS file for update information.
3639 Time Series Analysis mclcar Estimating Conditional Auto-Regressive (CAR) Models using Monte Carlo Likelihood Methods The likelihoods of direct CAR models and of Binomial and Poisson GLMs with latent CAR variables are approximated by the Monte Carlo likelihood. The maximum Monte Carlo likelihood estimator is found either by an iterative procedure of directly maximising the Monte Carlo approximation or by a response surface design method. A reference for the method can be found in the DPhil thesis of Z. Sha (2016). For applications, a good reference is R. Bivand et al. (2017) <doi:10.1016/j.spasta.2017.01.002>.
3640 Time Series Analysis Mcomp Data from the M-Competitions The 1001 time series from the M-competition (Makridakis et al. 1982) <doi:10.1002/for.3980010202> and the 3003 time series from the IJF-M3 competition (Makridakis and Hibon, 2000) <doi:10.1016/S0169-2070(00)00057-1>.
3641 Time Series Analysis meboot Maximum Entropy Bootstrap for Time Series Maximum entropy density based dependent data bootstrap. An algorithm is provided to create a population of time series (ensemble) without assuming stationarity. The reference paper (Vinod, H.D., 2004) explains how the algorithm satisfies the ergodic theorem and the central limit theorem.
3642 Time Series Analysis mFilter Miscellaneous Time Series Filters The mFilter package implements several time series filters useful for smoothing and extracting trend and cyclical components of a time series. The routines are commonly used in economics and finance; however, they should also be of interest to other areas. Currently, Christiano-Fitzgerald, Baxter-King, Hodrick-Prescott, Butterworth, and trigonometric regression filters are included in the package.
3643 Time Series Analysis mgm Estimating Time-Varying k-Order Mixed Graphical Models Estimation of k-Order time-varying Mixed Graphical Models and mixed VAR(p) models via elastic-net regularized neighborhood regression. For details see linked paper.
3644 Time Series Analysis mlVAR Multi-Level Vector Autoregression Estimates the multi-level vector autoregression model on time-series data. Three network structures are obtained: temporal networks, contemporaneous networks and between-subjects networks.
3645 Time Series Analysis mondate Keep track of dates in terms of months Keep track of dates in terms of months. Model dates as at close of business. Perform date arithmetic in units of “months” and “years” (multiples of months). Allow “infinite” dates to model “ultimate” time spans.
3646 Time Series Analysis MSwM Fitting Markov Switching Models Estimation, inference and diagnostics for Univariate Autoregressive Markov Switching Models for Linear and Generalized Models. Distributions for the series include Gaussian, Poisson, binomial and gamma cases. The EM algorithm is used for estimation (see Perlin (2012) <doi:10.2139/ssrn.1714016>).
3647 Time Series Analysis MTS All-Purpose Toolkit for Analyzing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models Multivariate Time Series (MTS) is a general package for analyzing multivariate linear time series and estimating multivariate volatility models. It also handles factor models, constrained factor models, asymptotic principal component analysis commonly used in finance and econometrics, and principal volatility component analysis. (a) For the multivariate linear time series analysis, the package performs model specification, estimation, model checking, and prediction for many widely used models, including vector AR models, vector MA models, vector ARMA models, seasonal vector ARMA models, VAR models with exogenous variables, multivariate regression models with time series errors, augmented VAR models, and Error-correction VAR models for co-integrated time series. For model specification, the package performs structural specification to overcome the difficulties of identifiability of VARMA models. The methods used for structural specification include Kronecker indices and Scalar Component Models. (b) For multivariate volatility modeling, the MTS package handles several commonly used models, including multivariate exponentially weighted moving-average volatility, Cholesky decomposition volatility models, dynamic conditional correlation (DCC) models, copula-based volatility models, and low-dimensional BEKK models. The package also considers multiple tests for conditional heteroscedasticity, including rank-based statistics. (c) Finally, the MTS package also performs forecasting using diffusion index, transfer function analysis, Bayesian estimation of VAR models, and multivariate time series analysis with missing values. Users can also use the package to simulate VARMA models, to compute impulse response functions of a fitted VARMA model, and to calculate theoretical cross-covariance matrices of a given VARMA model.
3648 Time Series Analysis mtsdi Multivariate Time Series Data Imputation This is an EM algorithm based method for imputation of missing values in multivariate normal time series. The imputation algorithm accounts for both spatial and temporal correlation structures. Temporal patterns can be modeled using an ARIMA(p,d,q), optionally with seasonal components, a non-parametric cubic spline or generalized additive models with exogenous covariates. This algorithm is specially tailored for climate data with missing measurements from several monitors along a given region.
3649 Time Series Analysis multDM Multivariate Version of the Diebold-Mariano Test Performs the multivariate version of the Diebold-Mariano test for equal predictive ability in multiple forecast comparison. Main reference: Mariano, R.S., Preve, D. (2012) <doi:10.1016/j.jeconom.2012.01.014>.
3650 Time Series Analysis MultipleBubbles Test and Detection of Explosive Behaviors for Time Series Provides the Augmented Dickey-Fuller test and its variations to check the existence of bubbles (explosive behavior) for time series, based on the article by Peter C. B. Phillips, Shuping Shi and Jun Yu (2015a) <doi:10.1111/iere.12131>. Some functions may take a while depending on the size of the data used, or the number of Monte Carlo replications applied.
3651 Time Series Analysis multitaper Spectral Analysis Tools using the Multitaper Method Implements multitaper spectral analysis using discrete prolate spheroidal sequences (Slepians) and sine tapers. It includes an adaptive weighted multitaper spectral estimate, a coherence estimate, Thomson’s Harmonic F-test, and complex demodulation. The Slepian sequences are generated efficiently using a tridiagonal matrix solution, and jackknifed confidence intervals are available for most estimates.
3652 Time Series Analysis mvcwt Wavelet Analysis of Multiple Time Series Computes the continuous wavelet transform of irregularly sampled time series.
3653 Time Series Analysis mvLSW Multivariate, Locally Stationary Wavelet Process Estimation Tools for analysing multivariate time series with wavelets. This includes: simulation of a multivariate locally stationary wavelet (mvLSW) process from a multivariate evolutionary wavelet spectrum (mvEWS); estimation of the mvEWS, local coherence and local partial coherence. See Park, Eckley and Ombao (2014) <doi:10.1109/TSP.2014.2343937> for details.
3654 Time Series Analysis nardl Nonlinear Cointegrating Autoregressive Distributed Lag Model Computes the nonlinear cointegrating autoregressive distributed lag model with p lags of the dependent variable and q lags of independent variables, proposed by Shin, Yu & Greenwood-Nimmo (2014) <doi:10.1007/978-1-4899-8008-3_9>.
3655 Time Series Analysis nets Network Estimation for Time Series Sparse VAR estimation based on LASSO.
3656 Time Series Analysis NlinTS Non Linear Time Series Analysis Models for non-linear time series analysis and causality detection. The main functionalities of this package consist of an implementation of the classical causality test (C.W.J. Granger, 1980) <doi:10.1016/0165-1889(80)90069-X>, and a non-linear version of it based on feed-forward neural networks. This package also contains an implementation of the Transfer Entropy <doi:10.1103/PhysRevLett.85.461>, and the continuous Transfer Entropy using an approximation based on the k-nearest neighbors <doi:10.1103/PhysRevE.69.066138>. There are also some other useful tools, like the VARNN (Vector Auto-Regressive Neural Network) prediction model, the Augmented test of stationarity, and the discrete and continuous entropy and mutual information.
3657 Time Series Analysis nlts Nonlinear Time Series Analysis R functions for (non)linear time series analysis with an emphasis on nonparametric autoregression and order estimation, and tests for linearity / additivity.
3658 Time Series Analysis nnfor Time Series Forecasting with Neural Networks Automatic time series modelling with neural networks. Allows fully automatic, semi-manual or fully manual specification of networks. For details of the specification methodology see: (i) Crone and Kourentzes (2010) <doi:10.1016/j.neucom.2010.01.017>; and (ii) Kourentzes et al. (2014) <doi:10.1016/j.eswa.2013.12.011>.
3659 Time Series Analysis nonlinearTseries Nonlinear Time Series Analysis Functions for nonlinear time series analysis. This package permits the computation of the most-used nonlinear statistics/algorithms including generalized correlation dimension, information dimension, largest Lyapunov exponent, sample entropy and Recurrence Quantification Analysis (RQA), among others. Basic routines for surrogate data testing are also included. Part of this work was based on the book “Nonlinear time series analysis” by Holger Kantz and Thomas Schreiber (ISBN: 9780521529020).
3660 Time Series Analysis npst Generalization of Hewitt’s Seasonality Test Package ‘npst’ generalizes Hewitt’s (1971) test for seasonality and Rogerson’s (1996) extension based on Monte-Carlo simulation.
3661 Time Series Analysis NTS Nonlinear Time Series Analysis Simulation, estimation, prediction procedure, and model identification methods for nonlinear time series analysis, including threshold autoregressive models, Markov-switching models, convolutional functional autoregressive models, nonlinearity tests, Kalman filters and various sequential Monte Carlo methods. More examples and details about this package can be found in the book “Nonlinear Time Series Analysis” by Ruey S. Tsay and Rong Chen, Wiley, 2018 (ISBN: 978-1-119-26407-1).
3662 Time Series Analysis odpc One-Sided Dynamic Principal Components Functions to compute the one-sided dynamic principal components (‘odpc’) introduced in Smucler, Pena and Yohai (2018) <doi:10.1080/01621459.2018.1520117>. ‘odpc’ is a novel dimension reduction technique for multivariate time series that is useful for forecasting. These dynamic principal components are defined as the linear combinations of the present and past values of the series that minimize the reconstruction mean squared error.
3663 Time Series Analysis opera Online Prediction by Expert Aggregation Miscellaneous methods to form online predictions for regression-oriented time series by combining a finite set of forecasts provided by the user.
3664 Time Series Analysis orderedLasso Ordered Lasso and Time-Lag Sparse Regression Ordered lasso and time-lag sparse regression. The ordered lasso fits a linear model and imposes an order constraint on the coefficients. It writes the coefficients as positive and negative parts, and requires that the positive parts and negative parts be non-increasing and positive. The time-lag lasso generalizes the ordered lasso to a general data matrix with multiple predictors. For more details, see Suo, X., Tibshirani, R. (2014) ‘An Ordered Lasso and Sparse Time-lagged Regression’.
3665 Time Series Analysis otsad Online Time Series Anomaly Detectors Implements a set of online fault detectors for time-series, called: PEWMA see M. Carter et al. (2012) <doi:10.1109/SSP.2012.6319708>, SD-EWMA and TSSD-EWMA see H. Raza et al. (2015) <doi:10.1016/j.patcog.2014.07.028>, KNN-CAD see E. Burnaev et al. (2016) <arXiv:1608.04585>, KNN-LDCD see V. Ishimtsev et al. (2017) <arXiv:1706.03412> and CAD-OSE see M. Smirnov (2018) <https://github.com/smirmik/CAD>. The first three algorithms belong to prediction-based techniques and the last three belong to window-based techniques. In addition, the SD-EWMA and PEWMA algorithms are algorithms designed to work in stationary environments, while the other four are algorithms designed to work in non-stationary environments.
3666 Time Series Analysis paleoTS Analyze Paleontological Time-Series Facilitates analysis of paleontological sequences of trait values. Functions are provided to fit, using maximum likelihood, simple evolutionary models (including unbiased random walks, directional evolution, stasis, Ornstein-Uhlenbeck, covariate-tracking) and complex models (punctuation, mode shifts).
3667 Time Series Analysis partsm Periodic Autoregressive Time Series Models This package provides basic functions to fit and predict periodic autoregressive time series models. These models are discussed in the book P.H. Franses (1996) “Periodicity and Stochastic Trends in Economic Time Series”, Oxford University Press. The data set analyzed in that book is also provided. NOTE: the package was orphaned for several years. It is now only maintained; no major enhancements are expected, and the maintainer cannot provide any support.
3668 Time Series Analysis pastecs Package for Analysis of Space-Time Ecological Series Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.
3669 Time Series Analysis PCA4TS Segmenting Multiple Time Series by Contemporaneous Linear Transformation Seeks a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially.
3670 Time Series Analysis pcdpca Dynamic Principal Components for Periodically Correlated Functional Time Series The method extends multivariate and functional dynamic principal components to periodically correlated multivariate time series. This package allows you to compute true dynamic principal components in the presence of periodicity. We follow the implementation guidelines described in Kidzinski, Kokoszka and Jouzdani (2017), Principal component analysis of periodically correlated functional time series <arXiv:1612.00040>.
3671 Time Series Analysis pdc Permutation Distribution Clustering Permutation Distribution Clustering is a clustering method for time series. Dissimilarity of time series is formalized as the divergence between their permutation distributions. The permutation distribution was proposed as measure of the complexity of a time series.
3672 Time Series Analysis pdfetch Fetch Economic and Financial Time Series Data from Public Sources Download economic and financial time series from public sources, including the St Louis Fed’s FRED system, Yahoo Finance, the US Bureau of Labor Statistics, the US Energy Information Administration, the World Bank, Eurostat, the European Central Bank, the Bank of England, the UK’s Office of National Statistics, Deutsche Bundesbank, and INSEE.
3673 Time Series Analysis pear Package for Periodic Autoregression Analysis Package for estimating periodic autoregressive models. Datasets: monthly ozone and Fraser riverflow. Plots: periodic versions of boxplot, auto/partial correlations, moving-average expansion.
3674 Time Series Analysis perARMA Periodic Time Series Analysis Identification, model fitting and estimation for time series with periodic structure. Additionally, procedures for the simulation of periodic processes and real data sets are included.
3675 Time Series Analysis pomp Statistical Inference for Partially Observed Markov Processes Tools for working with partially observed Markov process (POMP) models (also known as stochastic dynamical systems, hidden Markov models, and nonlinear, non-Gaussian, state-space models). The package provides facilities for implementing POMP models, simulating them, and fitting them to time series data by a variety of frequentist and Bayesian methods. It is also a versatile platform for implementation of inference methods for general POMP models.
3676 Time Series Analysis portes Portmanteau Tests for Univariate and Multivariate Time Series Models Simulates univariate and multivariate data from seasonal and nonseasonal time series models. It implements the popular univariate and multivariate portmanteau test statistics based on asymptotic distributions and Monte Carlo significance tests.
3677 Time Series Analysis prophet Automatic Forecasting Procedure Implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
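A minimal sketch of the typical workflow, fit here on synthetic data (the data frame must have columns ds and y):

```r
library(prophet)
df <- data.frame(ds = seq(as.Date("2017-01-01"), by = "day", length.out = 730),
                 y  = sin(2 * pi * (1:730) / 365) + rnorm(730, sd = 0.2))
m      <- prophet(df)                            # fit the additive model
future <- make_future_dataframe(m, periods = 90) # extend 90 days ahead
fcst   <- predict(m, future)
plot(m, fcst)                                    # forecast with uncertainty bands
```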
3678 Time Series Analysis psd Adaptive, Sine-Multitaper Power Spectral Density Estimation Produces power spectral density estimates through iterative refinement of the optimal number of sine-tapers at each frequency. This optimization procedure is based on the method of Riedel and Sidorenko (1995), which minimizes the Mean Square Error (sum of variance and bias) at each frequency, but modified for computational stability.
3679 Time Series Analysis PSF Forecasting of Univariate Time Series Using the Pattern Sequence-Based Forecasting (PSF) Algorithm Pattern Sequence Based Forecasting (PSF) takes univariate time series data as input and assists in forecasting its future values. This algorithm forecasts the behavior of time series based on the similarity of pattern sequences. Initially, clustering is done with the labeling of samples from the database. The labels associated with the samples are then used for forecasting the future behaviour of the time series data. Further technical details and references regarding PSF are discussed in the vignette.
3680 Time Series Analysis ptw Parametric Time Warping Parametric Time Warping aligns patterns, i.e. it aims to put corresponding features at the same locations. The algorithm searches for an optimal polynomial describing the warping. It is possible to align one sample to a reference, several samples to the same reference, or several samples to several references. One can choose between calculating individual warpings, or one global warping for a set of samples and one reference. Two optimization criteria are implemented: RMS (Root Mean Square error) and WCC (Weighted Cross Correlation). Both warping of peak profiles and of peak lists are supported.
3681 Time Series Analysis Quandl API Wrapper for Quandl.com Functions for interacting directly with the Quandl API to offer data in a number of formats usable in R, downloading a zip with all data from a Quandl database, and the ability to search. This R package uses the Quandl API. For more information go to <https://www.quandl.com/docs/api>. For more help on the package itself go to <https://www.quandl.com/help/r>.
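A sketch of a typical call; the API key is a placeholder and the dataset code is given purely as an example:

```r
library(Quandl)
Quandl.api_key("YOUR_API_KEY")               # placeholder key
x <- Quandl("EIA/PET_RWTC_D", type = "xts")  # example dataset code, returned as xts
```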
3682 Time Series Analysis quantspec Quantile-Based Spectral Analysis of Time Series Methods to determine, smooth and plot quantile periodograms for univariate and multivariate time series.
3683 Time Series Analysis regspec Non-Parametric Bayesian Spectrum Estimation for Multirate Data Computes linear Bayesian spectral estimates from multirate data for second-order stationary time series. Provides credible intervals and methods for plotting various spectral estimates.
3684 Time Series Analysis RGENERATE Tools to Generate Vector Time Series A method ‘generate()’ is implemented in this package for the random generation of vector time series according to models obtained by ‘RMAWGEN’, ‘vars’ or other packages. This package was created to generalize the algorithms of the ‘RMAWGEN’ package for the analysis and generation of any environmental vector time series.
3685 Time Series Analysis RJDemetra Interface to ‘JDemetra+’ Seasonal Adjustment Software Interface around ‘JDemetra+’ (<https://github.com/jdemetra/jdemetra-app>), the seasonal adjustment software officially recommended to the members of the European Statistical System (ESS) and the European System of Central Banks. It offers full access to all options and outputs of ‘JDemetra+’, including the two leading seasonal adjustment methods TRAMO/SEATS+ and X-12ARIMA/X-13ARIMA-SEATS.
3686 Time Series Analysis Rlgt Bayesian Exponential Smoothing Models with Trend Modifications An implementation of a number of Global Trend models for time series forecasting that are Bayesian generalizations and extensions of some Exponential Smoothing models. The main differences/additions include 1) a nonlinear global trend, 2) a Student-t error distribution, and 3) a function for the error size, allowing for heteroscedasticity. The methods are particularly useful for short time series. When tested on the well-known M3 dataset, they are able to outperform all classical time series algorithms. The models are fitted with MCMC using the ‘rstan’ package.
3687 Time Series Analysis Rlibeemd Ensemble Empirical Mode Decomposition (EEMD) and Its Complete Variant (CEEMDAN) An R interface for libeemd (Luukko, Helske, Rasanen, 2016) <doi:10.1007/s00180-015-0603-9>, a C library of highly efficient parallelizable functions for performing the ensemble empirical mode decomposition (EEMD), its complete variant (CEEMDAN), the regular empirical mode decomposition (EMD), and bivariate EMD (BEMD). Due to possible portability issues, the CRAN version no longer supports OpenMP; an OpenMP-supported version can be installed from GitHub: <https://github.com/helske/Rlibeemd/>.
3688 Time Series Analysis rmaf Refined Moving Average Filter Uses a refined moving average filter, based on the optimal and data-driven moving average lag q or a smoothing spline, to estimate trend and seasonal components, as well as irregularity (residuals), for univariate time series or data.
3689 Time Series Analysis RMAWGEN Multi-Site Auto-Regressive Weather GENerator S3 and S4 functions are implemented for the spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector AutoRegressive models (VARs). The weather generator model is calibrated on daily instrumental “Gaussianized” time series through the ‘vars’ package tools and saved as an object. Once this model is obtained, it can be used for weather generation and adapted to work with several monthly climatic time series.
3690 Time Series Analysis robets Forecasting Time Series with Robust Exponential Smoothing We provide an outlier-robust alternative to the function ets() in the ‘forecast’ package of Hyndman and Khandakar (2008) <doi:10.18637/jss.v027.i03>. For each method in a class of exponential smoothing variants we provide a robust alternative. The class includes methods with a damped trend and/or seasonal components. The robust method is developed by robustifying every aspect of the original exponential smoothing variant. We provide robust forecasting equations, robust initial values, robust smoothing parameter estimation and a robust information criterion. The method is described in more detail in Crevits and Croux (2016) <doi:10.13140/RG.2.2.11791.18080>.
3691 Time Series Analysis robfilter Robust Time Series Filters A set of functions to filter time series based on concepts from robust statistics.
3692 Time Series Analysis robustarima Robust ARIMA Modeling Functions for fitting a linear regression model with ARIMA errors using a filtered tau-estimate.
3693 Time Series Analysis roll Rolling Statistics Fast and efficient computation of rolling statistics for time-series data.
3694 Time Series Analysis rollRegres Fast Rolling and Expanding Window Linear Regression Methods for fast rolling and expanding linear regression models. That is, series of linear regression models estimated on either an expanding window of data or a moving window of data. The methods use rank-one updates and downdates of the upper triangular matrix from a QR decomposition (see Dongarra, Moler, Bunch, and Stewart (1979) <doi:10.1137/1.9781611971811>).
3695 Time Series Analysis RSEIS Seismic Time Series Analysis Tools Multiple interactive codes to view and analyze seismic data, via spectrum analysis, wavelet transforms, particle motion, hodograms. Includes general time-series tools, plotting, filtering, interactive display.
3696 Time Series Analysis Rssa A Collection of Methods for Singular Spectrum Analysis Methods and tools for Singular Spectrum Analysis including decomposition, forecasting and gap-filling for univariate and multivariate time series.
3697 Time Series Analysis RTransferEntropy Measuring Information Flow Between Time Series with Shannon and Renyi Transfer Entropy Measuring information flow between time series with Shannon and Renyi transfer entropy. See also Dimpfl and Peter (2013) <doi:10.1515/snde-2012-0044> and Dimpfl and Peter (2014) <doi:10.1016/j.intfin.2014.03.004> for theory and applications to financial time series. Additional references can be found in the theory part of the vignette.
3698 Time Series Analysis rts Raster Time Series Analysis This framework aims to provide classes and methods for manipulating and processing raster time series data (e.g. a time series of satellite images).
3699 Time Series Analysis rucrdtw R Bindings for the UCR Suite R bindings for functions from the UCR Suite by Rakthanmanon et al. (2012) <doi:10.1145/2339530.2339576>, which enables ultrafast subsequence search for a best match under Dynamic Time Warping and Euclidean Distance.
3700 Time Series Analysis rugarch Univariate GARCH Models ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.
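A hedged sketch of specification and fitting, using the sp500ret data set shipped with the package:

```r
library(rugarch)
data(sp500ret)  # S&P 500 log returns included in the package
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   mean.model = list(armaOrder = c(1, 0)))
fit <- ugarchfit(spec, data = sp500ret)
ugarchforecast(fit, n.ahead = 10)  # 10-step-ahead mean and sigma forecasts
```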
3701 Time Series Analysis runstats Fast Computation of Running Statistics for Time Series Provides methods for fast computation of running sample statistics for time series. These include: (1) mean, (2) standard deviation, and (3) variance over a fixed-length window of the time series, (4) correlation, (5) covariance, and (6) Euclidean distance (L2 norm) between a short-time pattern and the time series. Implemented methods utilize the Convolution Theorem to compute convolutions via the Fast Fourier Transform (FFT).
3702 Time Series Analysis rwt Rice Wavelet Toolbox wrapper Provides a set of functions for performing digital signal processing.
3703 Time Series Analysis sae2 Small Area Estimation: Time-series Models Time series models for small area estimation based on area-level models.
3704 Time Series Analysis scoringRules Scoring Rules for Parametric and Simulated Distribution Forecasts Dictionary-like reference for computing scoring rules in a wide range of situations. Covers both parametric forecast distributions (such as mixtures of Gaussians) and distributions generated via simulation.
3705 Time Series Analysis SDD Serial Dependence Diagrams Allows for computing (and by default plotting) different types of serial dependence diagrams.
3706 Time Series Analysis sde Simulation and Inference for Stochastic Differential Equations Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 978-0-387-75838-1, Springer, NY.
3707 Time Series Analysis seas Seasonal Analysis and Graphics, Especially for Climatology Capable of deriving seasonal statistics, such as “normals”, and analysis of seasonal data, such as departures. This package also has graphics capabilities for representing seasonal data, including boxplots for seasonal parameters, and bars for summed normals. There are many specific functions related to climatology, including precipitation normals, temperature normals, cumulative precipitation departures and precipitation interarrivals. However, this package is designed to represent any time-varying parameter with a discernible seasonal signal, such as found in hydrology and ecology.
3708 Time Series Analysis season Seasonal Analysis of Health Data Routines for the seasonal analysis of health data, including regression models, time-stratified case-crossover, plotting functions and residual checks, see Barnett and Dobson (2010) ISBN 978-3-642-10748-1. Thanks to Yuming Guo for checking the case-crossover code.
3709 Time Series Analysis seasonal R Interface to X-13-ARIMA-SEATS Easy-to-use interface to X-13-ARIMA-SEATS, the seasonal adjustment software by the US Census Bureau. It offers full access to almost all options and outputs of X-13, including X-11 and SEATS, automatic ARIMA model search, outlier detection and support for user defined holiday variables, such as Chinese New Year or Indian Diwali. A graphical user interface can be used through the ‘seasonalview’ package. Uses the X-13-binaries from the ‘x13binary’ package.
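A minimal sketch of the interface, on a classic built-in series:

```r
library(seasonal)
m <- seas(AirPassengers)  # automatic X-13ARIMA-SEATS adjustment
final(m)                  # the seasonally adjusted series
plot(m)                   # original vs. adjusted
```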
3710 Time Series Analysis seasonalview Graphical User Interface for Seasonal Adjustment A graphical user interface to the ‘seasonal’ package and ‘X-13ARIMA-SEATS’, the U.S. Census Bureau’s seasonal adjustment software. Unifies the code base of <http://www.seasonal.website> and the GUI in the ‘seasonal’ package.
3711 Time Series Analysis Sim.DiffProc Simulation of Diffusion Processes Provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both the Ito and Stratonovich forms, including statistical analysis of SDEs with parallel Monte Carlo and moment equations methods. It has enabled many researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
3712 Time Series Analysis sleekts 4253H, Twice Smoothing Computes the resistant smooth 4253H, twice smoothing method for time series.
3713 Time Series Analysis smooth Forecasting Using State Space Models Functions implementing Single Source of Error state space models for purposes of time series analysis and forecasting. The package includes Exponential Smoothing, SARIMA, Complex Exponential Smoothing, Simple Moving Average, Vector Exponential Smoothing in state space forms, several simulation functions and intermittent demand state space models.
3714 Time Series Analysis sparsevar A Package for Sparse VAR/VECM Estimation A wrapper for sparse VAR/VECM time series models estimation using penalties like ENET, SCAD and MCP.
3715 Time Series Analysis spectral Common Methods of Spectral Data Analysis Fourier and Hilbert transforms are utilized to perform several types of spectral analysis on the supplied data. Also fragmented and irregularly spaced data can be processed. A user friendly interface helps to interpret the results.
3716 Time Series Analysis spectral.methods Singular Spectrum Analysis (SSA) Tools for Time Series Analysis Contains some implementations of Singular Spectrum Analysis (SSA) for the gapfilling and spectral decomposition of time series. It contains the code used by Buttlar et al. (2014), Nonlinear Processes in Geophysics. In addition, the iterative SSA gapfilling method of Kondrashov and Ghil (2006) is implemented. All SSA calculations are done via the truncated and fast SSA algorithm of Korobeynikov (2010) (package ‘Rssa’).
3717 Time Series Analysis spTimer Spatio-Temporal Bayesian Modelling Fits, spatially predicts and temporally forecasts large amounts of space-time data using [1] Bayesian Gaussian Process (GP) Models, [2] Bayesian Auto-Regressive (AR) Models, and [3] Bayesian Gaussian Predictive Processes (GPP) based AR Models for spatio-temporal big-n problems. Bakar and Sahu (2015) <doi:10.18637/jss.v063.i15>.
3718 Time Series Analysis stlplus Enhanced Seasonal Decomposition of Time Series by Loess Decompose a time series into seasonal, trend, and remainder components using an implementation of Seasonal Decomposition of Time Series by Loess (STL) that provides several enhancements over the STL method in the stats package. These enhancements include handling missing values, providing higher order (quadratic) loess smoothing with automated parameter choices, frequency component smoothing beyond the seasonal and trend components, and some basic plot methods for diagnostics.
3719 Time Series Analysis stochvol Efficient Bayesian Inference for Stochastic Volatility (SV) Models Efficient algorithms for fully Bayesian estimation of stochastic volatility (SV) models via Markov chain Monte Carlo (MCMC) methods. Methodological details are given in Kastner and Fruhwirth-Schnatter (2014) <doi:10.1016/j.csda.2013.01.002>; the most common use cases are described in Kastner (2016) <doi:10.18637/jss.v069.i05>. Also incorporates SV with leverage.
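A hedged sketch using the package's own simulator (parameter defaults assumed; draw counts chosen only for illustration):

```r
library(stochvol)
set.seed(1)
sim   <- svsim(500)                    # simulate returns from an SV process
draws <- svsample(sim$y, draws = 5000) # MCMC sampling of the SV model
summary(draws)                         # posterior summaries of the parameters
```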
3720 Time Series Analysis stR STR Decomposition Methods for decomposing seasonal data: STR (a Seasonal-Trend decomposition procedure based on Regression) and Robust STR. In some ways, STR is similar to Ridge Regression and Robust STR can be related to LASSO. They allow for multiple seasonal components, multiple linear covariates with constant, flexible and seasonal influence. Seasonal patterns (for both seasonal components and seasonal covariates) can be fractional and flexible over time; moreover they can be either strictly periodic or have a more complex topology. The methods provide confidence intervals for the estimated components. The methods can be used for forecasting.
3721 Time Series Analysis strucchange Testing, Monitoring, and Dating Structural Changes Testing, monitoring and dating structural changes in (linear) regression models. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
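A sketch of breakpoint dating on the classic Nile flow series, following the general pattern of the package documentation:

```r
library(strucchange)
bp <- breakpoints(Nile ~ 1)  # date shifts in the mean level
summary(bp)                  # RSS/BIC across numbers of breaks
plot(Nile)
lines(confint(bp))           # confidence interval for the break date
```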
3722 Time Series Analysis stsm Structural Time Series Models Fit the basic structural time series model by maximum likelihood.
3723 Time Series Analysis stsm.class Class and Methods for Structural Time Series Models This package defines an S4 class for structural time series models and provides some basic methods to work with it.
3724 Time Series Analysis sugrrants Supporting Graphs for Analysing Time Series Provides ‘ggplot2’ graphics for analysing time series data. It aims to fit into the ‘tidyverse’ and grammar of graphics framework for handling temporal data.
3725 Time Series Analysis surveillance Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hohle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hohle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.
3726 Time Series Analysis svars Data-Driven Identification of SVAR Models Implements data-driven identification methods for structural vector autoregressive (SVAR) models. Based on an existing VAR model object (provided by e.g. VAR() from the ‘vars’ package), the structural impact matrix is obtained via data-driven identification techniques (i.e. changes in volatility (Rigobon, R. (2003) <doi:10.1162/003465303772815727>), least dependent innovations (Herwartz, H., Ploedt, M., (2016) <doi:10.1016/j.jimonfin.2015.11.001>) or non-Gaussian maximum likelihood (Lanne, M., Meitz, M., Saikkonen, P. (2017) <doi:10.1016/j.jeconom.2016.06.002>)).
3727 Time Series Analysis sweep Tidy Tools for Forecasting Tidies up the forecasting modeling and prediction work flow, extends the ‘broom’ package with ‘sw_tidy’, ‘sw_glance’, ‘sw_augment’, and ‘sw_tidy_decomp’ functions for various forecasting models, and enables converting ‘forecast’ objects to “tidy” data frames with ‘sw_sweep’.
3728 Time Series Analysis sym.arma Autoregressive and Moving Average Symmetric Models Functions for fitting the Autoregressive and Moving Average Symmetric Model for univariate time series introduced by Maior and Cysneiros (2018), <doi:10.1007/s00362-016-0753-z>. Fitting method: conditional maximum likelihood estimation. For details see: Wei (2006), Time Series Analysis: Univariate and Multivariate Methods, Section 7.2.
3729 Time Series Analysis tbrf Time-Based Rolling Functions Provides rolling statistical functions based on date and time windows instead of n-lagged observations.
3730 Time Series Analysis Tcomp Data from the 2010 Tourism Forecasting Competition The 1311 time series from the tourism forecasting competition conducted in 2010 and described in Athanasopoulos et al. (2011) <doi:10.1016/j.ijforecast.2010.04.009>.
3731 Time Series Analysis TED Turbulence Time Series Event Detection and Classification TED performs Turbulence time series Event Detection and classification.
3732 Time Series Analysis tempdisagg Methods for Temporal Disaggregation and Interpolation of Time Series Temporal disaggregation methods are used to disaggregate and interpolate a low frequency time series to a higher frequency series, where either the sum, the average, the first or the last value of the resulting high frequency series is consistent with the low frequency series. Temporal disaggregation can be performed with or without one or more high frequency indicator series. Contains the methods of Chow-Lin, Santos-Silva-Cardoso, Fernandez, Litterman, Denton and Denton-Cholette.
3733 Time Series Analysis tframe Time Frame Coding Kernel A kernel of functions for programming time series methods in a way that is relatively independent of the representation of time. Also provides plotting, time windowing, and some other utility functions which are specifically intended for time series. See the Guide distributed as a vignette, or ?tframe.Intro for more details. (User utilities are in package tfplot.)
3734 Time Series Analysis thief Temporal Hierarchical Forecasting Methods and tools for generating forecasts at different temporal frequencies using a hierarchical time series approach.
3735 Time Series Analysis tibbletime Time Aware Tibbles Built on top of the ‘tibble’ package, ‘tibbletime’ is an extension that allows for the creation of time aware tibbles. Some immediate advantages of this include: the ability to perform time-based subsetting on tibbles, quickly summarising and aggregating results by time periods, and creating columns that can be used as ‘dplyr’ time-based groups.
3736 Time Series Analysis Tides Quasi-Periodic Time Series Characteristics Calculates characteristics of quasi-periodic time series, e.g. estuarine water levels.
3737 Time Series Analysis tiger TIme series of Grouped ERrors Temporally resolved groups of typical differences (errors) between two time series are determined and visualized.
3738 Time Series Analysis timechange Efficient Changing of Date-Times Efficient routines for manipulation of date-time objects while accounting for time-zones and daylight saving times. The package includes utilities for updating of date-time components (year, month, day etc.), modification of time-zones, rounding of date-times, period addition and subtraction etc. Parts of the ‘CCTZ’ source code, released under the Apache 2.0 License, are included in this package. See <https://github.com/google/cctz> for more details.
3739 Time Series Analysis timeDate Rmetrics - Chronological and Calendar Objects The ‘timeDate’ class fulfils the conventions of the ISO 8601 standard as well as of the ANSI C and POSIX standards. Beyond these standards it provides the “Financial Center” concept, which allows one to handle data records collected in different time zones and mix them so that the time stamps always refer properly to your personal financial center, or alternatively to the GMT reference time. It can thus also handle time stamps from historical data records from the same time zone, even if the financial centers changed daylight saving time at different calendar dates.
3740 Time Series Analysis TimeProjection Time Projections Extracts useful time components of a date object, such as day of week, weekend, holiday, day of month, etc., and puts them in a data frame. This can be used to create many predictor variables out of a single time variable, which can then be used in a regression or decision tree. Also includes the function plotCalendarHeatmap, which draws a calendar and overlays a heatmap based on values.
3741 Time Series Analysis timesboot Bootstrap computations for time series objects Computes bootstrap confidence intervals for the sample ACF and periodogram.
3742 Time Series Analysis timeSeries Rmetrics - Financial Time Series Objects Provides a class and various tools for financial time series. This includes basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.
3743 Time Series Analysis timetk A Tool Kit for Working with Time Series in R Get the time series index, signature, and summary from time series objects and time-based tibbles. Create future time series based on properties of existing time series index. Coerce between time-based tibbles (‘tbl’) and ‘xts’, ‘zoo’, and ‘ts’.
3744 Time Series Analysis timsac Time Series Analysis and Control Package Functions for statistical analysis, prediction and control of time series based mainly on Akaike and Nakagawa (1988) <ISBN 978-90-277-2786-2>.
3745 Time Series Analysis tis Time Indexes and Time Indexed Series Functions and S3 classes for time indexes and time indexed series, which are compatible with FAME frequencies.
3746 Time Series Analysis tpr Temporal Process Regression Regression models for temporal process responses with time-varying coefficient.
3747 Time Series Analysis trend Non-Parametric Trend Tests and Change-Point Detection The analysis of environmental data often requires the detection of trends and change-points. This package includes tests for trend detection (Cox-Stuart Trend Test, Mann-Kendall Trend Test, (correlated) Hirsch-Slack Test, partial Mann-Kendall Trend Test, multivariate (multisite) Mann-Kendall Trend Test, (Seasonal) Sen’s slope, partial Pearson and Spearman correlation trend test), change-point detection (Lanzante’s test procedures, Pettitt’s test, Buishand Range Test, Buishand U Test, Standard Normal Homogeneity Test), detection of non-randomness (Wallis-Moore Phase Frequency Test, Bartels rank von Neumann’s ratio test, Wald-Wolfowitz Test) and the two sample Robust Rank-Order Distributional Test.
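Typical calls might look like this (the built-in Nile series is used purely as example data):

```r
library(trend)
mk.test(Nile)       # Mann-Kendall trend test
sens.slope(Nile)    # Sen's slope estimate
pettitt.test(Nile)  # Pettitt change-point test
```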
3748 Time Series Analysis TSA Time Series Analysis Contains R functions and datasets detailed in the book “Time Series Analysis with Applications in R (second edition)” by Jonathan Cryer and Kung-Sik Chan.
3749 Time Series Analysis tsbox Class-Agnostic Time Series Time series toolkit with identical behavior for all time series classes: ‘ts’, ‘xts’, ‘data.frame’, ‘data.table’, ‘tibble’, ‘zoo’, ‘timeSeries’, ‘tsibble’. Also converts reliably between these classes.
3750 Time Series Analysis TSclust Time Series Clustering Utilities A set of measures of dissimilarity between time series to perform time series clustering. Metrics based on raw data, on generating models and on the forecast behavior are implemented. Some additional utilities related to time series clustering are also provided, such as clustering algorithms and cluster evaluation metrics.
3751 Time Series Analysis tscount Analysis of Count Time Series Likelihood-based methods for model fitting and assessment, prediction and intervention analysis of count time series following generalized linear models are provided. Models with the identity and with the logarithmic link function are allowed. The conditional distribution can be Poisson or Negative Binomial.
3752 Time Series Analysis TSdbi Time Series Database Interface Provides a common interface to time series databases. The objective is to define a standard interface so users can retrieve time series data from various sources with a simple, common set of commands, and so programs can be written to be portable with respect to the data source. The SQL implementations also provide a database table design, so users needing to set up a time series database have a reasonably complete way to do this easily. The interface provides for a variety of options with respect to the representation of time series in R. The interface, and the SQL implementations, also handle vintages of time series data (sometimes called editions or real-time data). There is also a (not yet well tested) mechanism to handle multilingual data documentation. Comprehensive examples of all the ‘TS*’ packages are provided in the vignette Guide.pdf with the ‘TSdata’ package.
3753 Time Series Analysis tsdecomp Decomposition of Time Series Data ARIMA-model-based decomposition of quarterly and monthly time series data. The methodology is developed and described, among others, in Burman (1980) <doi:10.2307/2982132> and Hillmer and Tiao (1982) <doi:10.2307/2287770>.
3754 Time Series Analysis tsdisagg2 Time Series Disaggregation Disaggregates low frequency time series data to higher frequency series. Implements the following methods for temporal disaggregation: Boot, Feibes and Lisman (1967) <doi:10.2307/2985238>, Chow and Lin (1971) <doi:10.2307/1928739>, Fernandez (1981) <doi:10.2307/1924371> and Litterman (1983) <doi:10.2307/1391858>.
3755 Time Series Analysis TSdist Distance Measures for Time Series Data A set of commonly used distance measures and some additional functions which, although initially not designed for this purpose, can be used to measure the dissimilarity between time series. These measures can be used to perform clustering, classification or other data mining tasks which require the definition of a distance measure between time series.
3756 Time Series Analysis tsDyn Nonlinear Time Series Models with Regime Switching Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).
3757 Time Series Analysis TSEntropies Time Series Entropies Computes various entropies of given time series. This is the initial version that includes ApEn() and SampEn() functions for calculating approximate entropy and sample entropy. Approximate entropy was proposed by S.M. Pincus in “Approximate entropy as a measure of system complexity”, Proceedings of the National Academy of Sciences of the United States of America, 88, 2297-2301 (March 1991). Sample entropy was proposed by J. S. Richman and J. R. Moorman in “Physiological time-series analysis using approximate entropy and sample entropy”, American Journal of Physiology, Heart and Circulatory Physiology, 278, 2039-2049 (June 2000). This package also contains FastApEn() and FastSampEn() functions for calculating fast approximate entropy and fast sample entropy. These are newly designed, very fast algorithms resulting from modifications of the original algorithms. The values of these entropies are not the same as the original ones, but they determine the entropy trend of the analyzed time series equally reliably. Their main advantage is their speed, which is up to a thousand times higher. A scientific article describing their properties has been submitted to The Journal of Supercomputing and is currently awaiting acceptance.
3758 Time Series Analysis tseries (core) Time Series Analysis and Computational Finance Time series analysis and computational finance.
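A minimal sketch of two commonly used tests from the package:

```r
library(tseries)
r <- diff(log(AirPassengers))  # growth rates of a classic built-in series
adf.test(r)                    # Augmented Dickey-Fuller unit root test
kpss.test(r)                   # KPSS stationarity test
```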
3759 Time Series Analysis tseriesChaos Analysis of Nonlinear Time Series Routines for the analysis of nonlinear time series. This work is largely inspired by the TISEAN project, by Rainer Hegger, Holger Kantz and Thomas Schreiber: <http://www.mpipks-dresden.mpg.de/~tisean/>.
3760 Time Series Analysis tseriesEntropy Entropy Based Analysis and Tests for Time Series Implements an Entropy measure of dependence based on the Bhattacharya-Hellinger-Matusita distance. Can be used as a (nonlinear) autocorrelation/crosscorrelation function for continuous and categorical time series. The package includes tests for serial dependence and nonlinearity based on it. Some routines have a parallel version that can be used in a multicore/cluster environment. The package makes use of S4 classes.
3761 Time Series Analysis tsfa Time Series Factor Analysis Extraction of Factors from Multivariate Time Series. See ?00tsfa-Intro for more details.
3762 Time Series Analysis tsfeatures Time Series Feature Extraction Methods for extracting various features from time series data. The features provided are those from Hyndman, Wang and Laptev (2013) <doi:10.1109/ICDMW.2015.104>, Kang, Hyndman and Smith-Miles (2017) <doi:10.1016/j.ijforecast.2016.09.004> and from Fulcher, Little and Jones (2013) <doi:10.1098/rsif.2013.0048>. Features include spectral entropy, autocorrelations, measures of the strength of seasonality and trend, and so on. Users can also define their own feature functions.
3763 Time Series Analysis tsfknn Time Series Forecasting Using Nearest Neighbors Allows forecasting time series using nearest neighbors regression (Francisco Martinez, Maria P. Frias, Maria D. Perez-Godoy and Antonio J. Rivera, 2017 <doi:10.1007/s10462-017-9593-z>). When the forecasting horizon is higher than 1, two multi-step ahead forecasting strategies can be used. The model built is autoregressive, that is, it is based only on the observations of the time series. The nearest neighbors used in a prediction can be consulted and plotted.
3764 Time Series Analysis tsibble (core) Tidy Temporal Data Frames and Tools Provides a ‘tbl_ts’ class (the ‘tsibble’) to store and manage temporal data in a data-centric format, which is built on top of the ‘tibble’. The ‘tsibble’ aims at easily manipulating and analysing temporal data, including counting and filling in time gaps, aggregating over calendar periods, performing rolling window calculations, etc.
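A hedged sketch of the gap handling on a toy daily series:

```r
library(tsibble)
df <- data.frame(date  = as.Date("2020-01-01") + c(0:3, 5:9),  # day 5 missing
                 value = rnorm(9))
tsb <- as_tsibble(df, index = date)
has_gaps(tsb)   # detect the implicit gap
fill_gaps(tsb)  # turn it into an explicit NA row
```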
3765 Time Series Analysis tsintermittent Intermittent Time Series Forecasting Functions for analysing and forecasting intermittent demand/slow moving items time series.
3766 Time Series Analysis TSMining Mining Univariate and Multivariate Motifs in Time-Series Data Implementations of a number of functions used to mine numeric time-series data. It covers the implementation of SAX transformation, univariate motif discovery (based on the random projection method), multivariate motif discovery (based on graph clustering), and several functions used for the ease of visualizing the motifs discovered. The details of SAX transformation can be found in J. Lin. E. Keogh, L. Wei, S. Lonardi, Experiencing SAX: A novel symbolic representation of time series, Data Mining and Knowledge Discovery 15 (2) (2007) 107-144. Details on univariate motif discovery method implemented can be found in B. Chiu, E. Keogh, S. Lonardi, Probabilistic discovery of time series motifs, ACM SIGKDD, Washington, DC, USA, 2003, pp. 493-498. Details on the multivariate motif discovery method implemented can be found in A. Vahdatpour, N. Amini, M. Sarrafzadeh, Towards unsupervised activity discovery using multi-dimensional motif detection in time series, IJCAI 2009 21st International Joint Conference on Artificial Intelligence.
3767 Time Series Analysis tsModel Time Series Modeling for Air Pollution and Health Tools for specifying time series regression models.
3768 Time Series Analysis tsoutliers Detection of Outliers in Time Series Detection of outliers in time series following the Chen and Liu (1993) <doi:10.2307/2290724> procedure. Innovational outliers, additive outliers, level shifts, temporary changes and seasonal level shifts are considered.
3769 Time Series Analysis tsPI Improved Prediction Intervals for ARIMA Processes and Structural Time Series Prediction intervals for ARIMA and structural time series models using an importance sampling approach with uninformative priors for model parameters, leading to more accurate coverage probabilities in the frequentist sense. Instead of sampling the future observations and hidden states of the state space representation of the model, only model parameters are sampled, and the method is based on solving the equations corresponding to the conditional coverage probability of the prediction intervals. This makes the method relatively fast compared to, for example, MCMC methods, and standard errors of the prediction limits can also be computed straightforwardly.
3770 Time Series Analysis TSrepr Time Series Representations Methods for representations (i.e. dimensionality reduction, preprocessing, feature extraction) of time series to support more accurate and effective time series data mining. Non-data-adaptive, data-adaptive, model-based and data-dictated (clipped) representation methods are implemented. Min-max and z-score normalisations, as well as forecasting accuracy measures, are also implemented.
3771 Time Series Analysis TSstudio Functions for Time Series Analysis and Forecasting Provides a set of tools for descriptive and predictive analysis of time series data, including functions for interactive visualization of time series objects as well as utility functions for automating time series forecasting.
3772 Time Series Analysis TSTutorial Fitting and Predict Time Series Interactive Laboratory Interactive laboratory for time series based on the Box-Jenkins methodology.
3773 Time Series Analysis tswge Applied Time Series Analysis Accompanies the text Applied Time Series Analysis with R, 2nd edition by Woodward, Gray, and Elliott. It is helpful for data analysis and for time series instruction.
3774 Time Series Analysis urca Unit Root and Cointegration Tests for Time Series Data Unit root and cointegration tests encountered in applied econometric analysis are implemented.
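A minimal sketch of an Augmented Dickey-Fuller test (Nile chosen purely as example data):

```r
library(urca)
adf <- ur.df(Nile, type = "drift", selectlags = "AIC")
summary(adf)  # test statistic and critical values
```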
3775 Time Series Analysis uroot Unit Root Tests for Seasonal Time Series Seasonal unit roots and seasonal stability tests. P-values based on response surface regressions are available for both tests. P-values based on bootstrap are available for seasonal unit root tests. A parallel implementation of the bootstrap method requires a CUDA capable GPU with compute capability >= 3.0, otherwise a debugging version fully coded in R is used.
3776 Time Series Analysis VAR.etp VAR modelling: estimation, testing, and prediction Estimation, hypothesis testing, and prediction for stationary vector autoregressive models.
3777 Time Series Analysis vars VAR Modelling Estimation, lag selection, diagnostic testing, forecasting, causality analysis, forecast error variance decomposition and impulse response functions of VAR models and estimation of SVAR and SVEC models.
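A hedged sketch using the Canada data set shipped with the package:

```r
library(vars)
data(Canada)
VARselect(Canada, lag.max = 8, type = "const")  # information-criteria lag choice
fit <- VAR(Canada, p = 2, type = "const")
plot(irf(fit, impulse = "e", response = "U", n.ahead = 10))  # impulse responses
```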
3778 Time Series Analysis VARsignR Sign Restrictions, Bayesian, Vector Autoregression Models Provides routines for identifying structural shocks in vector autoregressions (VARs) using sign restrictions.
3779 Time Series Analysis Wats Wrap Around Time Series Graphics Wrap-around Time Series (WATS) plots for interrupted time series designs with seasonal patterns.
3780 Time Series Analysis WaveletComp Computational Wavelet Analysis Wavelet analysis and reconstruction of time series, cross-wavelets and phase-difference (with filtering options), significance with simulation algorithms.
3781 Time Series Analysis wavelets Functions for Computing Wavelet Filters, Wavelet Transforms and Multiresolution Analyses Contains functions for computing and plotting discrete wavelet transforms (DWT) and maximal overlap discrete wavelet transforms (MODWT), as well as their inverses. Additionally, it contains functionality for computing and plotting wavelet transform filters that are used in the above decompositions as well as multiresolution analyses.
3782 Time Series Analysis waveslim Basic Wavelet Routines for One-, Two- And Three-Dimensional Signal Processing Basic wavelet routines for time series (1D), image (2D) and array (3D) analysis. The code provided here is based on wavelet methodology developed in Percival and Walden (2000); Gencay, Selcuk and Whitcher (2001); the dual-tree complex wavelet transform (DTCWT) from Kingsbury (1999, 2001) as implemented by Selesnick; and Hilbert wavelet pairs (Selesnick 2001, 2002). All figures in chapters 4-7 of GSW (2001) are reproducible using this package and R code available at the book website(s) below.
3783 Time Series Analysis wavethresh Wavelets Statistics and Transforms Performs 1, 2 and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, density estimation.
3784 Time Series Analysis wavScalogram Wavelet Scalogram Tools for Time Series Analysis Provides scalogram based wavelet tools for time series analysis: wavelet power spectrum, scalogram, windowed scalogram, windowed scalogram difference (see Bolos et al. (2017) <doi:10.1016/j.amc.2017.05.046>), scale index and windowed scale index (Benitez et al. (2010) <doi:10.1016/j.camwa.2010.05.010>).
3785 Time Series Analysis WeightedPortTest Weighted Portmanteau Tests for Time Series Goodness-of-fit This package contains the Weighted Portmanteau Tests as described in “New Weighted Portmanteau Statistics for Time Series Goodness-of-Fit Testing”, accepted for publication by the Journal of the American Statistical Association.
3786 Time Series Analysis wktmo Converting Weekly Data to Monthly Data Converts weekly data to monthly data. Users can use three types of week formats: ISO week, epidemiology week (epi week) and calendar date.
3787 Time Series Analysis wmtsa Wavelet Methods for Time Series Analysis Software companion to the book Wavelet Methods for Time Series Analysis, Donald B. Percival and Andrew T. Walden, Cambridge University Press, 2000.
3788 Time Series Analysis x12 Interface to ‘X12-ARIMA’/‘X13-ARIMA-SEATS’ and Structure for Batch Processing of Seasonal Adjustment ‘X13-ARIMA-SEATS’ <https://www.census.gov/srd/www/x13as/> is a widely used seasonal adjustment methodology and software package developed by the US Census Bureau. It can be accessed from ‘R’ with this package, and ‘X13-ARIMA-SEATS’ binaries are provided by the ‘R’ package ‘x13binary’.
3789 Time Series Analysis x12GUI X12 - Graphical User Interface A graphical user interface for the ‘x12’ package.
3790 Time Series Analysis x13binary Provide the ‘x13ashtml’ Seasonal Adjustment Binary The US Census Bureau provides a seasonal adjustment program now called ‘X-13ARIMA-SEATS’ building on both earlier programs called X-11 and X-12 as well as the SEATS program by the Bank of Spain. The US Census Bureau offers both source and binary versions which this package integrates for use by other R packages.
3791 Time Series Analysis xts eXtensible Time Series Provide for uniform handling of R’s different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
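A minimal sketch of ‘xts’ basics: construct a series from values plus a date index, then use ISO-8601 style range subsetting and frequency conversion.

```r
library(xts)

dates <- as.Date("2020-01-01") + 0:9
x <- xts(rnorm(10), order.by = dates)  # values + time index -> xts object

x["2020-01-03/2020-01-06"]             # ISO-8601 style date-range subsetting
to.weekly(x)                           # aggregate to a weekly OHLC summary
```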
3792 Time Series Analysis yuima The YUIMA Project Package for SDEs Simulation and Inference for SDEs and Other Stochastic Processes.
3793 Time Series Analysis ZIM Zero-Inflated Models (ZIM) for Count Time Series with Excess Zeros Analyze count time series with excess zeros. Two types of statistical models are supported: Markov regression by Yang et al. (2013) <doi:10.1016/j.stamet.2013.02.001> and state-space models by Yang et al. (2015) <doi:10.1177/1471082X14535530>. They are also known as observation-driven and parameter-driven models respectively in the time series literature. The functions used for Markov regression or observation-driven models can also be used to fit ordinary regression models with independent data under the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) assumption. In addition, the package contains some miscellaneous functions to compute density, distribution, quantile, and generate random numbers from ZIP and ZINB distributions.
3794 Time Series Analysis zoo (core) S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo’s key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
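A minimal sketch of ‘zoo’ with an irregular index: fill a gap by interpolation and align a series with its own lag on the shared index.

```r
library(zoo)

idx <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-07"))
z <- zoo(c(1.0, NA, 3.0), order.by = idx)  # irregular series with a missing value

na.approx(z)              # linear interpolation over the time index
merge(z, lag(z, k = -1))  # align the series with its previous observation
```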
3795 Time Series Analysis ZRA Dynamic Plots for Time Series Forecasting Combines a forecast of a time series, using the function forecast(), with the dynamic plots from dygraphs.
3796 Web Technologies and Services abbyyR Access to Abbyy Optical Character Recognition (OCR) API Get text from images of text using Abbyy Cloud Optical Character Recognition (OCR) API. Easily OCR images, barcodes, forms, documents with machine readable zones, e.g. passports. Get the results in a variety of formats including plain text and XML. To learn more about the Abbyy OCR API, see <http://ocrsdk.com/>.
3797 Web Technologies and Services ajv Another JSON Schema Validator A thin wrapper around the ‘ajv’ JSON validation package for JavaScript. See <http://epoberezkin.github.io/ajv/> for details.
3798 Web Technologies and Services analogsea Interface to ‘Digital Ocean’ Provides a set of functions for interacting with the ‘Digital Ocean’ API at <https://developers.digitalocean.com/documentation/v2>, including creating images, destroying them, rebooting, getting details on regions, and available images.
3799 Web Technologies and Services aRxiv Interface to the arXiv API An interface to the API for ‘arXiv’ (<https://arxiv.org>), a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
3800 Web Technologies and Services aws.polly Client for AWS Polly A client for AWS Polly <http://aws.amazon.com/documentation/polly>, a speech synthesis service.
3801 Web Technologies and Services aws.s3 ‘AWS S3’ Client Package A simple client package for the Amazon Web Services (‘AWS’) Simple Storage Service (‘S3’) ‘REST’ ‘API’ <https://aws.amazon.com/s3/>.
3802 Web Technologies and Services aws.signature Amazon Web Services Request Signatures Generates version 2 and version 4 request signatures for Amazon Web Services (‘AWS’) <https://aws.amazon.com/> Application Programming Interfaces (‘APIs’) and provides a mechanism for retrieving credentials from environment variables, ‘AWS’ credentials files, and ‘EC2’ instance metadata. For use on ‘EC2’ instances, users will need to install the suggested package ‘aws.ec2metadata’ <https://cran.r-project.org/package=aws.ec2metadata>.
3803 Web Technologies and Services aws.sns AWS SNS Client Package A simple client package for the Amazon Web Services (‘AWS’) Simple Notification Service (‘SNS’) ‘API’ <https://aws.amazon.com/sns/>.
3804 Web Technologies and Services AzureML Interface with Azure Machine Learning Datasets, Experiments and Web Services Functions and datasets to support Azure Machine Learning. This allows you to interact with datasets, as well as publish and consume R functions as API services.
3805 Web Technologies and Services banR R Client for the BAN API A client for the “Base Adresses Nationale” (BAN) API, which allows one to (batch) geocode and reverse-geocode French addresses. For more information about the BAN and its API, please see <https://adresse.data.gouv.fr/api>.
3806 Web Technologies and Services bigml Bindings for the BigML API The ‘bigml’ package contains bindings for the BigML API. The package includes methods that provide straightforward access to basic API functionality, as well as methods that accommodate idiomatic R data types and concepts.
3807 Web Technologies and Services bigrquery An Interface to Google’s ‘BigQuery’ ‘API’ Easily talk to Google’s ‘BigQuery’ database from R.
3808 Web Technologies and Services boilerpipeR Interface to the Boilerpipe Java Library Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe (http://code.google.com/p/boilerpipe/) Java library. The extraction heuristics from boilerpipe show a robust performance for a wide range of web site templates.
3809 Web Technologies and Services boxr Interface for the ‘Box.com API’ An R interface for the remote file hosting service ‘Box’ (<https://www.box.com/>). In addition to uploading and downloading files, this package includes functions which mirror base R operations for local files (e.g. box_load(), box_save(), box_read(), box_setwd(), etc.), as well as ‘git’ style functions for entire directories (e.g. box_fetch(), box_push()).
3810 Web Technologies and Services brandwatchR ‘Brandwatch’ API to R Interact with the ‘Brandwatch’ API <https://developers.brandwatch.com/docs>. Allows you to authenticate to the API and obtain data for projects, queries, query groups, tags and categories. Also allows you to directly obtain mentions and aggregate data for a specified query or query group.
3811 Web Technologies and Services captr Client for the Captricity API Get text from images of text using the Captricity Optical Character Recognition (OCR) API. Captricity allows you to get text from handwritten forms ― think surveys ― and other structured paper documents. It can output data in the form of a delimited file, keeping field information intact. For more information, read <https://shreddr.captricity.com/developer/overview/>.
3812 Web Technologies and Services clarifai Access to Clarifai API Get descriptions of images from the Clarifai API. For more information, see <http://clarifai.com>. Clarifai uses a large deep learning cloud to come up with descriptive labels of the things in an image. It also reports how confident it is about each of the labels.
3813 Web Technologies and Services crminer Fetch ‘Scholarly’ Full Text from ‘Crossref’ Text mining client for ‘Crossref’ (<https://crossref.org>). Includes functions for getting links to the full text of articles, fetching full-text articles from those links or Digital Object Identifiers (‘DOIs’), and text extraction from ‘PDFs’.
3814 Web Technologies and Services crplyr A ‘dplyr’ Interface for Crunch In order to facilitate analysis of datasets hosted on the Crunch data platform <http://crunch.io/>, the ‘crplyr’ package implements ‘dplyr’ methods on top of the Crunch backend. The usual methods ‘select’, ‘filter’, ‘group_by’, ‘summarize’, and ‘collect’ are implemented in such a way as to perform as much computation on the server and pull as little data locally as possible.
3815 Web Technologies and Services crul (core) HTTP Client A simple HTTP client, with tools for making HTTP requests, and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby’s ‘faraday’ gem (<https://rubygems.org/gems/faraday>). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package ‘curl’, an interface to ‘libcurl’ (<https://curl.haxx.se/libcurl>).
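A minimal sketch of ‘crul’, assuming the public httpbin.org test service is reachable: build an HttpClient, issue a GET with query parameters, and read the response.

```r
library(crul)

cli <- HttpClient$new(url = "https://httpbin.org")
res <- cli$get(path = "get", query = list(foo = "bar"))

res$status_code       # e.g. 200
res$parse("UTF-8")    # response body as text
```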
3816 Web Technologies and Services crunch Crunch.io Data Tools The Crunch.io service <http://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
3817 Web Technologies and Services crunchy Shiny Apps on Crunch To facilitate building custom dashboards on the Crunch data platform <https://crunch.io/>, the ‘crunchy’ package provides tools for working with ‘shiny’. These tools include utilities to manage authentication and authorization automatically and custom stylesheets to help match the look and feel of the Crunch web application. The package also includes several gadgets for use in ‘RStudio’.
3818 Web Technologies and Services curl (core) A Modern and Flexible Web Client for R The curl() and curl_download() functions provide highly configurable drop-in replacements for base url() and download.file() with better performance, support for encryption (https, ftps), gzip compression, authentication, and other ‘libcurl’ goodies. The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of ‘libcurl’ is recommended; for a more user-friendly web client see the ‘httr’ package which builds on this package with HTTP-specific tools and logic.
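A minimal sketch of the two core ‘curl’ entry points, again assuming httpbin.org is reachable: fetch a URL into memory, or stream it straight to disk as a drop-in for download.file().

```r
library(curl)

req <- curl_fetch_memory("https://httpbin.org/get")  # in-memory fetch
req$status_code
rawToChar(req$content)                               # body as text

tmp <- tempfile(fileext = ".json")
curl_download("https://httpbin.org/get", tmp)        # streamed to disk
```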
3819 Web Technologies and Services cymruservices Query ‘Team Cymru’ ‘IP’ Address, Autonomous System Number (‘ASN’), Border Gateway Protocol (‘BGP’), Bogon and ‘Malware’ Hash Data Services A toolkit for querying ‘Team Cymru’ <http://team-cymru.org> ‘IP’ address, Autonomous System Number (‘ASN’), Border Gateway Protocol (‘BGP’), Bogon and ‘Malware’ Hash Data Services.
3820 Web Technologies and Services d3Network Tools for creating D3 JavaScript network, tree, dendrogram, and Sankey graphs from R This package is intended to make it easy to create D3 JavaScript network, tree, dendrogram, and Sankey graphs from R using data frames. !!! NOTE: Active development has moved to the networkD3 package. !!!
3821 Web Technologies and Services datamart Unified access to your data sources Provides an S4 infrastructure for unified handling of internal datasets and web based data sources. The package is currently in beta; things may break, change or go away without warning.
3822 Web Technologies and Services dataone R Interface to the DataONE REST API Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.
3823 Web Technologies and Services datarobot ‘DataRobot’ Predictive Modeling API For working with the ‘DataRobot’ predictive modeling platform’s API <https://www.datarobot.com/>.
3824 Web Technologies and Services dataverse Client for Dataverse 4 Repositories Provides access to Dataverse version 4 APIs <https://dataverse.org/>, enabling data search, retrieval, and deposit. For Dataverse versions <= 4.0, use the deprecated ‘dvn’ package <https://cran.r-project.org/package=dvn>.
3825 Web Technologies and Services discgolf Discourse API Client Client for the Discourse API. Discourse is an open-source discussion forum platform (<https://www.discourse.org/>). It comes with ‘RESTful’ API access to an installation. This client requires that you are authorized to access a Discourse installation, either yours or another.
3826 Web Technologies and Services docuSignr Connect to ‘DocuSign’ API Connect to the ‘DocuSign’ Rest API <https://www.docusign.com/p/RESTAPIGuide/RESTAPIGuide.htm>, which supports embedded signing and sending of documents.
3827 Web Technologies and Services downloader Download Files over HTTP and HTTPS Provides a wrapper for the download.file function, making it possible to download files over HTTPS on Windows, Mac OS X, and other Unix-like platforms. The ‘RCurl’ package provides this functionality (and much more) but can be difficult to install because it must be compiled with external dependencies. This package has no external dependencies, so it is much easier to install.
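A minimal sketch of ‘downloader’; the URL here is hypothetical, the point being that download() mirrors the download.file() interface while handling https portably.

```r
library(downloader)

tmp <- tempfile(fileext = ".csv")
# hypothetical URL, for illustration only
download("https://example.com/data.csv", destfile = tmp, mode = "wb")
```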
3828 Web Technologies and Services duckduckr Simple Client for the DuckDuckGo Instant Answer API Programmatic access to the DuckDuckGo Instant Answer API <https://api.duckduckgo.com/api>.
3829 Web Technologies and Services europepmc R Interface to the Europe PubMed Central RESTful Web Service An R Client for the Europe PubMed Central RESTful Web Service (see <https://europepmc.org/RestfulWebService> for more information). It gives access to both metadata on life science literature and open access full texts. Europe PMC indexes all PubMed content and other literature sources including Agricola, a bibliographic database of citations to the agricultural literature, or Biological Patents. In addition to bibliographic metadata, the client allows users to fetch citations and reference lists. Links between life-science literature and other EBI databases, including ENA, PDB or ChEMBL are also accessible. No registration or API key is required. See the vignettes for usage examples.
3830 Web Technologies and Services facebook.S4 Access to Facebook API V2 via a Set of S4 Classes Provides an interface to the Facebook API and builds collections of elements that reflects the graph architecture of Facebook. See <https://developers.facebook.com/docs/graph-api> for more information.
3831 Web Technologies and Services factualR thin wrapper for the Factual.com server API Per the Factual.com website, “Factual is a platform where anyone can share and mash open, living data on any subject.” The data is in the form of tables and is accessible via REST API. The factualR package is a thin wrapper around the Factual.com API, to make it even easier for people working with R to explore Factual.com data sets.
3832 Web Technologies and Services FastRWeb Fast Interactive Framework for Web Scripting Using R Infrastructure for creating rich, dynamic web content using R scripts while maintaining very fast response time.
3833 Web Technologies and Services fauxpas HTTP Error Helpers HTTP error helpers. Methods included for general purpose HTTP error handling, as well as individual methods for every HTTP status code, both via status code numbers as well as their descriptive names. Supports ability to adjust behavior to stop, message or warning. Includes ability to use custom whisker template to have any configuration of status code, short description, and verbose message. Currently supports integration with ‘crul’, ‘curl’, and ‘httr’.
3834 Web Technologies and Services fbRads Analyzing and Managing Facebook Ads from R Wrapper functions around the Facebook Marketing ‘API’ to create, read, update and delete custom audiences, images, campaigns, ad sets, ads and related content.
3835 Web Technologies and Services feedeR Read RSS/Atom Feeds from R Retrieve data from RSS/Atom feeds.
3836 Web Technologies and Services fiery A Lightweight and Flexible Web Framework A very flexible framework for building server side logic in R. The framework is unopinionated when it comes to how HTTP requests and WebSocket messages are handled and supports all levels of app complexity; from serving static content to full-blown dynamic web-apps. Fiery does not hold your hand as much as e.g. the shiny package does, but instead sets you free to create your web app the way you want.
3837 Web Technologies and Services fitbitScraper Scrapes Data from Fitbit Scrapes data from Fitbit <http://www.fitbit.com>. This does not use the official API, but instead uses the API that the web dashboard uses to generate the graphs displayed on the dashboard after login at <http://www.fitbit.com>.
3838 Web Technologies and Services fulltext Full Text of ‘Scholarly’ Articles Across Many Data Sources Provides a single interface to many sources of full text ‘scholarly’ data, including ‘Biomed Central’, Public Library of Science, ‘Pubmed Central’, ‘eLife’, ‘F1000Research’, ‘PeerJ’, ‘Pensoft’, ‘Hindawi’, ‘arXiv’ ‘preprints’, and more. Functionality included for searching for articles, downloading full or partial text, downloading supplementary materials, converting to various data formats.
3839 Web Technologies and Services ganalytics Interact with ‘Google Analytics’ Functions for querying the ‘Google Analytics’ core reporting, real-time, multi-channel funnel and management APIs, as well as the ‘Google Tag Manager’ (GTM) API. Write methods are also provided for the management and GTM APIs so that you can change tag, property or view settings, for example. Define reporting queries using natural R expressions instead of being concerned as much about API technical intricacies like query syntax, character code escaping, and API limitations.
3840 Web Technologies and Services GAR Authorize and Request Google Analytics Data The functions included are used to obtain initial authentication with Google Analytics as well as simple and organized data retrieval from the API. Allows for retrieval from multiple profiles at once.
3841 Web Technologies and Services gdns Tools to Work with Google’s ‘DNS-over-HTTPS’ (‘DoH’) ‘API’ To address the problem of insecurity of ‘UDP’-based ‘DNS’ requests, ‘Google Public DNS’ offers ‘DNS’ resolution over an encrypted ‘HTTPS’ connection. ‘DNS-over-HTTPS’ greatly enhances privacy and security between a client and a recursive resolver, and complements ‘DNSSEC’ to provide end-to-end authenticated ‘DNS’ lookups. Functions are provided for both individual requests that return detailed responses and bulk requests that return simplified responses. Support for reverse lookups is also provided. See <https://developers.google.com/speed/public-dns/docs/dns-over-https> for more information.
3842 Web Technologies and Services genderizeR Gender Prediction Based on First Names Utilizes the ‘genderize.io’ Application Programming Interface to predict gender from first names extracted from a text vector. The accuracy of prediction could be controlled by two parameters: counts of a first name in the database and probability of prediction.
3843 Web Technologies and Services geonapi ‘GeoNetwork’ API R Interface Provides an R interface to the ‘GeoNetwork’ API (<https://geonetwork-opensource.org/#api>), allowing one to upload and publish metadata in a ‘GeoNetwork’ web application and expose it to OGC CSW.
3844 Web Technologies and Services geoparser Interface to the Geoparser.io API for Identifying and Disambiguating Places Mentioned in Text A wrapper for the Geoparser.io API version 0.4.0 (see <https://geoparser.io/>), which is a web service that identifies places mentioned in text, disambiguates those places, and returns detailed data about the places found in the text. Basic, limited API access is free with paid plans to accommodate larger workloads.
3845 Web Technologies and Services geosapi GeoServer REST API R Interface Provides an R interface to the GeoServer REST API, allowing one to upload and publish data in a GeoServer web application and expose data to OGC Web Services. The package currently supports all CRUD (Create, Read, Update, Delete) operations on GeoServer workspaces, namespaces, datastores (stores of vector data), featuretypes, layers, and styles, as well as vector data upload operations. For more information about the GeoServer REST API, see <http://docs.geoserver.org/stable/en/user/rest/>.
3846 Web Technologies and Services ggmap Spatial Visualization with ggplot2 A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g. Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.
3847 Web Technologies and Services ggvis Interactive Grammar of Graphics An implementation of an interactive grammar of graphics, taking the best parts of ‘ggplot2’, combining them with the reactive framework of ‘shiny’ and drawing web graphics using ‘vega’.
3848 Web Technologies and Services gh ‘GitHub’ ‘API’ Minimal client to access the ‘GitHub’ ‘API’.
3849 Web Technologies and Services giphyr R Interface to the Giphy API An interface to the ‘API’ of ‘Giphy’, a popular index-based search engine for ‘GIFs’ and animated stickers (see <http://giphy.com/faq> and <https://github.com/Giphy/GiphyAPI> for more information about ‘Giphy’ and its ‘API’). This package also provides an ‘RStudio Addin’, which can help users easily search and download ‘GIFs’ and insert them into an ‘rmarkdown’ presentation.
3850 Web Technologies and Services gistr Work with ‘GitHub’ ‘Gists’ Work with ‘GitHub’ ‘gists’ from ‘R’ (e.g., <http://en.wikipedia.org/wiki/GitHub#Gist>, <https://help.github.com/articles/about-gists/>). A ‘gist’ is simply one or more files with code/text/images/etc. This package allows the user to create new ‘gists’, update ‘gists’ with new files, rename files, delete files, get and delete ‘gists’, star and ‘un-star’ ‘gists’, fork ‘gists’, open a ‘gist’ in your default browser, get embed code for a ‘gist’, list ‘gist’ ‘commits’, and get rate limit information when ‘authenticated’. Some requests require authentication and some do not. ‘Gists’ website: <https://gist.github.com/>.
3851 Web Technologies and Services git2r Provides Access to Git Repositories Interface to the ‘libgit2’ library, which is a pure C implementation of the ‘Git’ core methods. Provides access to ‘Git’ repositories to extract data and run some basic ‘Git’ commands.
3852 Web Technologies and Services gitlabr Access to the Gitlab API Provides R functions to access the API of the project and repository management web application gitlab. For many common tasks (repository file access, issue assignment and status, commenting) convenience wrappers are provided, and in addition the full API can be used by specifying request locations. Gitlab is open-source software and can be self-hosted or used on gitlab.com.
3853 Web Technologies and Services gmailr Access the Gmail RESTful API An interface to the Gmail RESTful API. Allows access to your Gmail messages, threads, drafts and labels.
3854 Web Technologies and Services googleAnalyticsR Google Analytics API into R Interact with the Google Analytics APIs <https://developers.google.com/analytics/>, including the Core Reporting API (v3 and v4), Management API, and Multi-Channel Funnel API.
3855 Web Technologies and Services googleAuthR Authenticate and Create Google APIs Create R functions that interact with OAuth2 Google APIs <https://developers.google.com/apis-explorer/> easily, with auto-refresh and Shiny compatibility.
3856 Web Technologies and Services googleCloudStorageR Interface with Google Cloud Storage API Interact with Google Cloud Storage <https://cloud.google.com/storage/> API in R. Part of the ‘cloudyr’ <https://cloudyr.github.io/> project.
3857 Web Technologies and Services googleComputeEngineR R Interface with Google Compute Engine Interact with the ‘Google Compute Engine’ API in R. Lets you create, start and stop instances in the ‘Google Cloud’. Support for preconfigured instances, with templates for common R needs.
3858 Web Technologies and Services googleLanguageR Call Google’s ‘Natural Language’ API, ‘Cloud Translation’ API, ‘Cloud Speech’ API and ‘Cloud Text-to-Speech’ API Call ‘Google Cloud’ machine learning APIs for text and speech tasks. Call the ‘Cloud Translation’ API <https://cloud.google.com/translate/> for detection and translation of text, the ‘Natural Language’ API <https://cloud.google.com/natural-language/> to analyse text for sentiment, entities or syntax, the ‘Cloud Speech’ API <https://cloud.google.com/speech/> to transcribe sound files to text and the ‘Cloud Text-to-Speech’ API <https://cloud.google.com/text-to-speech/> to turn text into sound files.
3859 Web Technologies and Services googlesheets Manage Google Spreadsheets from R Interact with Google Sheets from R.
3860 Web Technologies and Services googleVis R Interface to Google Charts R interface to Google’s chart tools, allowing users to create interactive charts based on data frames. Charts are displayed locally via the R HTTP help server. A modern browser with an Internet connection is required and for some charts a Flash player. The data remains local and is not uploaded to Google.
3861 Web Technologies and Services graphTweets Visualise Twitter Interactions Allows building an edge table from a data frame of tweets; also provides a function to build nodes and another to create a temporal graph.
3862 Web Technologies and Services gsheet Download Google Sheets Using Just the URL Simple package to download Google Sheets using just the sharing link. Spreadsheets can be downloaded as a data frame, or as plain text to parse manually. Google Sheets is the new name for Google Docs Spreadsheets.
3863 Web Technologies and Services gtrendsR Perform and Display Google Trends Queries An interface for retrieving and displaying the information returned online by Google Trends is provided. Trends (number of hits) over the time as well as geographic representation of the results can be displayed.
3864 Web Technologies and Services htm2txt Convert Html into Text Converts an HTML document into plain text by removing all HTML tags, using regular expressions. It also offers the gettxt() and browse() functions, which enable you to get or browse the text of a given web page.
3865 Web Technologies and Services htmltab Assemble Data Frames from HTML Tables HTML tables are a valuable data source but extracting and recasting these data into a useful format can be tedious. This package allows one to collect structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides three major advantages. First, the function automatically expands row and column spans in the header and body cells. Second, users are given more control over the identification of header and body rows which will end up in the R table, including semantic header information that appears throughout the body. Third, the function preprocesses table code, corrects common types of malformations, removes unneeded parts and so helps to alleviate the need for tedious post-processing.
3866 Web Technologies and Services htmltidy Tidy Up and Test XPath Queries on HTML and XML Content HTML documents can be beautiful and pristine. They can also be wretched, evil, malformed demon-spawn. Now, you can tidy up that HTML and XHTML before processing it with your favorite angle-bracket crunching tools, going beyond the limited tidying that ‘libxml2’ affords in the ‘XML’ and ‘xml2’ packages and taming even the ugliest HTML code generated by the likes of Google Docs and Microsoft Word. It’s also possible to use the functions provided to format or “pretty print” HTML content as it is being tidied. Utilities are also included that make it possible to view formatted and “pretty printed” HTML/XML content from HTML/XML document objects, nodes, node sets and plain character HTML/XML using ‘vkbeautify’ (by Vadim Kiryukhin) and ‘highlight.js’ (by Ivan Sagalaev). Also (optionally) enables filtering of nodes via XPath or viewing an HTML/XML document in “tree” view using ‘xml-viewer’ (by Julian Gruber). See <https://github.com/vkiryukhin/vkBeautify> and <https://github.com/juliangruber/xml-viewer> for more information about ‘vkbeautify’ and ‘xml-viewer’, respectively.
3867 Web Technologies and Services htmltools Tools for HTML Tools for HTML generation and output.
3868 Web Technologies and Services httpcache Query Cache for HTTP Clients In order to improve performance for HTTP API clients, ‘httpcache’ provides simple tools for caching and invalidating cache. It includes the HTTP verb functions GET, PUT, PATCH, POST, and DELETE, which are drop-in replacements for those in the ‘httr’ package. These functions are cache-aware and provide default settings for cache invalidation suitable for RESTful APIs; the package also enables custom cache-management strategies. Finally, ‘httpcache’ includes a basic logging framework to facilitate the measurement of HTTP request time and cache performance.
3869 Web Technologies and Services httpcode ‘HTTP’ Status Code Helper Find and explain the meaning of ‘HTTP’ status codes. Functions included for searching for codes by full or partial number, by message, and get appropriate dog and cat images for many status codes.
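A minimal sketch of ‘httpcode’ lookups: explain one status code, or search status codes by message text.

```r
library(httpcode)

http_code(404)          # explain a single status code
http_search("moved")    # search status codes by message text
```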
3870 Web Technologies and Services httping ‘Ping’ ‘URLs’ to Time ‘Requests’ A suite of functions to ping ‘URLs’ and to time ‘HTTP’ ‘requests’. Designed to work with ‘httr’.
3871 Web Technologies and Services httpRequest Basic HTTP Request HTTP request protocols. Implements GET, POST and multipart POST requests.
3872 Web Technologies and Services httptest A Test Environment for HTTP Requests Testing and documenting code that communicates with remote servers can be painful. Dealing with authentication, server state, and other complications can make testing seem too costly to bother with. But it doesn’t need to be that hard. This package enables one to test all of the logic on the R sides of the API in your package without requiring access to the remote service. Importantly, it provides three contexts that mock the network connection in different ways, as well as testing functions to assert that HTTP requests were―or were not―made. It also allows one to safely record real API responses to use as test fixtures. The ability to save responses and load them offline also enables one to write vignettes and other dynamic documents that can be distributed without access to a live server.
3873 Web Technologies and Services httpuv HTTP and WebSocket Server Library Provides low-level socket and protocol support for handling HTTP and WebSocket requests directly from within R. It is primarily intended as a building block for other packages, rather than making it particularly easy to create complete web applications using httpuv alone. httpuv is built on top of the libuv and http-parser C libraries, both of which were developed by Joyent, Inc. (See LICENSE file for libuv and http-parser license information.)
3874 Web Technologies and Services httr (core) Tools for Working with URLs and HTTP Useful tools for working with HTTP organised by HTTP verbs (GET(), POST(), etc). Configuration functions make it easy to control additional request components (authenticate(), add_headers() and so on).
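A minimal sketch of the verb-oriented ‘httr’ style, assuming httpbin.org as a test host: issue a GET with query parameters and a custom header, then parse the JSON body.

```r
library(httr)

r <- GET("https://httpbin.org/get",
         query = list(q = "r project"),
         add_headers(Accept = "application/json"))

status_code(r)              # e.g. 200
content(r, as = "parsed")   # JSON body parsed into an R list
```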
3875 Web Technologies and Services imguR An Imgur.com API Client Package A complete API client for the image hosting service Imgur.com, including an Imgur graphics device, enabling the easy upload and sharing of plots.
3876 Web Technologies and Services instaR Access to Instagram API via R Provides an interface to the Instagram API <https://instagram.com/developer/>, which allows R users to download public pictures filtered by hashtag, popularity, user or location, and to access public users’ profile data.
3877 Web Technologies and Services internetarchive An API Client for the Internet Archive Search the Internet Archive, retrieve metadata, and download files.
3878 Web Technologies and Services iptools Manipulate, Validate and Resolve ‘IP’ Addresses A toolkit for manipulating, validating and testing ‘IP’ addresses and ranges, along with datasets relating to ‘IP’ addresses. Tools are also provided to map ‘IPv4’ blocks to country codes. While it primarily has support for the ‘IPv4’ address space, more extensive ‘IPv6’ support is intended.
3879 Web Technologies and Services jqr Client for ‘jq’, a ‘JSON’ Processor Client for ‘jq’, a ‘JSON’ processor (<https://stedolan.github.io/jq/>), written in C. ‘jq’ allows the following with ‘JSON’ data: index into, parse, do calculations, cut up and filter, change key names and values, perform conditionals and comparisons, and more.
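A minimal sketch of ‘jqr’: apply jq programs to a JSON string to index into objects and iterate over arrays.

```r
library(jqr)

json <- '{"name": "zoo", "versions": [{"n": "1.8-0"}, {"n": "1.8-1"}]}'

jq(json, ".name")           # index into an object
jq(json, ".versions[].n")   # iterate over an array and extract a field
```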
3880 Web Technologies and Services js Tools for Working with JavaScript in R A set of utilities for working with JavaScript syntax in R. Includes tools to parse, tokenize, compile, validate, reformat, optimize and analyze JavaScript code.
3881 Web Technologies and Services jSonarR jSonar Analytics Platform API for R This package enables users to access MongoDB by running queries and returning their results in R data frames. Usually, data in MongoDB is only available in the form of a JSON document. jSonarR uses data processing and conversion capabilities in the jSonar Analytics Platform and the JSON Studio Gateway (http://www.jsonstudio.com), to convert it to a tabular format which is easy to use with existing R packages.
3882 Web Technologies and Services jsonlite (core) A Robust, High Performance JSON Parser and Generator for R A fast JSON parser and generator optimized for statistical data and the web. Started out as a fork of ‘RJSONIO’, but has been completely rewritten in recent versions. The package offers flexible, robust, high performance tools for working with JSON in R and is particularly powerful for building pipelines and interacting with a web API. The implementation is based on the mapping described in the vignette (Ooms, 2014). In addition to converting JSON data from/to R objects, ‘jsonlite’ contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
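A minimal sketch of the ‘jsonlite’ mapping between data frames and JSON arrays of objects.

```r
library(jsonlite)

df <- data.frame(pkg = c("xts", "zoo"), core = c(TRUE, TRUE))

json <- toJSON(df, pretty = TRUE)  # data frame -> JSON array of objects
json
fromJSON(json)                     # and back to a data frame
```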
3883 Web Technologies and Services jsonvalidate Validate ‘JSON’ Uses the node library ‘is-my-json-valid’ to validate ‘JSON’ against a ‘JSON’ schema.
3884 Web Technologies and Services jstor Read Data from JSTOR/DfR Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.
3885 Web Technologies and Services jug A Simple Web Framework for R jug is a web framework for easily building APIs. It is mostly aimed at exposing R functions, models and visualizations to third parties by way of HTTP requests.
3886 Web Technologies and Services languagelayeR Access the ‘languagelayer’ API Improve your text analysis with languagelayer <https://languagelayer.com>, a powerful language detection API.
3887 Web Technologies and Services leafletR Interactive Web-Maps Based on the Leaflet JavaScript Library Display your spatial data on interactive web-maps using the open-source JavaScript library Leaflet. ‘leafletR’ provides basic web-mapping functionality to combine vector data and online map tiles from different sources. See <http://leafletjs.com> for more information on Leaflet.
3888 Web Technologies and Services LendingClub A Lending Club API Wrapper Functions to access Lending Club’s API and assist investors in managing their accounts. Lending Club is a peer-to-peer lending service where loans are broken up into $25 notes that investors buy with the expectation of earning a return on the interest. You can learn more about the API here: <http://www.lendingclub.com/developers/lc-api.action>.
3889 Web Technologies and Services livechatR R Wrapper for LiveChat REST API Provides a wrapper around LiveChat’s API. The R functions allow one to extract chat sessions, the raw text of chats between agents and customers, and events.
3890 Web Technologies and Services longurl Expand Short URLs Tools to expand vectors of short URLs into long URLs. No API services are used, which may mean that this operates more slowly than API services do (since they usually cache the results of expansions performed by every user of the service).
3891 Web Technologies and Services lucr Currency Formatting and Conversion Reformat currency-based data as numeric values (or numeric values as currency-based data) and convert between currencies.
3892 Web Technologies and Services magrittr A Forward-Pipe Operator for R Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. To quote Rene Magritte, “Ceci n’est pas un pipe.”
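A minimal sketch of the forward pipe: the left-hand value becomes the first argument of the next call unless a ‘.’ placeholder marks where it should go.

```r
library(magrittr)

mtcars %>%
  subset(cyl == 4) %>%
  lm(mpg ~ wt, data = .) %>%   # '.' places the piped value explicitly
  summary()
```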
3893 Web Technologies and Services mailR A Utility to Send Emails from R Interface to Apache Commons Email to send emails from R.
3894 Web Technologies and Services mapsapi ‘sf’-Compatible Interface to ‘Google Maps’ APIs Interface to the ‘Google Maps’ APIs: (1) routing directions based on the ‘Directions’ API, returned as ‘sf’ objects, either as single feature per alternative route, or a single feature per segment per alternative route; (2) travel distance or time matrices based on the ‘Distance Matrix’ API; (3) geocoded locations based on the ‘Geocode’ API, returned as ‘sf’ objects, either points or bounds.
3895 Web Technologies and Services mathpix Support for the ‘Mathpix’ API (Image to ‘LaTeX’) Given an image of a formula (typeset or handwritten) this package provides calls to the ‘Mathpix’ service to produce the ‘LaTeX’ code which should generate that image, and pastes it into a document (e.g. an ‘rmarkdown’ document). See <https://docs.mathpix.com/> for full details. ‘Mathpix’ is an external service and use of the API is subject to their terms and conditions.
3896 Web Technologies and Services mime Map Filenames to MIME Types Guesses the MIME type from a filename extension using the data derived from /etc/mime.types in UNIX-type systems.
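A minimal sketch of ‘mime’: guess MIME types from file extensions.

```r
library(mime)

guess_type(c("report.pdf", "data.csv", "plot.png"))
# expected: "application/pdf" "text/csv" "image/png"
```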
3897 Web Technologies and Services mscstexta4r R Client for the Microsoft Cognitive Services Text Analytics REST API R Client for the Microsoft Cognitive Services Text Analytics REST API, including Sentiment Analysis, Topic Detection, Language Detection, and Key Phrase Extraction. An account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
3898 Web Technologies and Services mscsweblm4r R Client for the Microsoft Cognitive Services Web Language Model REST API R Client for the Microsoft Cognitive Services Web Language Model REST API, including Break Into Words, Calculate Conditional Probability, Calculate Joint Probability, Generate Next Words, and List Available Models. A valid account MUST be registered at the Microsoft Cognitive Services website <https://www.microsoft.com/cognitive-services/> in order to obtain a (free) API key. Without an API key, this package will not work properly.
3899 Web Technologies and Services ndjson Wicked-Fast Streaming ‘JSON’ (‘ndjson’) Reader Streaming ‘JSON’ (‘ndjson’) has one ‘JSON’ record per-line and many modern ‘ndjson’ files contain large numbers of records. These constructs may not be columnar in nature, but it is often useful to read in these files and “flatten” the structure out to enable working with the data in an R ‘data.frame’-like context. Functions are provided that make it possible to read in plain ‘ndjson’ files or compressed (‘gz’) ‘ndjson’ files and either validate the format of the records or create “flat” ‘data.table’ structures from them.
3900 Web Technologies and Services notifyme Send Alerts to your Cellphone and Phillips Hue Lights Functions to flash your hue lights, or text yourself, from R. Designed to be used with long running scripts.
3901 Web Technologies and Services oai General Purpose ‘Oai-PMH’ Services Client A general purpose client to work with any ‘OAI-PMH’ (Open Archives Initiative Protocol for ‘Metadata’ Harvesting) service. The ‘OAI-PMH’ protocol is described at <http://www.openarchives.org/OAI/openarchivesprotocol.html>. Functions are provided to work with the ‘OAI-PMH’ verbs: ‘GetRecord’, ‘Identify’, ‘ListIdentifiers’, ‘ListMetadataFormats’, ‘ListRecords’, and ‘ListSets’.
3902 Web Technologies and Services OAIHarvester Harvest Metadata Using OAI-PMH Version 2.0 Harvest metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) version 2.0 (for more information, see <http://www.openarchives.org/OAI/openarchivesprotocol.html>).
3903 Web Technologies and Services openadds Client to Access ‘Openaddresses’ Data ‘Openaddresses’ (<https://openaddresses.io/>) client. Search, fetch data, and combine ‘datasets’. Outputs are easy to visualize with base plots, ‘ggplot2’, or ‘leaflet’.
3904 Web Technologies and Services opencage Interface to the OpenCage API Tool for accessing the OpenCage API, which provides forward geocoding (from placename to longitude and latitude) and reverse geocoding (from longitude and latitude to placename).
3905 Web Technologies and Services opencpu Producing and Reproducing Results A system for embedded scientific computing and reproducible research with R. The OpenCPU server exposes a simple but powerful HTTP API for RPC and data interchange with R. This provides a reliable and scalable foundation for statistical services or building R web applications. The OpenCPU server runs either as a single-user development server within the interactive R session, or as a multi-user Linux stack based on Apache2. The entire system is fully open source and permissively licensed. The OpenCPU website has detailed documentation and example apps.
3906 Web Technologies and Services OpenML Open Machine Learning and Open Data Platform We provide an R interface to ‘OpenML.org’, which is an online machine learning platform where researchers can access open data, download and upload data sets, share their machine learning tasks and experiments and organize them online to work and collaborate with other researchers. The R interface allows users to query for data sets with specific properties, and allows the downloading and uploading of data sets, tasks, flows and runs. See <https://www.openml.org/guide/api> for more information.
3907 Web Technologies and Services osmar OpenStreetMap and R This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in the common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).
3908 Web Technologies and Services osmplotr Bespoke Images of ‘OpenStreetMap’ Data Bespoke images of ‘OpenStreetMap’ (‘OSM’) data and data visualisation using ‘OSM’ objects.
3909 Web Technologies and Services osrm Interface Between R and the OpenStreetMap-Based Routing Service OSRM An interface between R and the OSRM API. OSRM is a routing service based on OpenStreetMap data. See <http://project-osrm.org/> for more information. This package allows one to compute distances (travel time and kilometric distance) between points and travel time matrices.
3910 Web Technologies and Services owmr OpenWeatherMap API Wrapper Accesses OpenWeatherMap’s (owm) <https://openweathermap.org/> API. ‘owm’ itself is a service providing past, current, and forecast weather data. Furthermore, ‘owm’ serves weather map layers usable in frameworks like ‘leaflet’. In order to access the API, you need to sign up for an API key. There are free and paid plans. Besides functions for fetching weather data from ‘owm’, ‘owmr’ supplies tools to tidy up the fetched data (for fast and simple access) and to show it on leaflet maps.
3911 Web Technologies and Services ows4R Interface to OGC Web-Services (OWS) Provides an Interface to Web-Services defined as standards by the Open Geospatial Consortium (OGC), including Web Feature Service (WFS) for vector data, Catalogue Service (CSW) for ISO/OGC metadata and associated standards such as the common web-service specification (OWS) and OGC Filter Encoding. The long-term purpose is to add support for additional OGC service standards such as Web Coverage Service (WCS) and Web Processing Service (WPS).
3912 Web Technologies and Services pdftables Programmatic Conversion of PDF Tables Allows the user to convert PDF tables to formats more amenable to analysis (‘.csv’, ‘.xml’, or ‘.xlsx’) by wrapping the PDFTables API. In order to use the package, the user needs to sign up for an API account on the PDFTables website (<https://pdftables.com/pdf-to-excel-api>). The package works by taking a PDF file as input, uploading it to PDFTables, and returning a file with the extracted data.
3913 Web Technologies and Services pivotaltrackR A Client for the ‘Pivotal Tracker’ API ‘Pivotal Tracker’ <https://www.pivotaltracker.com> is a project management software-as-a-service that provides a REST API. This package provides an R interface to that API, allowing you to query it and work with its responses.
3914 Web Technologies and Services plotGoogleMaps Plot Spatial or Spatio-Temporal Data Over Google Maps Provides an interactive plot device for handling the geographic data for web browsers, designed for the automatic creation of web maps as a combination of users’ data and Google Maps layers.
3915 Web Technologies and Services plotKML Visualization of Spatial and Spatio-Temporal Objects in Google Earth Writes sp-class, spacetime-class, raster-class and similar spatial and spatio-temporal objects to KML following some basic cartographic rules.
3916 Web Technologies and Services plotly Create Interactive Web Graphics via ‘plotly.js’ Create interactive web graphics from ‘ggplot2’ graphs and/or a custom interface to the (MIT-licensed) JavaScript library ‘plotly.js’ inspired by the grammar of graphics.
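A minimal sketch of the ‘plotly’ upgrade path: build a static ‘ggplot2’ figure and hand it to ggplotly() for an interactive version.

```r
library(ggplot2)
library(plotly)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point()

ggplotly(p)   # render the ggplot as an interactive plotly widget
```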
3917 Web Technologies and Services plumber An API Generator for R Gives the ability to automatically generate and serve an HTTP API from R functions using the annotations in the R documentation around your functions.
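A minimal sketch of a ‘plumber’ API; the file name “api.R” and the /mean route are hypothetical. Annotations above a plain R function define the route, and plumb() serves it.

```r
# --- contents of api.R (hypothetical file name) ---

#* Return the mean of a comma-separated list of numbers
#* @param values numbers as a comma-separated string, e.g. "1,2,3"
#* @get /mean
function(values = "") {
  mean(as.numeric(strsplit(values, ",")[[1]]))
}

# --- then, in an interactive session ---
# library(plumber)
# plumb("api.R")$run(port = 8000)
# GET http://localhost:8000/mean?values=1,2,3
```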
3918 Web Technologies and Services plusser A Google+ Interface for R plusser provides an API interface to Google+ so that posts, profiles and pages can be automatically retrieved.
3919 Web Technologies and Services postlightmercury Parses Web Pages using Postlight Mercury This is a wrapper for the Mercury Parser API. The Mercury Parser is a single API endpoint that takes a URL and gives you back the content reliably and easily. With just one API request, Mercury takes any web article and returns only the relevant content ― headline, author, body text, relevant images and more ― free from any clutter. It’s reliable, easy-to-use and free. See the webpage here: <https://mercury.postlight.com/>.
3920 Web Technologies and Services pubmed.mineR Text Mining of PubMed Abstracts Text mining of PubMed Abstracts (text and XML) from <http://www.ncbi.nlm.nih.gov/pubmed>.
3921 Web Technologies and Services pushoverr Send Push Notifications using Pushover Send push notifications to mobile devices or the desktop using Pushover. These notifications can display job status, results, scraped web data, or any other text or numeric data.
3922 Web Technologies and Services radiant Business Analytics using R and Shiny A platform-independent browser-based interface for business analytics in R, based on the shiny package. The application combines the functionality of radiant.data, radiant.design, radiant.basics, radiant.model, and radiant.multivariate.
3923 Web Technologies and Services RAdwords Loading Google Adwords Data into R Aims at loading Google Adwords data into R. Adwords is an online advertising service that enables advertisers to display advertising copy to web users (see <https://developers.google.com/adwords/> for more information). The package implements three main features. First, it provides an authentication process for R with the Google Adwords API (see <https://developers.google.com/adwords/api/> for more information) via OAuth2. Second, it offers an interface to apply the Adwords query language in R and query the Adwords API with ad-hoc reports. Third, the received data are transformed into suitable data formats for further data processing and data analysis.
3924 Web Technologies and Services randNames Package Provides Access to Fake User Data Generates random names with additional information including fake SSNs, gender, location, zip, age, address, and nationality.
3925 Web Technologies and Services rapiclient Dynamic OpenAPI/Swagger Client Access services specified in OpenAPI (formerly Swagger) format. It is not a code generator; the client is generated dynamically as a list of R functions.
3926 Web Technologies and Services rapport A Report Templating System Facilitating the creation of reproducible statistical report templates. Once created, rapport templates can be exported to various external formats (HTML, LaTeX, PDF, ODT etc.) with pandoc as the converter backend.
3927 Web Technologies and Services Rblpapi R Interface to ‘Bloomberg’ An R Interface to ‘Bloomberg’ is provided via the ‘Blp API’.
3928 Web Technologies and Services rcoreoa Client for the CORE API Client for the CORE API (<https://core.ac.uk/docs/>). CORE (<https://core.ac.uk>) aggregates open access research outputs from repositories and journals worldwide and makes them available to the public.
3929 Web Technologies and Services Rcrawler Web Crawler and Scraper Performs parallel web crawling and web scraping. It is designed to crawl, parse and store web pages to produce data that can be used directly in analysis applications. For details see Khalil and Fakir (2017) <doi:10.1016/j.softx.2017.04.004>.
3930 Web Technologies and Services rcrossref Client for Various ‘CrossRef’ ‘APIs’ Client for various ‘CrossRef’ ‘APIs’, including ‘metadata’ search with their old and newer search ‘APIs’, get ‘citations’ in various formats (including ‘bibtex’, ‘citeproc-json’, ‘rdf-xml’, etc.), convert ‘DOIs’ to ‘PMIDs’, and ‘vice versa’, get citations for ‘DOIs’, and get links to full text of articles when available.
3931 Web Technologies and Services RCurl General Network (HTTP/FTP/…) Client Interface for R A wrapper for ‘libcurl’ <http://curl.haxx.se/libcurl/>. Provides functions to allow one to compose general HTTP requests and provides convenient functions to fetch URIs, get & post forms, etc., and process the results returned by the Web server. This provides a great deal of control over the HTTP/FTP/… connection and the form of the request while providing a higher-level interface than is available just using R socket connections. Additionally, the underlying implementation is robust and extensive, supporting FTP/FTPS/TFTP (uploads and downloads), SSL/HTTPS, telnet, dict, ldap, and also supports cookies, redirects, authentication, etc.
3932 Web Technologies and Services rdatacite ‘DataCite’ Client for ‘OAI-PMH’ Methods and their Search API Client for the web service methods provided by ‘DataCite’ (<https://www.datacite.org/>), including functions to interface with their ‘OAI-PMH’ ‘metadata’ service, and a ‘RESTful’ search API. The API is backed by ‘SOLR’, allowing expressive queries, including faceting, statistics on variables, and ‘more-like-this’ queries.
3933 Web Technologies and Services rdpla Client for the Digital Public Library of America (‘DPLA’) Interact with the Digital Public Library of America <https://dp.la> (‘DPLA’) ‘REST’ ‘API’ <https://dp.la/info/developers/codex/> from R, including search and more.
3934 Web Technologies and Services rdrop2 Programmatic Interface to the ‘Dropbox’ API Provides full programmatic access to the ‘Dropbox’ file hosting platform <https://dropbox.com>, including support for all standard file operations.
3935 Web Technologies and Services redcapAPI Interface to ‘REDCap’ Access data stored in ‘REDCap’ databases using the Application Programming Interface (API). ‘REDCap’ (Research Electronic Data CAPture; <https://projectredcap.org>) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The API allows users to access data and project meta data (such as the data dictionary) from the web programmatically. The ‘redcapAPI’ package facilitates the process of accessing data with options to prepare an analysis-ready data set consistent with the definitions in a database’s data dictionary.
3936 Web Technologies and Services refimpact API Wrapper for the UK REF 2014 Impact Case Studies Database Provides wrapper functions around the UK Research Excellence Framework 2014 Impact Case Studies Database API <http://impact.ref.ac.uk/>. The database contains relevant publication and research metadata about each case study as well as several paragraphs of text from the case study submissions. Case studies in the database are licenced under a CC-BY 4.0 licence <http://creativecommons.org/licenses/by/4.0/legalcode>.
3937 Web Technologies and Services RefManageR Straightforward ‘BibTeX’ and ‘BibLaTeX’ Bibliography Management Provides tools for importing and working with bibliographic references. It greatly enhances the ‘bibentry’ class by providing a class ‘BibEntry’ which stores ‘BibTeX’ and ‘BibLaTeX’ references, supports ‘UTF-8’ encoding, and can be easily searched by any field, by date ranges, and by various formats for name lists (author by last names, translator by full names, etc.). Entries can be updated, combined, sorted, printed in a number of styles, and exported. ‘BibTeX’ and ‘BibLaTeX’ ‘.bib’ files can be read into ‘R’ and converted to ‘BibEntry’ objects. Interfaces to ‘NCBI Entrez’, ‘CrossRef’, and ‘Zotero’ are provided for importing references and references can be created from locally stored ‘PDF’ files using ‘Poppler’. Includes functions for citing and generating a bibliography with hyperlinks for documents prepared with ‘RMarkdown’ or ‘RHTML’.
3938 Web Technologies and Services repmis Miscellaneous Tools for Reproducible Research Tools to load ‘R’ packages and automatically generate BibTeX files citing them as well as load and cache plain-text and ‘Excel’ formatted data stored on ‘GitHub’, and from other sources.
3939 Web Technologies and Services reqres Powerful Classes for HTTP Requests and Responses In order to facilitate parsing of http requests and creating appropriate responses this package provides two classes to handle a lot of the housekeeping involved in working with http exchanges. The infrastructure builds upon the ‘rook’ specification and is thus well suited to be combined with ‘httpuv’ based web servers.
3940 Web Technologies and Services request High Level ‘HTTP’ Client High level and easy ‘HTTP’ client for ‘R’. Provides functions for building ‘HTTP’ queries, including query parameters, body requests, headers, authentication, and more.
3941 Web Technologies and Services rerddap General Purpose Client for ‘ERDDAP’ Servers General purpose R client for ‘ERDDAP’ servers. Includes functions to search for ‘datasets’, get summary information on ‘datasets’, and fetch ‘datasets’, in either ‘csv’ or ‘netCDF’ format. ‘ERDDAP’ information: <https://upwell.pfeg.noaa.gov/erddap/information.html>.
3942 Web Technologies and Services restfulr R Interface to RESTful Web Services Models a RESTful service as if it were a nested R list.
3943 Web Technologies and Services restimizeapi Functions for Working with the ‘www.estimize.com’ Web Services Provides the user with functions to develop their trading strategy, uncover actionable trading ideas, and monitor consensus shifts with crowdsourced earnings and economic estimate data directly from <www.estimize.com>. Further information regarding the web services this package invokes can be found at <www.estimize.com/api>.
3944 Web Technologies and Services Rexperigen R Interface to Experigen Provides convenience functions to communicate with an Experigen server: Experigen (<http://github.com/aquincum/experigen>) is an online framework for creating linguistic experiments, and it stores the results on a dedicated server. This package can be used to retrieve the results from the server, and it is especially helpful with registered experiments, as authentication with the server has to happen.
3945 Web Technologies and Services Rfacebook Access to Facebook API via R Provides an interface to the Facebook API.
3946 Web Technologies and Services rfigshare An R Interface to ‘figshare’ An interface to ‘figshare’ (http://figshare.com), a scientific repository to archive and assign ‘DOIs’ to data, software, figures, and more.
3947 Web Technologies and Services RForcecom Data Integration Feature for Force.com and Salesforce.com Insert, update, retrieve, delete, and bulk-operate on datasets in Salesforce.com (a SaaS-based CRM) and Force.com (a PaaS-based application platform) from R.
3948 Web Technologies and Services RGA A Google Analytics API Client Provides functions for accessing and retrieving data from the Google Analytics APIs (https://developers.google.com/analytics/). Supports OAuth 2.0 authorization. The package provides access to the Management, Core Reporting, Multi-Channel Funnels Reporting, Real Time Reporting and Metadata APIs, covers all the Google Analytics accounts the user has access to, and auto-paginates to return more than 10,000 rows of results by combining multiple data requests. The package also provides a shiny app to explore the Core Reporting API dimensions and metrics.
3949 Web Technologies and Services rgeolocate IP Address Geolocation Connectors to online and offline sources for taking IP addresses and geolocating them to country, city, timezone and other geographic ranges. For individual connectors, see the package index.
3950 Web Technologies and Services RGoogleFit R Interface to Google Fit API Provides interface to Google Fit REST API v1 (see <https://developers.google.com/fit/rest/v1/reference/>).
3951 Web Technologies and Services RgoogleMaps Overlays on Static Maps Serves two purposes: (i) Provide a comfortable R interface to query the Google server for static maps, and (ii) Use the map as a background image to overlay plots within R. This requires proper coordinate scaling.
3952 Web Technologies and Services rhub Connect to ‘R-hub’ Run ‘R CMD check’ on any of the ‘R-hub’ (<https://builder.r-hub.io/>) architectures, from the command line. The current architectures include ‘Windows’, ‘macOS’, ‘Solaris’ and various ‘Linux’ distributions.
3953 Web Technologies and Services rio A Swiss-Army Knife for Data I/O Streamlined data import and export by making assumptions that the user is probably willing to make: ‘import()’ and ‘export()’ determine the data structure from the file extension, reasonable defaults are used for data import and export (e.g., ‘stringsAsFactors=FALSE’), web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly without explicit decompression, and fast import packages are used where appropriate. An additional convenience function, ‘convert()’, provides a simple method for converting between file types.
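A minimal sketch of rio's extension-driven workflow, using the built-in mtcars data set; the file names are illustrative:

    library(rio)
    export(mtcars, "mtcars.csv")          # format inferred from the ".csv" extension
    dat <- import("mtcars.csv")           # returns a data.frame with sensible defaults
    convert("mtcars.csv", "mtcars.rds")   # one-step conversion between file types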
3954 Web Technologies and Services rjson JSON for R Converts R objects into JSON objects and vice versa.
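A round trip with rjson, illustrating the conversion in both directions:

    library(rjson)
    json <- toJSON(list(id = 1:3, name = "demo"))  # R list -> JSON string
    fromJSON(json)                                 # JSON string -> R list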
3955 Web Technologies and Services rjsonapi Consumer for APIs that Follow the JSON API Specification Consumer for APIs that Follow the JSON API Specification (<http://jsonapi.org/>). Package mostly consumes data - with experimental support for serving JSON API data.
3956 Web Technologies and Services RJSONIO Serialize R Objects to JSON, JavaScript Object Notation This is a package that allows conversion to and from data in Javascript object notation (JSON) format. This allows R objects to be inserted into Javascript/ECMAScript/ActionScript code and allows R programmers to read and convert JSON content to R objects. This is an alternative to rjson package. Originally, that was too slow for converting large R objects to JSON and was not extensible. rjson’s performance is now similar to this package, and perhaps slightly faster in some cases. This package uses methods and is readily extensible by defining methods for different classes, vectorized operations, and C code and callbacks to R functions for deserializing JSON objects to R. The two packages intentionally share the same basic interface. This package (RJSONIO) has many additional options to allow customizing the generation and processing of JSON content. This package uses libjson rather than implementing yet another JSON parser. The aim is to support other general projects by building on their work, providing feedback and benefit from their ongoing development.
3957 Web Technologies and Services rLTP R Interface to the ‘LTP’-Cloud Service R interface to the ‘LTP’-Cloud service for Natural Language Processing in Chinese (http://www.ltp-cloud.com/).
3958 Web Technologies and Services roadoi Find Free Versions of Scholarly Publications via Unpaywall This web client interfaces Unpaywall <https://unpaywall.org/products/api>, formerly oaDOI, a service finding free full-texts of academic papers by linking DOIs with open access journals and repositories. It provides unified access to various data sources for open access full-text links including Crossref and the Directory of Open Access Journals (DOAJ). API usage is free and no registration is required.
3959 Web Technologies and Services ROAuth R Interface For OAuth Provides an interface to the OAuth 1.0 specification allowing users to authenticate via OAuth to the server of their choice.
3960 Web Technologies and Services robotstxt A ‘robots.txt’ Parser and ‘Webbot’/‘Spider’/‘Crawler’ Permissions Checker Provides functions to download and parse ‘robots.txt’ files. Ultimately the package makes it easy to check if bots (spiders, crawler, scrapers, …) are allowed to access specific resources on a domain.
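A sketch of a typical permission check with robotstxt's paths_allowed() helper; the domain and path here are illustrative:

    library(robotstxt)
    # TRUE if the default bot ("*") may fetch /about/ according to the site's robots.txt
    paths_allowed(paths = "/about/", domain = "example.com")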
3961 Web Technologies and Services Rook Rook - a web server interface for R This package contains the Rook specification and convenience software for building and running Rook applications. To get started, be sure to read the ‘Rook’ help file first.
3962 Web Technologies and Services ROpenFIGI R Interface to OpenFIGI Provide a simple interface to Bloomberg’s OpenFIGI API. Please see <https://openfigi.com> for API details and registration. You may be eligible to have an API key to accelerate your loading process.
3963 Web Technologies and Services ROpenWeatherMap R Interface to OpenWeatherMap API OpenWeatherMap (OWM) <http://openweathermap.org/api> is a service providing weather related data. This package can be used to access current weather data for one location or several locations. It can also be used to forecast weather for 5 days with data for every 3 hours.
3964 Web Technologies and Services rorcid Interface to the ‘Orcid.org’ ‘API’ Client for the ‘Orcid.org’ ‘API’ (<https://orcid.org/>). Functions included for searching for people, searching by ‘DOI’, and searching by ‘Orcid’ ‘ID’.
3965 Web Technologies and Services rosetteApi ‘Rosette’ API ‘Rosette’ is an API for multilingual text analysis and information extraction. More information can be found at <https://developer.rosette.com>.
3966 Web Technologies and Services routr A Simple Router for HTTP and WebSocket Requests In order to make sure that a web request ends up in the correct handler function, a router is often used. ‘routr’ is a package implementing simple but powerful routing functionality for R based servers. It is a fully functional ‘fiery’ plugin, but can also be used with other ‘httpuv’ based servers.
3967 Web Technologies and Services rpinterest Access Pinterest API Get information (boards, pins and users) from the Pinterest <http://www.pinterest.com> API.
3968 Web Technologies and Services rplos Interface to the Search API for ‘PLoS’ Journals A programmatic interface to the ‘SOLR’ based search API (<http://api.plos.org/>) provided by the Public Library of Science journals to search their articles. Functions are included for searching for articles, retrieving articles, making plots, doing ‘faceted’ searches, ‘highlight’ searches, and viewing results of ‘highlighted’ searches in a browser.
3969 Web Technologies and Services RPushbullet R Interface to the Pushbullet Messaging Service An R interface to the Pushbullet messaging service which provides fast and efficient notifications (and file transfer) between computers, phones and tablets. An account has to be registered at http://www.pushbullet.com to obtain a (free) API key.
3970 Web Technologies and Services rrefine R Client for OpenRefine API ‘OpenRefine’ (formerly ‘Google Refine’) is a popular, open source data cleaning software. This package enables users to programmatically trigger data transfer between R and ‘OpenRefine’. Available functionality includes project import, export and deletion.
3971 Web Technologies and Services RSauceLabs R Wrapper for ‘SauceLabs’ REST API Retrieve, update, and delete job information from <https://saucelabs.com/>. Poll the ‘SauceLabs’ service’s current status and access supported platforms. Send and retrieve files from ‘SauceLabs’ and manage tunnels associated with ‘SauceConnect’.
3972 Web Technologies and Services RSclient Client for Rserve Client for Rserve, allowing users to connect to Rserve instances and issue commands.
3973 Web Technologies and Services rscopus Scopus Database ‘API’ Interface Uses Elsevier ‘Scopus’ API <https://dev.elsevier.com/sc_apis.html> to download information about authors and their citations.
3974 Web Technologies and Services rsdmx Tools for Reading SDMX Data and Metadata Set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework, currently focusing on the SDMX XML standard format (SDMX-ML).
3975 Web Technologies and Services RSelenium R Bindings for ‘Selenium WebDriver’ Provides a set of R bindings for the ‘Selenium 2.0 WebDriver’ (see <https://seleniumhq.github.io/docs/wd.html> for more information) using the ‘JsonWireProtocol’ (see <https://github.com/SeleniumHQ/selenium/wiki/JsonWireProtocol> for more information). ‘Selenium 2.0 WebDriver’ allows driving a web browser natively as a user would, either locally or on a remote machine using the Selenium server; it marks a leap forward in terms of web browser automation. Using RSelenium you can automate browsers locally or remotely.
3976 Web Technologies and Services Rserve Binary R server Rserve acts as a socket server (TCP/IP or local sockets) which allows binary requests to be sent to R. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++ and Java, allowing any application to use facilities of R without the need of linking to R code. Rserve supports remote connection, user authentication and file transfer. A simple R client is included in this package as well.
3977 Web Technologies and Services RSiteCatalyst R Client for Adobe Analytics API V1.4 Functions for interacting with the Adobe Analytics API V1.4 (<https://api.omniture.com/admin/1.4/rest/>).
3978 Web Technologies and Services RSmartlyIO Loading Facebook and Instagram Advertising Data from ‘Smartly.io’ Aims at loading Facebook and Instagram advertising data from ‘Smartly.io’ into R. ‘Smartly.io’ is an online advertising service that enables advertisers to display commercial ads on social media networks (see <http://www.smartly.io/> for more information). The package offers an interface to query the ‘Smartly.io’ API and loads data directly into R for further data processing and data analysis.
3979 Web Technologies and Services RSocrata Download or Upload ‘Socrata’ Data Sets Provides easier interaction with ‘Socrata’ open data portals <http://dev.socrata.com>. Users can provide a ‘Socrata’ data set resource URL, a ‘Socrata’ Open Data API (SoDA) web query, or a ‘Socrata’ “human-friendly” URL, and receive an R data frame in return. Converts dates to ‘POSIX’ format and manages throttling by ‘Socrata’. Users can upload data to ‘Socrata’ portals directly from R.
3980 Web Technologies and Services RStripe A Convenience Interface for the Stripe Payment API A convenience interface for communicating with the Stripe payment processor to accept payments online. See <https://stripe.com> for more information.
3981 Web Technologies and Services rtweet Collecting Twitter Data An implementation of calls designed to collect and organize Twitter data via Twitter’s REST and stream Application Program Interfaces (API), which can be found at the following URL: <https://developer.twitter.com/en/docs>.
3982 Web Technologies and Services rvest Easily Harvest (Scrape) Web Pages Wrappers around the ‘xml2’ and ‘httr’ packages to make it easy to download, then manipulate, HTML and XML.
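A minimal scrape with rvest; the URL and CSS selector are illustrative:

    library(rvest)
    page <- read_html("https://www.r-project.org/")
    headings <- html_nodes(page, "h1")   # select nodes with a CSS selector
    html_text(headings)                  # extract their text content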
3983 Web Technologies and Services rwars R Client for the Star Wars API Provides functions to retrieve and reformat data from the ‘Star Wars’ API (SWAPI) <https://swapi.co/>.
3984 Web Technologies and Services RYandexTranslate R Interface to Yandex Translate API ‘Yandex Translate’ (https://translate.yandex.com/) is a statistical machine translation system. The system translates separate words, complete texts, and webpages. This package can be used to detect the language of a text and to translate it into a supported target language. For more info: https://tech.yandex.com/translate/doc/dg/concepts/About-docpage/ .
3985 Web Technologies and Services RZabbix R Module for Working with the ‘Zabbix API’ R interface to the ‘Zabbix API’ data <https://www.zabbix.com/documentation/3.0/manual/api/reference>. Enables easy and direct communication with ‘Zabbix API’ from ‘R’.
3986 Web Technologies and Services scholar Analyse Citation Data from Google Scholar Provides functions to extract citation data from Google Scholar. Convenience functions are also provided for comparing multiple scholars and predicting future h-index values.
3987 Web Technologies and Services scrapeR Tools for Scraping Data from HTML and XML Documents Tools for Scraping Data from Web-Based Documents
3988 Web Technologies and Services searchConsoleR Google Search Console R Client Provides an interface with the Google Search Console, formerly called Google Webmaster Tools.
3989 Web Technologies and Services securitytxt Identify and Parse Web Security Policies Files When security risks in web services are discovered by independent security researchers who understand the severity of the risk, they often lack the channels to properly disclose them. As a result, security issues may be left unreported. The ‘security.txt’ ‘Web Security Policies’ specification defines an ‘IETF’ draft standard <https://tools.ietf.org/html/draft-foudil-securitytxt-00> to help organizations define the process for security researchers to securely disclose security vulnerabilities. Tools are provided to help identify and parse ‘security.txt’ files to enable analysis of the usage and adoption of these policies.
3990 Web Technologies and Services seleniumPipes R Client Implementing the W3C WebDriver Specification The W3C WebDriver specification defines a way for out-of-process programs to remotely instruct the behaviour of web browsers. It is detailed at <https://w3c.github.io/webdriver/webdriver-spec.html>. This package provides an R client implementing the W3C specification.
3991 Web Technologies and Services sendmailR send email using R Package contains a simple SMTP client which provides a portable solution for sending email, including attachment, from within R.
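A sketch of sending a plain-text message with sendmailR; the addresses and SMTP server are placeholders you would replace with real values:

    library(sendmailR)
    sendmail(from = "<me@example.com>",
             to = "<you@example.com>",
             subject = "Results ready",
             msg = "The model run finished.",
             control = list(smtpServer = "smtp.example.com"))  # your outgoing relay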
3992 Web Technologies and Services servr A Simple HTTP Server to Serve Static Files or Dynamic Documents Start an HTTP server in R to serve static files, or dynamic documents that can be converted to HTML files (e.g., R Markdown) under a given directory.
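Serving the current directory over HTTP with servr is a one-liner; the port is an arbitrary choice:

    # serve static files in "." at http://127.0.0.1:4321
    servr::httd(dir = ".", port = 4321)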
3993 Web Technologies and Services shiny (core) Web Application Framework for R Makes it incredibly easy to build interactive web applications with R. Automatic “reactive” binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.
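A complete, runnable shiny app showing the reactive input/output binding the description refers to:

    library(shiny)
    ui <- fluidPage(
      sliderInput("n", "Sample size", min = 10, max = 500, value = 100),
      plotOutput("hist")
    )
    server <- function(input, output) {
      # re-runs automatically whenever input$n changes
      output$hist <- renderPlot(hist(rnorm(input$n)))
    }
    shinyApp(ui, server)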
3994 Web Technologies and Services shutterstock Access ‘Shutterstock’ REST API Access the ‘Shutterstock’ API from R. The ‘Shutterstock’ API provides access to search, view, license and download media and information from the ‘Shutterstock’ library <https://api-reference.shutterstock.com/>.
3995 Web Technologies and Services slackr Send Messages, Images, R Objects and Files to ‘Slack’ Channels/Users ‘Slack’ <http://slack.com/> provides a service for teams to collaborate by sharing messages, images, links, files and more. Functions are provided that make it possible to interact with the ‘Slack’ platform ‘API’. When you need to share information or data from R, rather than resort to copy/paste in e-mails or other services like ‘Skype’ <http://www.skype.com/>, you can use this package to send well-formatted output from multiple R objects and expressions to all teammates at the same time with little effort. You can also send images from the current graphics device, R objects, and upload files.
3996 Web Technologies and Services soql Helps Make Socrata Open Data API Calls Used to construct the URLs and parameters of ‘Socrata Open Data API’ <https://dev.socrata.com> calls, using the API’s ‘SoQL’ parameter format. Has method-chained and sensical syntax. Plays well with pipes.
3997 Web Technologies and Services sparkbq Google ‘BigQuery’ Support for ‘sparklyr’ A ‘sparklyr’ extension package providing an integration with Google ‘BigQuery’. It supports direct import/export where records are directly streamed from/to ‘BigQuery’. In addition, data may be imported/exported via intermediate data extracts on Google ‘Cloud Storage’.
3998 Web Technologies and Services spiderbar Parse and Test Robots Exclusion Protocol Files and Rules The ‘Robots Exclusion Protocol’ <http://www.robotstxt.org/orig.html> documents a set of standards for allowing or excluding robot/spider crawling of different areas of site content. Tools are provided which wrap The ‘rep-cpp’ <https://github.com/seomoz/rep-cpp> C++ library for processing these ‘robots.txt’ files.
3999 Web Technologies and Services splashr Tools to Work with the ‘Splash’ ‘JavaScript’ Rendering and Scraping Service ‘Splash’ <https://github.com/scrapinghub/splash> is a ‘JavaScript’ rendering service. It is a lightweight web browser with an ‘HTTP’ API, implemented in ‘Python’ using ‘Twisted’ and ‘QT’, and provides some of the core functionality of the ‘RSelenium’ or ‘seleniumPipes’ R packages in a lightweight footprint. ‘Splash’ features include the ability to process multiple web pages in parallel; retrieve ‘HTML’ results and/or take screenshots; disable images or use ‘Adblock Plus’ rules to make rendering faster; execute custom ‘JavaScript’ in the page context; and get detailed rendering info in ‘HAR’ format.
4000 Web Technologies and Services streamR Access to Twitter Streaming API via R Functions to access Twitter’s filter, sample, and user streams, and to parse the output into data frames.
4001 Web Technologies and Services swagger Dynamically Generates Documentation from a ‘Swagger’ Compliant API A collection of ‘HTML’, ‘JavaScript’, and ‘CSS’ assets that dynamically generate beautiful documentation from a ‘Swagger’ compliant API: <https://swagger.io/specification/>.
4002 Web Technologies and Services telegram R Wrapper Around the Telegram Bot API R wrapper around the Telegram Bot API (http://core.telegram.org/bots/api) to access Telegram’s messaging facilities with ease (e.g. you send messages, images, files from R to your smartphone).
4003 Web Technologies and Services threewords Represent Precise Coordinates in Three Words A connector to the ‘What3Words’ (http://what3words.com/) service, which represents each 3m by 3m square on earth with a unique trio of English-language words.
4004 Web Technologies and Services tidyRSS Tidy RSS for R With the objective of including data from RSS feeds into your analysis, ‘tidyRSS’ parses RSS, Atom XML, JSON and geoRSS feeds and returns a tidy data frame.
4005 Web Technologies and Services tm.plugin.webmining Retrieve Structured, Textual Data from Various Web Sources Facilitates text retrieval from feed formats like XML (RSS, ATOM) and JSON. Direct retrieval from HTML is also supported. As most (news) feeds only incorporate small fractions of the original text, tm.plugin.webmining also retrieves and extracts the text of the original source.
4006 Web Technologies and Services transcribeR Automated Transcription of Audio Files Through the HP IDOL API Transcribes audio to text with the HP IDOL API. Includes functions to upload files, retrieve transcriptions, and monitor jobs.
4007 Web Technologies and Services translate Bindings for the Google Translate API v2 Bindings for the Google Translate API v2
4008 Web Technologies and Services translateR Bindings for the Google and Microsoft Translation APIs translateR provides easy access to the Google and Microsoft APIs. The package is easy to use with the related R package “stm” for the estimation of multilingual topic models.
4009 Web Technologies and Services trelloR R API for Trello Provides access to Trello API (<https://developers.trello.com/>). A family of GET functions make it easy to retrieve cards, labels, members, teams and other data from both public and private boards. Server responses are formatted upon retrieval. Automated paging allows for large requests that exceed server limit. See <https://github.com/jchrom/trelloR> for more information.
4010 Web Technologies and Services tuber Client for the YouTube API Get comments posted on YouTube videos, information on how many times a video has been liked, search for videos with particular content, and much more. You can also scrape captions from a few videos. To learn more about the YouTube API, see <https://developers.google.com/youtube/v3/>.
4011 Web Technologies and Services tubern R Client for the YouTube Analytics and Reporting API Get statistics and reports from YouTube. To learn more about the YouTube Analytics and Reporting API, see <https://developers.google.com/youtube/reporting/>.
4012 Web Technologies and Services tumblR Access to Tumblr v2 API Provides an R-interface to the Tumblr web API (see Tumblr v2 API on https://www.tumblr.com/docs/en/api/v2). Tumblr is a microblogging platform and social networking website (https://www.tumblr.com).
4013 Web Technologies and Services tweet2r Twitter Collector for R and Export to ‘SQLite’, ‘postGIS’ and ‘GIS’ Format This is an improved implementation of the package ‘streamR’ to capture tweets and store them in R, an ‘SQLite’ or ‘postGIS’ database, or a GIS format. The package provides a description of the harvested data and performs space-time exploratory analysis.
4014 Web Technologies and Services twitteR R Based Twitter Client Provides an interface to the Twitter web API.
4015 Web Technologies and Services uaparserjs Parse Browser ‘User-Agent’ Strings into Data Frames Despite there being a section in RFC 7231 <https://tools.ietf.org/html/rfc7231#section-5.5.3> defining a suggested structure for ‘User-Agent’ headers, this data is notoriously difficult to parse consistently. A function is provided that will take in user agent strings and return structured R objects. This is a ‘V8’-backed package based on the ‘ua-parser’ project <https://github.com/ua-parser>.
4016 Web Technologies and Services udapi Urban Dictionary API Client A client for the Urban Dictionary <http://www.urbandictionary.com/> API.
4017 Web Technologies and Services urlshorteneR R Wrapper for the ‘Bit.ly’, ‘Goo.gl’ and ‘Is.gd’ URL Shortening Services Allows using different URL shortening services, which also provide expanding and analytic functions. Specifically developed for ‘Bit.ly’, ‘Goo.gl’ (both OAuth2) and ‘is.gd’ (no API key). Others can be added by request.
4018 Web Technologies and Services urltools Vectorised Tools for URL Handling and Parsing A toolkit for all URL-handling needs, including encoding and decoding, parsing, parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although may be useful for other situations involving large sets of URLs.
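Vectorised parsing and parameter extraction with urltools; the URL is illustrative:

    library(urltools)
    u <- "https://example.com/search?q=r&lang=en"
    url_parse(u)                   # one row per URL: scheme, domain, path, ...
    param_get(u, c("q", "lang"))   # pull individual query parameters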
4019 Web Technologies and Services V8 Embedded JavaScript Engine for R An R interface to Google’s open source JavaScript engine. This package can now be compiled either with V8 version 6 or 7 (LTS) from nodejs or with the legacy 3.14/3.15 branch of V8.
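A short session with an embedded JavaScript context; v8() creates the context, and $assign/$eval/$get move values across the R/JavaScript boundary:

    library(V8)
    ctx <- v8()
    ctx$assign("x", c(1, 2, 3))          # push an R vector into JavaScript
    ctx$eval("var y = x.map(function(v) { return v * 2 })")
    ctx$get("y")                         # pull the result back as an R vector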
4020 Web Technologies and Services validatejsonr Validate JSON Against JSON Schemas The current implementation uses the C++ library ‘RapidJSON’ to supply the schema functionality, it supports JSON Schema Draft v4. As of 2016-09-09, ‘RapidJSON’ passed 262 out of 263 tests in JSON Schema Test Suite (JSON Schema draft 4).
4021 Web Technologies and Services vcr (core) Record ‘HTTP’ Calls to Disk Record test suite ‘HTTP’ requests and replays them during future runs. A port of the Ruby gem of the same name (<https://github.com/vcr/vcr/>). Works by hooking into the ‘webmockr’ R package for matching ‘HTTP’ requests by various rules (‘HTTP’ method, ‘URL’, query parameters, headers, body, etc.), and then caching real ‘HTTP’ responses on disk in ‘cassettes’. Subsequent ‘HTTP’ requests matching any previous requests in the same ‘cassette’ use a cached ‘HTTP’ response.
4022 Web Technologies and Services vkR Access to VK API via R Provides an interface to the VK API <https://vk.com/dev/methods>. VK <https://vk.com/> is the largest European online social networking service, based in Russia.
4023 Web Technologies and Services W3CMarkupValidator R Interface to W3C Markup Validation Services R interface to a W3C Markup Validation service. See <http://validator.w3.org/> for more information.
4024 Web Technologies and Services webmockr (core) Stubbing and Setting Expectations on ‘HTTP’ Requests Stubbing and setting expectations on ‘HTTP’ requests. Includes tools for stubbing ‘HTTP’ requests, including expected request conditions and response conditions. Match on ‘HTTP’ method, query parameters, request body, headers and more. Can be used for unit tests or outside of a testing context.
4025 Web Technologies and Services webreadr Tools for Reading Formatted Access Log Files R is used by a vast array of people for a vast array of purposes - including web analytics. This package contains functions for consuming and munging various common forms of request log, including the Common and Combined Web Log formats and various Amazon access logs.
4026 Web Technologies and Services webshot Take Screenshots of Web Pages Takes screenshots of web pages, including Shiny applications and R Markdown documents.
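A basic webshot screenshot; note the package relies on PhantomJS, which it can install for you:

    library(webshot)
    # webshot::install_phantomjs()   # one-time setup, if PhantomJS is missing
    webshot("https://www.r-project.org/", file = "r-home.png", vwidth = 1200)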
4027 Web Technologies and Services webutils Utility Functions for Developing Web Applications High performance in-memory http request parser for application/json, multipart/form-data, and application/x-www-form-urlencoded. Includes live demo of hosting and parsing multipart forms with either ‘httpuv’ or ‘Rhttpd’.
4028 Web Technologies and Services whisker {{mustache}} for R, Logicless Templating Logicless templating; reuse templates in many programming languages, including R.
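Logicless templating in one call; the template and data are illustrative:

    library(whisker)
    tpl <- "Hello {{name}}, you have {{n}} new messages."
    whisker.render(tpl, data = list(name = "R user", n = 3))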
4029 Web Technologies and Services WikidataQueryServiceR API Client Library for ‘Wikidata Query Service’ An API client for the ‘Wikidata Query Service’ <https://query.wikidata.org/>.
4030 Web Technologies and Services WikidataR API Client Library for ‘Wikidata’ An API client for the Wikidata <http://wikidata.org/> store of semantic data.
4031 Web Technologies and Services wikipediatrend Public Subject Attention via Wikipedia Page View Statistics Public attention is an interesting field of study. The internet not only allows information on virtually any subject to be accessed in no time; via the page view statistics gathered by website operators, the attention those subjects receive can be studied as well.
4032 Web Technologies and Services WikipediR A MediaWiki API Wrapper A wrapper for the MediaWiki API, aimed particularly at the Wikimedia ‘production’ wikis, such as Wikipedia. It can be used to retrieve page text, information about users or the history of pages, and elements of the category tree.
4033 Web Technologies and Services WufooR R Wrapper for ‘Wufoo.com’ - The Form Building Service Allows form managers to download entries from their respondents using the Wufoo JSON API (<https://www.wufoo.com>). Additionally, Wufoo reports, when public, can also be acquired programmatically. Note that building new forms within this package is not supported.
4034 Web Technologies and Services XML Tools for Parsing and Generating XML Within R and S-Plus Many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP. Also offers access to an ‘XPath’ “interpreter”.
4035 Web Technologies and Services xml2 (core) Parse XML Work with XML files using a simple, consistent interface. Built on top of the ‘libxml2’ C library.
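Parsing and querying with xml2's consistent interface; the document is inline to keep the example self-contained:

    library(xml2)
    doc <- read_xml("<records><r id='1'>a</r><r id='2'>b</r></records>")
    nodes <- xml_find_all(doc, "//r")   # XPath query
    xml_text(nodes)                     # "a" "b"
    xml_attr(nodes, "id")               # "1" "2"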
4036 Web Technologies and Services XML2R EasieR XML data collection XML2R is a framework that reduces the effort required to transform XML content into a number of tables while preserving parent-to-child relationships.
4037 Web Technologies and Services xslt Extensible Style-Sheet Language Transformations An extension for the ‘xml2’ package to transform XML documents by applying an ‘xslt’ style-sheet.
4038 Web Technologies and Services yhatr R Binder for the Yhat API Deploy, maintain, and invoke models via the Yhat REST API.
4039 Web Technologies and Services yummlyr R Bindings for Yummly API Yummly.com is one of the world’s largest and most powerful recipe search sites, and this package aims to provide R bindings for the publicly available Yummly.com Recipe API (https://developer.yummly.com/).
4040 Web Technologies and Services zendeskR Zendesk API Wrapper Provides an R wrapper for the Zendesk API.
4041 Web Technologies and Services ZillowR R Interface to Zillow Real Estate and Mortgage Data API Zillow, an online real estate company, provides real estate and mortgage data for the United States through a REST API. The ZillowR package provides an R function for each API service, making it easy to make API calls and process the response into convenient, R-friendly data structures. See <http://www.zillow.com/howto/api/APIOverview.htm> for the Zillow API Documentation.
4042 gRaphical Models in R abn Modelling Multivariate Data with Additive Bayesian Networks Bayesian network analysis is a form of probabilistic graphical model which derives from empirical data a directed acyclic graph, DAG, describing the dependency structure between random variables. An additive Bayesian network model consists of a form of a DAG where each node comprises a generalized linear model, GLM. Additive Bayesian network models are equivalent to Bayesian multivariate regression using graphical modelling; they generalise the usual multivariable regression, GLM, to multiple dependent variables. ‘abn’ provides routines to help determine optimal Bayesian network models for a given data set, where these models are used to identify statistical dependencies in messy, complex data. The additive formulation of these models is equivalent to multivariate generalised linear modelling (including mixed models with iid random effects). The usual term to describe this model selection process is structure discovery. The core functionality is concerned with model selection - determining the most robust empirical model of data from interdependent variables. Laplace approximations are used to estimate goodness of fit metrics and model parameters, and wrappers are also included to the INLA package which can be obtained from <http://www.r-inla.org>. A comprehensive set of documented case studies, numerical accuracy/quality assurance exercises, and additional documentation are available from the ‘abn’ website.
4043 gRaphical Models in R bayesmix Bayesian Mixture Models with JAGS The fitting of finite mixture models of univariate Gaussian distributions using JAGS within a Bayesian framework is provided.
4044 gRaphical Models in R BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC Statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889>, Letac et al. (2018) <arXiv:1706.04416>, Dobra and Mohammadi (2018) <doi:10.1214/18-AOAS1164>, and Mohammadi et al. (2017) <doi:10.1111/rssc.12171>. To speed up the computations, the BDMCMC sampling algorithms are implemented in parallel using OpenMP in C++.
4045 gRaphical Models in R bnlearn Bayesian Network Structure Learning, Parameter Learning and Inference Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC and RSMAX2) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries and cross-validation. Development snapshots with the latest bugfixes are available from <http://www.bnlearn.com>.
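A sketch of structure learning followed by parameter learning, using the discrete toy data shipped with bnlearn:

    library(bnlearn)
    data(learning.test)                  # small discrete data set included in the package
    dag <- hc(learning.test)             # score-based hill-climbing structure search
    fit <- bn.fit(dag, learning.test)    # maximum likelihood parameter estimation
    fit$A                                # conditional probability table of node A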
4046 gRaphical Models in R bnstruct Bayesian Network Structure Learning from Data with Missing Values Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Parents-and-Children, the Hill-Climbing, and the Max-Min Hill-Climbing heuristic searches, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, and BIC. The package also implements methods for generating and using bootstrap samples, imputed data, and inference.
4047 gRaphical Models in R boa Bayesian Output Analysis Program (BOA) for MCMC A menu-driven program and library of functions for carrying out convergence diagnostics and statistical and graphical analysis of Markov chain Monte Carlo sampling output.
4048 gRaphical Models in R BRugs Interface to the ‘OpenBUGS’ MCMC Software Fully-interactive R interface to the ‘OpenBUGS’ software for Bayesian analysis using MCMC sampling. Runs natively and stably in 32-bit R under Windows. Versions running on Linux and on 64-bit R under Windows are in “beta” status and less efficient.
4049 gRaphical Models in R catnet Categorical Bayesian Network Inference Structure learning and parameter estimation of discrete Bayesian networks using likelihood-based criteria. Exhaustive search for fixed node orders and stochastic search of optimal orders via simulated annealing algorithm are implemented.
4050 gRaphical Models in R coda Output Analysis and Diagnostics for MCMC Provides functions for summarizing and plotting the output from Markov Chain Monte Carlo (MCMC) simulations, as well as diagnostic tests of convergence to the equilibrium distribution of the Markov chain.
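Wrapping sampler output as an mcmc object unlocks coda's summaries and diagnostics; the draws here are simulated stand-ins for real MCMC output:

    library(coda)
    draws <- cbind(alpha = rnorm(1000), beta = rnorm(1000, mean = 2))
    chain <- mcmc(draws)       # declare the matrix as MCMC output
    summary(chain)             # means, quantiles, time-series standard errors
    effectiveSize(chain)       # effective sample size per parameter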
4051 gRaphical Models in R dclone Data Cloning and MCMC Tools for Maximum Likelihood Methods Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 (R Journal 2(2):29-37). Sequential and parallel MCMC support for ‘JAGS’, ‘WinBUGS’, ‘OpenBUGS’, and ‘Stan’.
4052 gRaphical Models in R diagram Functions for Visualising Simple Graphs (Networks), Plotting Flow Diagrams Visualises simple graphs (networks) based on a transition matrix, with utilities to plot flow diagrams, webs, electrical networks, etc. Supports the book “A practical guide to ecological modelling - using R as a simulation platform” by Karline Soetaert and Peter M.J. Herman (2009), Springer, and the book “Solving Differential Equations in R” by Karline Soetaert, Jeff Cash and Francesca Mazzia (2012), Springer. Includes demo(flowchart), demo(plotmat), demo(plotweb).
4053 gRaphical Models in R DiagrammeR Graph/Network Visualization Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
4054 gRaphical Models in R dynamicGraph dynamicGraph Interactive graphical tool for manipulating graphs
4055 gRaphical Models in R ergm Fit, Simulate and Diagnose Exponential-Family Models for Networks An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). ‘ergm’ is a part of the Statnet suite of packages for network analysis.
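A classic ERGM fit on the Florentine marriage network that ships with the ergm package:

    library(ergm)
    data(florentine)                              # loads the 'flomarriage' network object
    fit <- ergm(flomarriage ~ edges + triangle)   # density and triangle closure terms
    summary(fit)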
4056 gRaphical Models in R FBFsearch Algorithm for Searching the Space of Gaussian Directed Acyclic Graph Models Through Moment Fractional Bayes Factors An objective Bayesian algorithm for searching the space of Gaussian directed acyclic graph (DAG) models. The algorithm makes use of moment fractional Bayes factors (MFBF) and is thus suitable for learning sparse graphs. It is implemented using Armadillo, an open-source C++ linear algebra library.
4057 gRaphical Models in R GeneNet Modeling and Inferring Gene Networks Analyzes gene expression (time series) data with focus on the inference of gene networks. In particular, GeneNet implements the methods of Schaefer and Strimmer (2005a,b,c) and Opgen-Rhein and Strimmer (2006, 2007) for learning large-scale gene association networks (including assignment of putative directions).
4058 gRaphical Models in R ggm Functions for graphical Markov models Functions and datasets for maximum likelihood fitting of some classes of graphical Markov models.
4059 gRaphical Models in R gRain Graphical Independence Networks Probability propagation in graphical independence networks, also known as Bayesian networks or probabilistic expert systems.
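A two-node fragment of the classic chest-clinic example, showing how conditional probability tables are compiled into a gRain network and queried:

    library(gRain)
    yn  <- c("yes", "no")
    a   <- cptable(~asia, values = c(1, 99), levels = yn)
    t.a <- cptable(~tub | asia, values = c(5, 95, 1, 99), levels = yn)
    net <- grain(compileCPT(list(a, t.a)))   # build and compile the network
    querygrain(net, nodes = "tub")           # marginal probability of tuberculosis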
4060 gRaphical Models in R gRbase (core) A Package for Graphical Modelling in R The ‘gRbase’ package provides general features which are used by other graphical modelling packages, in particular by the packages ‘gRain’, ‘gRim’ and ‘gRc’. ‘gRbase’ contains several data sets relevant for use in connection with graphical models. Almost all data sets used in the book Graphical Models with R (2012) are contained in ‘gRbase’. ‘gRbase’ implements several graph algorithms (based mainly on representing graphs as adjacency matrices, either in the form of a standard matrix or a sparse matrix), including (i) maximum cardinality search (for marked and unmarked graphs), (ii) moralize, (iii) triangulate, and (iv) junction tree. ‘gRbase’ also facilitates array operations and implements functions for testing for conditional independence. ‘gRbase’ illustrates how hierarchical log-linear models may be implemented and describes the concept of graphical metadata. These features, however, are not maintained anymore and remain in ‘gRbase’ only because there exists a paper describing them: A Common Platform for Graphical Models in R: The ‘gRbase’ Package, Journal of Statistical Software, Vol 14, No 17, 2005. NOTICE: Proper functionality of ‘gRbase’ requires that the packages graph, ‘Rgraphviz’ and ‘RBGL’ are installed from ‘bioconductor’; for installation instructions please refer to the web page given below.
4061 gRaphical Models in R gRim Graphical Interaction Models Provides the following types of models: models for contingency tables (i.e. log-linear models), graphical Gaussian models for multivariate normal data (i.e. covariance selection models), and mixed interaction models.
4062 gRaphical Models in R huge High-Dimensional Undirected Graph Estimation Provides a general framework for high-dimensional undirected graph estimation. It integrates data preprocessing, neighborhood screening, graph estimation, and model selection techniques into a pipeline. In the preprocessing stage, the nonparanormal (npn) transformation is applied to help relax the normality assumption. In the graph estimation stage, the graph structure is estimated by Meinshausen-Buhlmann graph estimation or the graphical lasso, and both methods can be further accelerated by the lossy screening rule, which preselects the neighborhood of each variable by correlation thresholding. The package targets high-dimensional data analysis, usually with d >> n, and the computation is memory-optimized using sparse matrix output. A computationally efficient approach, correlation thresholding graph estimation, is also provided. Three regularization/thresholding parameter selection methods are included: (1) the stability approach for regularization selection, (2) the rotation information criterion, and (3) the extended Bayesian information criterion, which is only available for the graphical lasso.
4063 gRaphical Models in R igraph Network Analysis and Visualization Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
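Generating and summarising a random graph with igraph:

    library(igraph)
    set.seed(1)
    g <- sample_gnp(50, 0.1)   # Erdos-Renyi random graph: 50 nodes, edge probability 0.1
    degree(g)                  # degree of every vertex
    mean(betweenness(g))       # a simple centrality summary
    plot(g)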
4064 gRaphical Models in R lvnet Latent Variable Network Modeling Estimate, fit and compare Structural Equation Models (SEM) and network models (Gaussian Graphical Models; GGM) using OpenMx. Allows for two possible generalizations to include GGMs in SEM: GGMs can be used between latent variables (latent network modeling; LNM) or between residuals (residual network modeling; RNM). For details, see Epskamp, Rhemtulla and Borsboom (2017) <doi:10.1007/s11336-017-9557-x>.
4065 gRaphical Models in R MXM Feature Selection (Including Multiple Solutions) and Bayesian Networks Many feature selection methods for a wide range of response variables, including minimal, statistically-equivalent and equally-predictive feature subsets. Bayesian network algorithms and related functions are also included. The package name ‘MXM’ stands for “Mens eX Machina”, meaning “Mind from the Machine” in Latin. Reference: Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets, Lagani, V. and Athineou, G. and Farcomeni, A. and Tsagris, M. and Tsamardinos, I. (2017). Journal of Statistical Software, 80(7). <doi:10.18637/jss.v080.i07>.
4066 gRaphical Models in R ndtv Network Dynamic Temporal Visualizations Renders dynamic network data from ‘networkDynamic’ objects as movies, interactive animations, or other representations of changing relational structures and attributes.
4067 gRaphical Models in R network Classes for Relational Data Tools to create and modify network objects. The network class can represent a range of relational data types, and supports arbitrary vertex/edge/graph attributes.
4068 gRaphical Models in R networkDynamic Dynamic Extensions for Network Objects Simple interface routines to facilitate the handling of network objects with complex intertemporal data. This is a part of the “statnet” suite of packages for network analysis.
4069 gRaphical Models in R parcor Regularized estimation of partial correlation matrices The package estimates the matrix of partial correlations based on different regularized regression methods: lasso, adaptive lasso, PLS, and Ridge Regression. In addition, the package provides model selection for lasso, adaptive lasso and Ridge regression based on cross-validation.
4070 gRaphical Models in R pcalg Methods for Graphical Models and Causal Inference Functions for causal structure learning and causal inference using graphical models. The main algorithms for causal structure learning are PC (for observational data without hidden variables), FCI and RFCI (for observational data with hidden variables), and GIES (for a mix of data from observational studies (i.e. observational data) and data from experiments involving interventions (i.e. interventional data) without hidden variables). For causal inference the IDA algorithm, the Generalized Backdoor Criterion (GBC), the Generalized Adjustment Criterion (GAC) and some related functions are implemented. Functions for incorporating background knowledge are provided.
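A sketch of the PC algorithm on simulated Gaussian data; with independent columns the expected output is an empty graph:

    library(pcalg)
    set.seed(1)
    d <- as.data.frame(matrix(rnorm(500 * 4), ncol = 4))
    suffStat <- list(C = cor(d), n = nrow(d))        # sufficient statistics for the CI test
    fit <- pc(suffStat, indepTest = gaussCItest,     # Gaussian conditional independence test
              alpha = 0.01, labels = colnames(d))
    fit                                              # estimated CPDAG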
4071 gRaphical Models in R QUIC Regularized sparse inverse covariance matrix estimation Uses Newton’s method and coordinate descent to solve the regularized inverse covariance matrix estimation problem. Please refer to: Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation, Cho-Jui Hsieh, Matyas A. Sustik, Inderjit S. Dhillon, Pradeep Ravikumar, Advances in Neural Information Processing Systems 24, 2011, p. 2330-2338.
4072 gRaphical Models in R R2OpenBUGS Running OpenBUGS from R Using this package, it is possible to call a BUGS model, summarize inferences and convergence in a table and graph, and save the simulations in arrays for easy access in R.
4073 gRaphical Models in R R2WinBUGS Running ‘WinBUGS’ and ‘OpenBUGS’ from ‘R’ / ‘S-PLUS’ Invoke a ‘BUGS’ model in ‘OpenBUGS’ or ‘WinBUGS’, a class “bugs” for ‘BUGS’ results and functions to work with that class. Function write.model() allows a ‘BUGS’ model file to be written. The class and auxiliary functions could be used with other MCMC programs, including ‘JAGS’.
4074 gRaphical Models in R rjags Bayesian Graphical Models using MCMC Interface to the JAGS MCMC library.
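A minimal normal-mean model fit through rjags, assuming the JAGS library itself is installed on the system:

    library(rjags)
    m <- "
    model {
      for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
      mu  ~ dnorm(0, 0.001)
      tau ~ dgamma(0.01, 0.01)
    }"
    jm <- jags.model(textConnection(m), data = list(y = rnorm(20, 5), N = 20))
    post <- coda.samples(jm, variable.names = c("mu", "tau"), n.iter = 1000)
    summary(post)   # posterior summaries via coda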
4075 gRaphical Models in R SIN A SINful Approach to Selection of Gaussian Graphical Markov Models This package provides routines to perform SIN model selection as described in Drton & Perlman (2004, 2008). The selected models are represented in the format of the ‘ggm’ package, which allows in particular parameter estimation in the selected model.
4076 gRaphical Models in R sparsebn Learning Sparse Bayesian Networks from High-Dimensional Data Fast methods for learning sparse Bayesian networks from high-dimensional data using sparse regularization, as described in Aragam, Gu, and Zhou (2017) <arXiv:1703.04025>. Designed to handle mixed experimental and observational data with thousands of variables with either continuous or discrete observations.