# Planet R

## December 19, 2014

### Dirk Eddelbuettel

#### Rocker is now the official R image for Docker

Something happened a little while ago which we did not have time to commemorate properly. Our Rocker image for R is now the official R image for Docker itself. So getting R (via Docker) is now as simple as saying docker pull r-base.

This particular container is essentially just the standard r-base Debian package for R (which is one of a few I maintain there) plus a minimal set of extras. This r-base forms the basis of our other containers, e.g. the rather popular r-studio container wrapping the excellent RStudio Server.

A lot of work went into this. Carl and I also got a tremendous amount of help from the good folks at Docker. Details are as always at the Rocker repo at GitHub.

Docker itself continues to make great strides, and it has been great fun to help along. With this post I achieved another goal: blog about Docker with an image not containing shipping containers. Just kidding.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

### CRANberries

#### New package nlsem with initial version 0.1

Package: nlsem
Version: 0.1
Date: 2014-12-19
Title: Fitting Structural Equation Mixture Models
Authors@R: c(person("Nora", "Umbach", role = c("aut", "cre"), email = "nora.umbach@web.de"), person("Katharina", "Naumann", role = "aut"), person("David", "Hoppe", role = "aut"), person("Holger", "Brandt", role = "aut"), person("Augustin", "Kelava", role="ctb"), person("Bernhard", "Schmitz", role="ctb"))
Depends: R (>= 3.1.0), gaussquad, mvtnorm, nlme
Description: Estimation of structural equation models with nonlinear effects and underlying nonnormal distributions.
Packaged: 2014-12-19 09:42:10 UTC; noraumbach
Author: Nora Umbach [aut, cre], Katharina Naumann [aut], David Hoppe [aut], Holger Brandt [aut], Augustin Kelava [ctb], Bernhard Schmitz [ctb]
Maintainer: Nora Umbach
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-19 12:07:02

### Removed CRANberries

#### Package OneArmPhaseTwoStudy (with last version 0.1) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-12-10 0.1

#### Package WhopGenome (with last version 0.9.0) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-09-12 0.9.0
2014-08-27 0.8.9
2013-12-03 0.8.2
2013-11-19 0.8.1

#### Package hddtools (with last version 0.2.2) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-09-24 0.2.2
2014-08-13 0.1.1

### CRANberries

#### New package srd with initial version 1.0

Package: srd
Type: Package
Title: Draws Scaled Rectangle Diagrams
Version: 1.0
Date: 2014-07-15
Author: Roger Marshall
Maintainer: Roger Marshall
Description: Draws scaled rectangle diagrams to represent a 2^k contingency table, for k=6
Imports: plyr,animation,colorspace,stringr,survival
Packaged: 2014-12-18 22:35:36 UTC; rmar073
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-19 01:58:45

#### New package spanr with initial version 1.0

Package: spanr
Type: Package
Title: Search Partition Analysis
Version: 1.0
Date: 2014-07-15
Author: Roger Marshall
Maintainer: Roger Marshall
Description: Carries out a search for an optimal partition in terms of a regular Boolean expression
Imports: plyr, stringr, survival
Packaged: 2014-12-18 22:25:39 UTC; rmar073
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-19 01:49:50

## December 18, 2014

### Bioconductor Project Working Papers

#### Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy

We consider challenges that arise in the estimation of the value of an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular asymptotically linear (RAL) estimator of this parameter. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-n rate confidence intervals for the optimal value even when the parameter is not pathwise differentiable. In particular, we develop an estimator that, when properly standardized, converges to a normal limiting distribution. We provide conditions under which our estimator is RAL and asymptotically efficient when the mean outcome is pathwise differentiable. We outline an extension of our approach to a multiple time point problem in the appendix. All of our results are supported by simulations.

### CRANberries

#### New package GPC with initial version 0.1

Package: GPC
Type: Package
Title: Generalized Polynomial Chaos
Version: 0.1
Depends: R (>= 2.7.0), randtoolbox, orthopolynom, ks, lars
Date: 2013-02-01
Author: Miguel Munoz Zuniga and Jordan Ko
Maintainer: Miguel Munoz Zuniga
Description: A generalized polynomial chaos expansion of a model taking as input independent random variables is achieved. A statistical and a global sensitivity analysis of the model are also carried out.
Packaged: 2014-12-18 17:17:50 UTC; munozzum
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-18 21:08:58

### Bioconductor Project Working Papers

#### Confidence intervals for the treatment effect on the treated

The average effect of the treatment on the treated is a quantity of interest in observational studies in which no definite parameter can be used to quantify the treatment effect, such as those where only a random subset of the data obtained by stratification can be used for analysis. Non-parametric confidence intervals for this quantity appear to be known only in the case where the responses to the treatment are binary and the data fall into a single stratum. We propose nonparametric confidence intervals for the average effect of the treatment on the treated in studies involving one or more strata and general numerical responses.

### CRANberries

#### New package selfingTree with initial version 0.2

Package: selfingTree
Type: Package
Title: Genotype Probabilities in Intermediate Generations of Inbreeding Through Selfing
Version: 0.2
Copyright: (c) 2014, Pioneer Hi-Bred International, Inc.
Date: 2014-12-18
Authors@R: person(given = "Frank",family = "Technow", email = "Frank.Technow@pioneer.com", comment = "Pioneer Hi-Bred International, Inc., Johnston, Iowa", role = c("aut","cre"))
Author: Frank Technow [aut, cre] (Pioneer Hi-Bred International, Inc., Johnston, Iowa)
LazyData: TRUE
Maintainer: Frank Technow
Depends: R (>= 2.15.1),foreach
Description: A probability tree allows to compute probabilities of complex events, such as genotype probabilities in intermediate generations of inbreeding through recurrent self-fertilization (selfing). This package implements functionality to compute probability trees for two- and three-marker genotypes in the F2 to F7 selfing generations. The conditional probabilities are derived automatically and in symbolic form. The package also provides functionality to extract and evaluate the relevant probabilities.
Packaged: 2014-12-18 15:43:52 UTC; frank
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-18 17:31:18

### Bioconductor Project Working Papers

#### A Marginalized Zero-Inflated Negative Binomial Regression Model with Overall Exposure Effects

The zero-inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine, to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non-susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well-suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero-inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the performance of MZINB with respect to test size is compared to marginalized zero-inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren.

### CRANberries

#### New package RoughSetKnowledgeReduction with initial version 0.1

Package: RoughSetKnowledgeReduction
Type: Package
Title: Simplification of Decision Tables using Rough Sets
Version: 0.1
Date: 2012-03-13
Author: Alber Sanchez
Maintainer: Alber Sanchez
Description: Rough Sets were introduced by Zdzislaw Pawlak in his book "Rough Sets: Theoretical Aspects of Reasoning About Data". Rough Sets provide a formal method to approximate crisp sets when the set-element belonging relationship is either known or undetermined. This enables the use of Rough Sets for reasoning about incomplete or contradictory knowledge. A decision table is a prescription of the decisions to make given some conditions. Such decision tables can be reduced without losing prescription ability. This package provides the classes and methods for knowledge reduction from decision tables as presented in chapter 7 of the aforementioned book. This package provides functions for calculating both the discernibility matrix and the essential parts of decision tables.
Depends: methods
Collate: DecisionTable.R ConditionReduct.R DiscernibilityMatrix.R ValueReduct.R
Packaged: 2014-12-07 14:02:21 UTC; alber
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-18 09:11:32

#### New package multiband with initial version 0.1.0

Package: multiband
Title: Period Estimation for Multiple Bands
Description: Algorithms for performing joint parameter estimation in astronomical survey data acquired in multiple bands.
Version: 0.1.0
Maintainer: Eric C. Chi
Author: Eric C. Chi, James P. Long
Depends: R (>= 3.0.2)
LazyData: true
Packaged: 2014-12-18 05:28:46 UTC; ericchi
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-18 07:12:06

## December 17, 2014

### CRANberries

#### New package RcmdrPlugin.RMTCJags with initial version 1.0

Package: RcmdrPlugin.RMTCJags
Type: Package
Title: R MTC Jags Rcmdr Plugin
Version: 1.0
Date: 2014-12-16
Author: Marcelo Goulart Correia
Maintainer: Marcelo Goulart Correia
Depends: R (>= 3.0.0)
Imports: Rcmdr, runjags, rmeta, igraph, coda, rjags
SystemRequirements: jags (>= 3.0.0)
Description: This package provides an Rcmdr "plug-in" to perform Mixed Treatment Comparison for binary outcome using BUGS code from Bristol University (Lu and Ades)
Packaged: 2014-12-05 16:45:08 UTC; instituto
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-17 23:13:09

#### New package MenuCollection with initial version 1.0

Type: Package
Title: Collection of Configurable GTK+ Menus
Version: 1.0
Date: 2014-07-19
Author: Gianmarco Polotti
Maintainer: Gianmarco Polotti
Description: Set of configurable menus built with GTK+ to graphically interface new functions.
Depends: R (>= 3.0.0), RGtk2, RGtk2Extras
Imports: gplots, grDevices, graphics
Suggests:
Packaged: 2014-12-17 20:46:32 UTC; PolGia0
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-17 23:13:05

### Removed CRANberries

#### Package BEQI2 (with last version 1.0-1) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-06-10 1.0-1
2014-04-16 1.0-0

### CRANberries

#### New package GPareto with initial version 1.0.0

Package: GPareto
Type: Package
Title: Gaussian Processes for Pareto Front Estimation and Optimization
Version: 1.0.0
Date: 2014-12-17
Author: Mickael Binois, Victor Picheny
Maintainer: Mickael Binois
Description: Gaussian process regression models, a.k.a. kriging models, are applied to global multiobjective optimization of black-box functions. Multiobjective Expected Improvement and Stepwise Uncertainty Reduction sequential infill criteria are available. A quantification of uncertainty on Pareto fronts is provided using conditional simulations.
Depends: DiceKriging (>= 1.5.3), emoa, methods
Imports: Rcpp (>= 0.11.1), rgenoud, pbivnorm, pso, randtoolbox, KrigInv, MASS
Suggests: DiceDesign (>= 1.4)
Packaged: 2014-12-17 09:42:42 UTC; a073501
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-17 13:08:28

#### New package stpp with initial version 1.0-5

Package: stpp
Type: Package
Title: Space-Time Point Pattern simulation, visualisation and analysis
Version: 1.0-5
Date: 2014-12-11
Author: Edith Gabriel, Peter J Diggle, stan function by Barry Rowlingson
Maintainer: Edith Gabriel
Depends: R (>= 2.10), splancs, KernSmooth, spatstat
Suggests: rpanel, rgl
Description: A package for analysing, simulating and displaying space-time point patterns
Packaged: 2014-12-16 13:46:14 UTC; math
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-17 09:25:40

#### New package RSNPset with initial version 0.3

Package: RSNPset
Type: Package
Title: Efficient Score Statistics For Genome-Wide SNP Set Analysis
Version: 0.3
Date: 2014-12-16
Author: Chanhee Yi, Alexander Sibley, and Kouros Owzar
Maintainer: Alexander Sibley
Description: RSNPset is an implementation of the use of efficient score statistics in genome-wide SNP set analysis with complex traits. The package provides three standard score statistics (Cox, Binomial, and Gaussian) but is easily extensible to include others. Code implementing the inferential procedure is primarily written in C++ and utilizes parallelization of the analysis to reduce runtime. A supporting function offers simple computation of observed, permutation, and FWER and FDR adjusted p-values.
Imports: fastmatch (>= 1.0-4), foreach (>= 1.4.1), doRNG (>= 1.5.3), qvalue (>= 1.34), Rcpp (>= 0.10.4)
Suggests: knitr
VignetteBuilder: knitr
Packaged: 2014-12-16 19:06:52 UTC; abs33
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-17 01:59:36

#### New package refset with initial version 0.1.0

Package: refset
Type: Package
Title: Subsets with Reference Semantics
Version: 0.1.0
Date: 2014-11-20
Author: David Hugh-Jones
Maintainer: David Hugh-Jones
Description: Provides subsets with reference semantics, i.e. subsets which automatically reflect changes in the original object, and which optionally update the original object when they are changed.
NeedsCompilation: no
VignetteBuilder: knitr
Suggests: knitr, rmarkdown, testthat
Packaged: 2014-12-16 21:42:46 UTC; david
Repository: CRAN
Date/Publication: 2014-12-17 01:57:34

#### New package FPDclustering with initial version 1.0

Package: FPDclustering
Type: Package
Title: PD-Clustering and Factor PD-Clustering
Version: 1.0
Date: 2014-12-16
Author: Cristina Tortora and Paul D. McNicholas
Maintainer: Cristina Tortora
Description: Probabilistic distance clustering (PD-clustering) is an iterative, distribution free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant. PD-clustering is a flexible method that can be used with non-spherical clusters, outliers, or noisy data. Factor PD-clustering (FPDC) is a recently proposed factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It allows clustering of high dimensional data sets.
Depends: ThreeWay
Packaged: 2014-12-16 18:46:17 UTC; ctortora
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-17 01:54:43

## December 16, 2014

### CRANberries

#### New package rSPACE with initial version 1.0

Package: rSPACE
Type: Package
Title: Spatially-Explicit Power Analysis for Conservation and Ecology
Version: 1.0
Date: 2014-12-15
Author: Martha Ellis, Jake Ivan, Jody Tucker, Mike Schwartz
Maintainer: Martha Ellis
Description: Conducts a spatially-explicit, simulation-based power analysis for detecting trends in population abundance through occupancy-based modeling. Applicable for evaluating monitoring designs in conservation and ecological settings.
Imports: raster, RMark, ggplot2, tcltk2, sp, grid, plyr, tcltk
Packaged: 2014-12-16 18:20:03 UTC; marthamellis
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-16 22:38:08

#### New package PRROC with initial version 1.0

Package: PRROC
Type: Package
Title: Precision-Recall and ROC Curves for Weighted and Unweighted Data
Version: 1.0
Date: 2014-12-16
Author: Jan Grau and Jens Keilwagen
Maintainer: Jan Grau
Description: Computes the areas under the precision-recall (PR) and ROC curve for weighted (e.g., soft-labeled) and unweighted data. In contrast to other implementations, the interpolation between points of the PR curve is done by a non-linear piecewise function. In addition to the areas under the curves, the curves themselves can also be computed and plotted by a specific S3-method.
Packaged: 2014-12-16 21:03:40 UTC; dev
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-16 22:34:59

#### New package NeuralNetTools with initial version 1.0.0

Package: NeuralNetTools
Type: Package
Title: Visualization and Analysis Tools for Neural Networks
Version: 1.0.0
Date: 2014-12-10
Author: Marcus W. Beck [aut, cre]
Maintainer: Marcus W. Beck
Description: Visualization and analysis tools to aid in the interpretation of neural network models. Functions are available for plotting, quantifying variable importance, conducting a sensitivity analysis, and obtaining a simple list of model weights.
BugReports: https://github.com/fawda123/NeuralNetTools/issues
LazyData: true
Imports: ggplot2, neuralnet, nnet, reshape2, RSNNS, scales
Depends: R (>= 3.1.1)
Authors@R: person(given = "Marcus W.", family = "Beck", role = c("aut","cre"), email = "mbafs2012@gmail.com")
Packaged: 2014-12-16 16:06:56 UTC; Marcus
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-16 17:43:30

#### New package spca with initial version 0.6.0

Package: spca
Type: Package
Title: Sparse Principal Component Analysis
Version: 0.6.0
Date: 2014-12-14
Author: Giovanni Maria Merola
Maintainer: Giovanni Merola
URL: http://github.com/merolagio/spca
BugReports: http://github.com/merolagio/spca/issues
Description: Computes Least Squares Sparse Principal Components either by a Branch-and-Bound search or with an iterative Backward Elimination algorithm. Sparse solutions can be plotted, printed and compared using the methods included.
Depends: R (>= 3.1)
VignetteBuilder: knitr
Suggests: formatR, knitr (>= 1.8.0), ggplot2, reshape2
Imports: MASS
LazyData: true
Packaged: 2014-12-16 11:39:18 UTC; Glovani
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-16 13:19:52

#### New package WaveletComp with initial version 1.0

Package: WaveletComp
Type: Package
Title: Computational Wavelet Analysis
Version: 1.0
Date: 2014-12-15
Author: Angi Roesch and Harald Schmidbauer
Maintainer: Angi Roesch
Description: Wavelet analysis and reconstruction of time series, cross-wavelets and phase-difference (with filtering options), significance with simulation algorithms.
Depends: R (>= 2.10)
URL: http://www.hs-stat.com/projects/WaveletComp/WaveletComp_guided_tour.pdf
Packaged: 2014-12-16 07:19:23 UTC; angi
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-16 09:26:23

#### New package networkD3 with initial version 0.1.1

Package: networkD3
Type: Package
Title: Tools for Creating D3 JavaScript Network Graphs from R
Description: Creates D3 JavaScript network, tree, dendrogram, and Sankey graphs from R.
Version: 0.1.1
Date: 2014-12-16
Author: Christopher Gandrud, J.J. Allaire, and B.W. Lewis
Maintainer: Christopher Gandrud
URL: http://github.com/christophergandrud/networkD3/
Depends: R (>= 3.0.0)
Imports: htmlwidgets (>= 0.3.2), plyr, rjson
Suggests: htmltools (>= 0.2.6), RCurl
Enhances: knitr, shiny
Packaged: 2014-12-16 06:52:00 UTC; christophergandrud
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-16 08:22:39

#### New package BinNonNor with initial version 1.0

Package: BinNonNor
Type: Package
Title: Data Generation with Binary and Continuous Non-normal Components
Version: 1.0
Date: 2014-12-15
Author: Gul Inan, Hakan Demirtas
Maintainer: Gul Inan
Description: Generation of multiple binary and continuous non-normal variables simultaneously given the marginal characteristics and association structure based on the methodology proposed by Demirtas et al. (2012).
Depends: BB, corpcor, mvtnorm, Matrix
Packaged: 2014-12-15 19:20:02 UTC; arcelik
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-16 08:17:47

## December 15, 2014

### CRANberries

#### New package RcppDL with initial version 0.0.3

Package: RcppDL
Type: Package
Title: Deep Learning Methods via Rcpp
Version: 0.0.3
Date: 2014-12-12
Author: Qiang Kou, Yusuke Sugomori
Maintainer: Qiang Kou
Description: This package is based on the C++ code from Yusuke Sugomori, which implements basic machine learning methods with many layers (deep learning), including dA (Denoising Autoencoder), SdA (Stacked Denoising Autoencoder), RBM (Restricted Boltzmann machine) and DBN (Deep Belief Nets).
Imports: methods, Rcpp (>= 0.11.2)
URL: https://github.com/thirdwing/RcppDL
BugReports: https://github.com/thirdwing/RcppDL/issues
Packaged: 2014-12-15 20:35:49 UTC; qkou
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-16 01:02:30

#### New package leaderCluster with initial version 1.2

Type: Package
Version: 1.2
Date: 2014-12-11
Author: Taylor B. Arnold
Maintainer: Taylor B. Arnold
Description: The leader clustering algorithm provides a means for clustering a set of data points. Unlike many other clustering algorithms it does not require the user to specify the number of clusters, but instead requires the approximate radius of a cluster as its primary tuning parameter. The package provides a fast implementation of this algorithm in n-dimensions using Lp-distances (with special cases for p=1,2, and infinity) as well as for spatial data using the Haversine formula, which takes latitude/longitude pairs as inputs and clusters based on great circle distances.
Packaged: 2014-12-15 15:30:02 UTC; taylor
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-12-16 00:58:11

### Bioconductor Project Working Papers

#### OPTIMAL, TWO STAGE, ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS USING SPARSE LINEAR PROGRAMMING

Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. Such designs have been proposed, for example, when the population of interest consists of biomarker positive and biomarker negative individuals. The goal is to learn which populations benefit from an experimental treatment. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment, and the multiple testing procedure. We provide the first general method for simultaneously optimizing both of these components for two stage, adaptive enrichment designs. We minimize expected sample size under constraints on power and the familywise Type I error rate. It is computationally infeasible to directly solve this optimization problem since it is not convex. The key to our approach is a novel representation of a discretized version of this optimization problem as a sparse linear program. We apply advanced optimization methods to solve this problem to high accuracy, revealing new, approximately optimal designs.

### CRANberries

#### New package ztable with initial version 0.1.0

Package: ztable
Title: Zebra-Striped Tables in LaTeX and HTML Formats
Version: 0.1.0
Authors@R: "Keon-Woong Moon [aut, cre]"
Description: Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm or coxph objects.
Depends: R (>= 3.1.2)
LazyData: true
Suggests: MASS, survival, testthat, knitr
VignetteBuilder: knitr
Packaged: 2014-12-15 16:29:35 UTC; cardiomoon
Author: "Keon-Woong Moon" [aut, cre]
Maintainer: "Keon-Woong Moon"
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-15 20:04:46

### Removed CRANberries

#### Package CCMnet (with last version 0.0-2) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-02-14 0.0-2

#### QQ-plot in SPSS using Blom's method

We teach two software packages, R and SPSS, in Quantitative Methods 101 for psychology freshmen at Bremen University (Germany). Sometimes confusion arises when the software packages produce different results. This may be due to specifics in the implementation of a method or, as in most cases, to different default settings. One of these situations occurs when the QQ-plot is introduced. Below we see two QQ-plots, produced by SPSS and R, respectively. The data used in the plots were generated by:

set.seed(0)
x <- sample(0:9, 100, replace = TRUE)   # 100 draws with replacement from 0..9


SPSS

R

qqnorm(x, datax = TRUE)   # uses Blom's method by default
qqline(x, datax = TRUE)


There are some obvious differences:

1. The most obvious one is that the R plot seems to contain more data points than the SPSS plot. Actually, this is not the case. Some data points are plotted on top of each other in SPSS while they are spread out vertically in the R plot. The reason for this difference is that SPSS uses a different approach to assigning probabilities to the values. We will explore the two approaches below.
2. The scaling of the y-axis differs. R uses quantiles from the standard normal distribution. SPSS by default rescales these values using the mean and standard deviation from the original data. This allows a direct comparison of the original and theoretical values. This is a simple linear transformation and will not be explained any further here.
3. The QQ-lines are not identical. R uses the 1st and 3rd quartile from both distributions to draw the line. This is different in SPSS, where the line is drawn through identical values on both axes. We will explore the differences below.

##### QQ-plots from scratch

To get a better understanding of the difference we will build the R and SPSS-flavored QQ-plot from scratch.

###### R type

In order to calculate theoretical quantiles corresponding to the observed values, we first need to find a way to assign a probability to each value of the original data. A lot of different approaches exist for this purpose (for an overview see e.g. Castillo-Gutiérrez, Lozano-Aguilera, & Estudillo-Martínez, 2012b). They usually build on the ranks of the observed data points to calculate corresponding p-values, i.e. the plotting positions for each point. The qqnorm function uses two formulae for this purpose, depending on the number of observations $n$ (Blom’s method, see ?qqnorm; Blom, 1958). With $r$ being the rank, for $n > 10$ it will use the formula $p = (r - 1/2) / n$, for $n \leq 10$ the formula $p = (r - 3/8) / (n + 1/4)$ to determine the probability value $p$ for each observation (see the help files for the functions qqnorm and ppoints). For simplicity, we will only implement the $n > 10$ case here.

n <- length(x)          # number of observations
r <- order(order(x))    # order of values, i.e. ranks without averaged ties
p <- (r - 1/2) / n      # assign probabilities to ranks using Blom's method
y <- qnorm(p)           # theoretical standard normal quantiles for p values
plot(x, y)              # plot empirical against theoretical values


Before we take a look at the code, note that our plot is identical to the plot generated by qqnorm above, except that the QQ-line is missing. The main point that makes the difference between R and SPSS is found in the command order(order(x)). The command calculates ranks for the observations using ordinal ranking. This means that all observations get different ranks and no average ranks are calculated for ties, i.e. for observations with equal values. Another approach would be to apply fractional ranking and calculate average values for ties. This is what the function rank does. The following code shows the difference between the two approaches to assigning ranks.

v <- c(1,1,2,3,3)
order(order(v))     # ordinal ranking used by R

## [1] 1 2 3 4 5

rank(v)             # fractional ranking used by SPSS

## [1] 1.5 1.5 3.0 4.5 4.5


R uses ordinal ranking and SPSS uses fractional ranking by default to assign ranks to values. Thus, the positions do not overlap in R as each ordered observation is assigned a different rank and therefore a different p-value. We will pick up the second approach again later, when we reproduce the SPSS-flavored plot in R (see note 1 at the end of this post).
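
To make the overlap argument concrete, the following sketch counts the distinct plotting positions that each ranking scheme produces for the example data. The variable names are ours and only serve the illustration; the exact sample depends on the R version, but the counts follow from the ranking schemes themselves.

```r
# Same data-generating call as in the post
set.seed(0)
x <- sample(0:9, 100, replace = TRUE)
n <- length(x)

r_ordinal    <- order(order(x))   # ties get consecutive ranks
r_fractional <- rank(x)           # ties share their average rank

p_ordinal    <- (r_ordinal - 1/2) / n
p_fractional <- (r_fractional - 1/2) / n

# Number of distinct plotting positions under each scheme
n_ordinal    <- length(unique(p_ordinal))     # one per observation
n_fractional <- length(unique(p_fractional))  # one per distinct value of x

n_ordinal
n_fractional
```

With ordinal ranking every observation gets its own position, so all points are visible. With fractional ranking all tied observations collapse onto a single point, which is why the SPSS plot appears to contain fewer points.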

The second difference between the plots concerned the scaling of the y-axis and was already clarified above.

The last point to understand is how the QQ-line is drawn in R. Looking at the probs argument of qqline reveals that it uses the 1st and 3rd quartile of the original data and theoretical distribution to determine the reference points for the line. We will draw the line between the quartiles in red and overlay it with the line produced by qqline to see if our code is correct.

plot(x, y)                      # plot empirical against theoretical values
ps <- c(.25, .75)               # reference probabilities
a <- quantile(x, ps)            # empirical quantiles
b <- qnorm(ps)                  # theoretical quantiles
lines(a, b, lwd=4, col="red")   # our QQ line in red
qqline(x, datax = TRUE)         # R QQ line


The reason for different lines in R and SPSS is that several approaches to fitting a straight line exist (for an overview see e.g. Castillo-Gutiérrez, Lozano-Aguilera, & Estudillo-Martínez, 2012a). Each approach has different advantages. The method used by R is more robust when we expect values to diverge from normality in the tails, and we are primarily interested in the normality of the middle range of our data. In other words, the method of fitting an adequate QQ-line depends on the purpose of the plot. An explanation of the rationale of the R approach can e.g. be found here.
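
The quartile-based line can also be written in slope-intercept form, which makes it explicit that only the two reference points determine it. This is a minimal sketch under the assumption that the two empirical quartiles differ (otherwise the slope is undefined); the names slope and intercept are ours.

```r
set.seed(0)
x <- sample(0:9, 100, replace = TRUE)

ps <- c(.25, .75)                      # reference probabilities
a  <- quantile(x, ps, names = FALSE)   # empirical quartiles (x-axis, as with datax = TRUE)
b  <- qnorm(ps)                        # theoretical quartiles (y-axis)

stopifnot(diff(a) > 0)                 # assumption: the quartiles are not identical

slope     <- diff(b) / diff(a)         # rise over run between the two reference points
intercept <- b[1] - slope * a[1]       # the line passes through (a[1], b[1])

y <- qnorm((order(order(x)) - 1/2) / length(x))
plot(x, y)
abline(intercept, slope, col = "red", lwd = 4)   # our line, extended over the whole plot
qqline(x, datax = TRUE)                          # R's line for comparison
```

Because the line depends only on the middle half of the data, extreme values in the tails do not move it.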

###### SPSS type

The default SPSS approach also uses Blom’s method to assign probabilities to ranks (you may choose other methods in SPSS) and differs from the one above in the following aspects:

• a) As already mentioned, SPSS uses ranks with averaged ties (fractional ranking), not the plain order ranks (ordinal ranking) as in R, to derive the corresponding probabilities for each data point. The rest of the code is identical to the one above, though I am not sure if SPSS distinguishes between the $n \leq 10$ and the $n > 10$ case.
• b) The theoretical quantiles are scaled to match the estimated mean and standard deviation of the original data.
• c) The QQ-line goes through all quantiles with identical values on the x and y axis.

n <- length(x)                # number of observations
r <- rank(x)                  # a) ranks using fractional ranking (averaging ties)
p <- (r - 1/2) / n            # assign probabilities to ranks using Blom's method
y <- qnorm(p)                 # theoretical standard normal quantiles for p values
y <- y * sd(x) + mean(x)      # b) transform SND quantiles to mean and sd from original data
plot(x, y)                    # plot empirical against theoretical values


Lastly, let us add the line. As the scaling of both axes is the same, the line goes through the origin with a slope of $1$.

abline(0,1)                   # c) line with intercept 0 and slope 1


The comparison to the SPSS output shows that they are (visually) identical.
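
The effect of point a) is easiest to see on data that contain ties. The following is a small sketch with made-up values (the names `x_ties`, `r_ord`, `r_avg`, `p_ord` and `p_avg` are only for illustration). It uses the same formula for the probabilities as the code above and compares ordinal ranks, as used by R, with fractional ranks, as in the SPSS default.

```r
# Sketch: ordinal vs. fractional ranking for data with ties (made-up values)
x_ties <- c(2, 3, 3, 3, 5, 7, 7, 9, 10, 12, 12, 15)
n <- length(x_ties)

r_ord <- rank(x_ties, ties.method = "first")    # ordinal ranking: ties get distinct ranks
r_avg <- rank(x_ties, ties.method = "average")  # fractional ranking: ties share their mean rank

p_ord <- (r_ord - 1/2) / n                      # probabilities from ordinal ranks
p_avg <- (r_avg - 1/2) / n                      # probabilities from averaged ranks

# Side-by-side comparison of ranks and theoretical quantiles
data.frame(x = x_ties, r_ord, r_avg,
           q_ord = qnorm(p_ord), q_avg = qnorm(p_avg))
```

Tied observations receive different theoretical quantiles under ordinal ranking, which shows up as small steps in the plot, whereas under fractional ranking they share a single quantile and are plotted on top of each other.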

# Function for SPSS-type QQ-plot

The whole point of this demonstration was to pinpoint and explain the differences between a QQ-plot generated in R and one generated in SPSS, so that they will no longer be a source of confusion. Note, however, that SPSS offers a whole range of options to generate the plot. For example, you can select the method to assign probabilities to ranks and decide how to treat ties. The plots above used the default setting (Blom’s method and averaging across ties). Personally I like the SPSS version. That is why I implemented the function qqnorm_spss in the ryouready package, which accompanies the course. The formulae for the different methods to assign probabilities to ranks can be found in Castillo-Gutiérrez et al. (2012b). The implementation is a preliminary version that has not yet been thoroughly tested. You can find the code here. Please report any bugs or suggestions for improvements (which are very welcome) in the github issues section.

library(devtools)
install_github("markheckmann/ryouready")                # install from github repo
library(ggplot2)
qq <- qqnorm_spss(x, method=1, ties.method="average")   # Blom's method with averaged ties
plot(qq)                                                # generate QQ-plot
ggplot(qq)                                              # use ggplot2 to generate QQ-plot


# Literature

1. Technical sidenote: Internally, qqnorm uses the function ppoints to generate the p values (the probabilities assigned to the ranks). Type stats:::qqnorm.default into the console to have a look at the code.
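
As a quick check of what `ppoints` returns, the sketch below recomputes its values by hand. The helper `pp_manual` is a hypothetical name introduced only for this illustration; the offset `a` follows the default documented in `?ppoints`, which switches at $n = 10$.

```r
# Sketch: reproduce ppoints() by hand for a small and a larger sample.
# pp_manual is a made-up helper, not part of any package.
pp_manual <- function(n) {
  a <- if (n <= 10) 3/8 else 1/2          # default offset documented in ?ppoints
  (seq_len(n) - a) / (n + 1 - 2 * a)
}

all.equal(pp_manual(5),  ppoints(5))      # n <= 10: (i - 3/8) / (n + 1/4)
all.equal(pp_manual(20), ppoints(20))     # n > 10:  (i - 1/2) / n
```

For $n > 10$ the expression reduces to $(i - 1/2)/n$, which is the formula used in the code of this post.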

## December 14, 2014

### CRANberries

#### New package qqtest with initial version 1.0

Package: qqtest
Type: Package
Title: Quantile Quantile Plots Self Calibrating For Visual Testing
Version: 1.0
Date: 2014-12-02
Authors@R: person("Wayne", "Oldford", email="rwoldford@uwaterloo.ca", role=c("aut", "cre"))
Maintainer: Wayne Oldford
Description: Provides the function qqtest which incorporates uncertainty in its qqplot display(s) so that the user might have a better sense of the evidence against the specified distributional hypothesis. qqtest draws a quantile quantile plot for visually assessing whether the data come from a test distribution that has been defined in one of many ways. The vertical axis plots the data quantiles, the horizontal those of a test distribution. The default behaviour generates 1000 samples from the test distribution and overlays the plot with pointwise interval estimates for the ordered quantiles from the test distribution. A small number of independently generated exemplar quantile plots are also overlaid. Both the interval estimates and the exemplars provide different comparative information to assess the evidence provided by the qqplot for or against the hypothesis that the data come from the test distribution (default is normal or gaussian). Finally, a visual test of significance (a lineup plot) can also be displayed to test the null hypothesis that the data come from the test distribution.
Depends: R (>= 2.10.0)
Imports: grDevices, MASS
Packaged: 2014-12-13 19:36:06 UTC; rwoldford
Author: Wayne Oldford [aut, cre]
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-12-14 01:15:35

### Dirk Eddelbuettel

#### rfoaas 0.0.4.20141212

A new version of rfoaas is now on CRAN. The rfoaas package provides an interface for R to the most excellent FOAAS service -- which provides a modern, scalable and RESTful web service for the frequent need to tell someone to eff off.

The FOAAS backend gets updated in spurts, and yesterday a few pull requests were integrated, including one from yours truly. So with that it was time for an update to rfoaas. As the version number upstream did not change (bad, bad practice), I appended the date to the version number.

CRANberries also provides a diff to the previous release. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

## December 13, 2014

### CRANberries

#### New package rodd with initial version 0.1-1

Package: rodd
Type: Package
Title: Optimal Discriminating Designs
Version: 0.1-1
Date: 2014-10-28
Authors@R: c(person("Roman", "Guchenko", role = c("aut", "cre"), email = "RomanGuchenko@yandex.ru"))
Depends: R (>= 3.0.0)
Imports: numDeriv, quadprog, Matrix, rootSolve, matrixcalc
Suggests: mvtnorm
Description: A collection of functions for numerical construction of optimal discriminating designs. At the current moment T-optimal designs (which maximize the lower bound for the power of F-test for regression model discrimination) and their robust analogues can be calculated with the package.
NeedsCompilation: no
Packaged: 2014-12-13 19:04:46 UTC; Roman
Author: Roman Guchenko [aut, cre]
Maintainer: Roman Guchenko
Repository: CRAN
Date/Publication: 2014-12-14 01:05:16

### CRANberries

#### New package mixor with initial version 1.0.1

Package: mixor
Type: Package
Title: Mixed-Effects Ordinal Regression Analysis
Version: 1.0.1
Date: 2014-12-12
Author: Donald Hedeker , Kellie J. Archer, Rachel Nordgren, Robert D. Gibbons
Maintainer: Kellie J. Archer
Description: Provides the function 'mixord' for fitting a mixed-effects ordinal and binary response models and associated methods for printing, summarizing, extracting estimated coefficients and variance-covariance matrix, and estimating contrasts for the fitted models.
Depends: R (>= 2.10)
Suggests: survival
BuildResaveData: best
Biarch: yes
NeedsCompilation: yes
Packaged: 2014-11-12 00:26:45 UTC; kjarcher
Repository: CRAN
Date/Publication: 2014-12-13 08:02:46

### Removed CRANberries

#### Package MVPARTwrap (with last version 0.1-9.2) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2013-09-22 0.1-9.2
2013-09-08 0.1-9.1
2012-05-02 0.1-9
2011-11-17 0.1-8
2011-11-16 0.1-7

#### Package KsPlot (with last version 1.3) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2011-11-24 1.3
2011-09-20 1.2
2011-05-03 1.1
2011-04-03 1.0

#### Package mvpart (with last version 1.6-2) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-06-24 1.6-2
2013-04-19 1.6-1
2012-02-19 1.6-0
2012-01-07 1.5-0
2011-03-11 1.4-0
2010-03-01 1.3-1
2010-02-06 1.3-0
2007-10-12 1.2-6
2007-09-30 1.2-5
2006-09-29 1.2-4