Planet R

January 24, 2015

Dirk Eddelbuettel

Rcpp 0.11.4

A new release 0.11.4 of Rcpp is now on the CRAN network for GNU R, and an updated Debian package will be uploaded in due course.

Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 323 packages on CRAN depend on Rcpp for making analyses go faster and further; BioConductor adds another 41 packages, and casual searches on GitHub suggest dozens more.

This release once again adds a large number of small bug fixes, polishes, and enhancements. As with the last release, these changes were made by a group of seven different contributors (counting code commits) plus three more providing concrete suggestions. This shows that Rcpp development and maintenance rests on a large number of (broad) shoulders.

See below for a detailed list of changes extracted from the NEWS file.

Changes in Rcpp version 0.11.4 (2015-01-20)

  • Changes in Rcpp API:

    • The ListOf<T> class gains the .attr and .names methods common to other Rcpp vectors.

    • The [dpq]nbinom_mu() scalar functions are now available via the R:: namespace when R 3.1.2 or newer is used.

    • Add an additional test for AIX before attempting to include execinfo.h.

    • Rcpp::stop now supports improved printf-like syntax using the small tinyformat header-only library (following a similar implementation in Rcpp11).

    • Pairlist objects are now protected via an additional Shield<> as suggested by Martin Morgan on the rcpp-devel list.

    • Sorting is now prohibited at compile time for objects of type List, RawVector and ExpressionVector.

    • Vectors now have a Vector::const_iterator that is 'const correct' thanks to a fix by Romain following a bug report on rcpp-devel by Martyn Plummer.

    • The mean() sugar function now uses a more robust two-pass method, and new unit tests for mean() were added at the same time.

    • The mean() and var() functions now support all core vector types.

    • The setequal() sugar function has been corrected via a suggestion by Qiang Kou following a bug report by Søren Højsgaard.

    • The macros major, minor, and makedev no longer leak in from the (Linux) system header sys/sysmacros.h.

    • The push_front() string function was corrected.

  • Changes in Rcpp Attributes:

    • Only look for plugins in the package's namespace (rather than the entire search path).

    • Also scan header files for definitions of functions to be considered by Attributes.

    • Correct the regular expression for source files which are scanned.

  • Changes in Rcpp unit tests:

    • Added a new binary test which will load a pre-built package to ensure that the Application Binary Interface (ABI) did not change; this test will (mostly or) only run at Travis where we have reasonable control over the platform running the test and can provide a binary.

    • New unit tests for sugar functions mean, setequal and var were added as noted above.

  • Changes in Rcpp Examples:

    • For the (old) examples ConvolveBenchmarks and OpenMP, the respective Makefile was renamed to GNUmakefile to please R CMD check as well as the CRAN Maintainers.

Thanks to CRANberries, you can also look at a diff to the previous release. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page, which also leads to the downloads page, the browseable doxygen docs, and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

January 24, 2015 03:44 PM

RcppGSL 0.2.4

A new version of RcppGSL is now on CRAN. This package provides an interface from R to the GNU GSL using our Rcpp package.

This follows on the heels of the recent RcppGSL 0.2.3 release and extends the excellent point made by Qiang Kou in a contributed section of the vignette: we now not only allow users to turn the GSL error handler off (so that it does not abort() on error) but do so on package initialisation.

No other user-facing changes were made.

The NEWS file entries follow below:

Changes in version 0.2.4 (2015-01-24)

  • Two new helper functions to turn the default GSL error handler off (and to restore it) were added. The default handler is now turned off when the package is attached so that GSL will no longer abort an R session on error. Users will have to check the error code.

  • The RcppGSL-intro.Rnw vignette was expanded with a short section on the GSL error handler (thanks to Qiang Kou).

Courtesy of CRANberries, a summary of changes to the most recent release is available.

More information is on the RcppGSL page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

January 24, 2015 03:20 PM

RcppAnnoy 0.0.5

A new version of RcppAnnoy is now on CRAN. RcppAnnoy wraps the small, fast, and lightweight C++ template header library Annoy written by Erik Bernhardsson for use at Spotify. RcppAnnoy uses Rcpp Modules to offer the exact same functionality as the Python module wrapped around Annoy.

This version contains a trivial one-character change requested by CRAN to cleanse the Makevars file of possible GNU Make-isms. Oh well. This release also overcomes an undefined behaviour sanitizer bug noticed by CRAN that took somewhat more effort to deal with. As mentioned recently in another blog post, it took some work to create a proper Docker container with the required compiler and subsequent R setup, but we have one now, and the aforementioned blog post has details on how we replicated the CRAN finding of a UBSAN issue. It also took Erik some extra effort to set something up for his C++/Python side, but eventually an EC2 instance with Ubuntu 14.10 did the task as my Docker sales skills are seemingly not convincing enough. In any event, he very quickly added the right fix, and I synced RcppAnnoy with his Annoy code.

Courtesy of CRANberries, there is also a diffstat report for this release. More detailed information is on the RcppAnnoy page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

January 24, 2015 02:22 PM

CRANberries

New package knncat with initial version 1.2.1

Package: knncat
Version: 1.2.1
Date: 2015-01-22
Title: Nearest-neighbor Classification with Categorical Variables
Author: Sam Buttrey
Maintainer: Sam Buttrey
Description: Scale categorical variables in such a way as to make NN classification as accurate as possible. The code also handles continuous variables and prior probabilities, and does intelligent variable selection and estimation of both error rates and the right number of NN's.
License: GPL-2
Packaged: 2015-01-23 20:17:52 UTC; sebuttre
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2015-01-24 06:41:56

More information about knncat at CRAN

January 24, 2015 05:13 AM

January 23, 2015

CRANberries

New package LDAvis with initial version 0.2

Package: LDAvis
Title: Interactive Visualization of Topic Models
Version: 0.2
Authors@R: c(person("Carson", "Sievert", role = c("aut", "cre"), email = "cpsievert1@gmail.com"), person("Kenny", "Shirley", role = "aut", email = "kshirley@research.att.com"))
Description: Tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with D3.js that is accessed via a browser. The goal is to help users interpret the topics in their LDA topic model.
Depends: R (>= 2.10)
Imports: proxy, RJSONIO, parallel
License: MIT + file LICENSE
Suggests: mallet, lda, topicmodels, gistr (>= 0.0.8.99), servr, shiny, knitr, rmarkdown
LazyData: true
VignetteBuilder: knitr
URL: https://github.com/cpsievert/LDAvis
BugReports: https://github.com/cpsievert/LDAvis/issues
Packaged: 2015-01-23 16:58:48 UTC; cpsievert
Author: Carson Sievert [aut, cre], Kenny Shirley [aut]
Maintainer: Carson Sievert
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-23 21:07:17

More information about LDAvis at CRAN

January 23, 2015 09:13 PM

New package SpatPCA with initial version 1.0

Package: SpatPCA
Type: Package
Title: Regularized Principal Component Analysis for Spatial Data
Version: 1.0
Date: 2015-01-11
Author: Wen-Ting Wang and Hsin-Cheng Huang
Maintainer: Wen-Ting Wang
Description: This package provides regularized principal component analysis incorporating smoothness, sparseness and orthogonality of eigenfunctions by using alternating direction method of multipliers (ADMM) algorithm.
License: GPL-2
Depends: fields
Imports: Rcpp (>= 0.11.2)
Suggests: foreach
LinkingTo: Rcpp, RcppArmadillo
NeedsCompilation: yes
Packaged: 2015-01-23 02:59:16 UTC; Joseph
Repository: CRAN
Date/Publication: 2015-01-23 09:06:28

More information about SpatPCA at CRAN

January 23, 2015 11:13 AM

New package remMap with initial version 0.2-0

Package: remMap
Version: 0.2-0
Date: 2008-12-3
Title: Regularized Multivariate Regression for Identifying Master Predictors
Author: Jie Peng , Pei Wang , Ji Zhu .
Maintainer: Pei Wang
Description: remMap is developed for fitting multivariate response regression models under the high-dimension-low-sample-size setting
License: GPL (>= 2)
Packaged: 2015-01-22 19:04:20 UTC; xwan2
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2015-01-23 09:17:05

More information about remMap at CRAN

January 23, 2015 11:13 AM

Alstatr

R: Principal Component Analysis on Imaging

Ever wonder what mathematics is behind face recognition on gadgets like digital cameras and smartphones? For the most part it has something to do with statistics. One statistical tool capable of such a feature is Principal Component Analysis (PCA). In this post, however, we will not do face recognition (sorry to disappoint you), as we reserve that for a future post while I'm still doing research on it. Instead, we go through its basic concept and use it for data reduction on the spectral bands of an image using R.

Let's view it mathematically

Consider a line $L$ in parametric form, described as the set of all vectors $k\cdot\mathbf{u}+\mathbf{v}$ parameterized by $k\in \mathbb{R}$, where $\mathbf{v}$ is a vector orthogonal to a normalized vector $\mathbf{u}$. Below is the graphical equivalent of the statement:
So, given a point $\mathbf{x}=[x_1,x_2]^T$, the orthogonal projection of this point on the line $L$ is given by $(\mathbf{u}^T\mathbf{x})\mathbf{u}+\mathbf{v}$. Graphically, we mean

$Proj$ is the projection of the point $\mathbf{x}$ on the line, where its position is defined by the scalar $\mathbf{u}^{T}\mathbf{x}$. Therefore, if we let $\mathbf{X}=[X_1, X_2]^T$ be a random vector, then the random variable $Y=\mathbf{u}^T\mathbf{X}$ describes the variability of the data in the direction of the normalized vector $\mathbf{u}$, so that $Y$ is a linear combination of the $X_i, i=1,2$. Principal component analysis identifies linear combinations of the original variables $\mathbf{X}$ that contain most of the information, in the sense of variability, contained in the data. The general assumption is that useful information is proportional to the variability. PCA is used for data dimensionality reduction and for interpretation of data. (Ref 1. Bajorski, 2012)

To better understand this, consider a two-dimensional data set; below is its plot along with two lines ($L_1$ and $L_2$) that are orthogonal to each other:
If we project the points orthogonally to both lines we have,

If the normalized vector $\mathbf{u}_1$ defines the direction of $L_1$, then the variability of the points on $L_1$ is described by the random variable $Y_1=\mathbf{u}_1^T\mathbf{X}$. Likewise, if $\mathbf{u}_2$ is a normalized vector that defines the direction of $L_2$, then the variability of the points on this line is described by the random variable $Y_2=\mathbf{u}_2^T\mathbf{X}$. The first principal component is the one with maximum variability. In this case, we can see that $Y_2$ is more variable than $Y_1$, since the points projected on $L_2$ are more dispersed than those on $L_1$. In practice, the linear combinations $Y_i = \mathbf{u}_i^T\mathbf{X}, i=1,2,\cdots,p$ are maximized sequentially, so that $Y_1$ is the linear combination of the first principal component, $Y_2$ is the linear combination of the second principal component, and so on. Further, the estimate of the direction vector $\mathbf{u}$ is simply the normalized eigenvector $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$, and the variability explained by the principal component is the corresponding eigenvalue $\lambda$. For more details on the theory of PCA refer to (Bajorski, 2012) in Reference 1 below.

As promised, we will do dimensionality reduction using PCA. We will use the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data from (Bajorski, 2012); you can also use other locations of AVIRIS data that can be downloaded here. However, since AVIRIS data typically contains hundreds of bands, for simplicity we will stick with the data given in (Bajorski, 2012), which has been cleaned and reduced to 152 bands.

What are spectral bands?

In imaging, spectral bands refer to the third dimension of the image, usually denoted $\lambda$. For example, an RGB image contains red, green, and blue bands, as shown below along with the first two dimensions $x$ and $y$ that define the resolution of the image.

These are a few of the bands that are visible to our eyes; there are other bands that are not visible to us, like infrared, and many others in the electromagnetic spectrum. That is why in most cases AVIRIS data contains a huge number of bands, each capturing different characteristics of the image. Below is the proper description of the data.

Data

The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), is a sensor collecting spectral radiance in the range of wavelengths from 400 to 2500 nm. It has been flown on various aircraft platforms, and many images of the Earth’s surface are available. A 100 by 100 pixel AVIRIS image of an urban area in Rochester, NY, near the Lake Ontario shoreline is shown below. The scene has a wide range of natural and man-made material including a mixture of commercial/warehouse and residential neighborhoods, which adds a wide range of spectral diversity. Prior to processing, invalid bands (due to atmospheric water absorption) were removed, reducing the overall dimensionality to 152 bands. This image has been used in Bajorski et al. (2004) and Bajorski (2011a, 2011b). The first 152 values in the AVIRIS Data represent the spectral radiance values (a spectral curve) for the top left pixel. This is followed by spectral curves of the pixels in the first row, followed by the next row, and so on. (Ref. 1 Bajorski, 2012)

To load the data, run the following code:
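A minimal sketch of one way to do it (the file name AVIRIS.txt and the exact layout are assumptions; the data description above says the 152 values per pixel come in row-major pixel order):

# Read the AVIRIS radiance values into a pixel-by-band matrix.
# Assumed: 152 spectral radiance values per pixel, pixels stored
# row by row (first image row first).
vals <- scan("AVIRIS.txt")                # hypothetical file name
p <- 152                                  # number of spectral bands
dat.mat <- matrix(vals, ncol = p, byrow = TRUE)
dim(dat.mat)                              # 10000 pixels by 152 bands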

The complete code also uses the EBImage package, which can be installed as described in my previous post.

Why do we need to reduce the dimension of the data?

Before we jump into our analysis, you may ask why. Well, sometimes it's just difficult to do analysis on high-dimensional data, especially when it comes to interpretation. This is because there are dimensions that aren't significant (redundancy, for example) which add to the difficulty of the analysis. So in order to deal with this, we remove those nuisance dimensions and deal with the significant ones.

To perform PCA in R, we use the function princomp as seen below:
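A sketch of the call (dat.mat is the pixel-by-band matrix loaded above):

pca <- princomp(dat.mat)   # PCA on the covariance matrix of the bands
str(pca)                   # inspect the list returned by princomp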

The value returned by princomp is a list; we give descriptions of selected components. Others can be found in the documentation of the function by executing ?princomp.
  • sdev - the standard deviations, i.e., the square roots of the eigenvalues $\lambda$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, dat.mat;
  • loadings - eigenvectors $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, dat.mat;
  • scores - the principal component scores.
Recall that the objective of PCA is to find a linear combination $Y=\mathbf{u}^T\mathbf{X}$ that maximizes the variance $Var(Y)$. From the output, the estimates of the components of $\mathbf{u}$ are the entries of the loadings, a matrix of eigenvectors whose columns correspond to the eigenvectors of the successive principal components: if the first principal component is given by $Y_1=\mathbf{u}_1^T\mathbf{X}$, then the estimate of $\mathbf{u}_1$, namely the eigenvector $\mathbf{e}_1$, is the set of coefficients in the first column of the loadings. The variability explained by the first principal component is the square of the first standard deviation sdev, the variability explained by the second principal component is the square of the second standard deviation sdev, and so on. Now let's interpret the loadings (coefficients) of the first three principal components. Below is the plot of this,
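drawn with something like the following sketch (plotting details are assumptions, not the original code):

matplot(unclass(pca$loadings)[, 1:3], type = "l", lty = 1:3, col = 1:3,
        xlab = "Spectral Band", ylab = "Loading")
legend("bottomright", c("PC1", "PC2", "PC3"), lty = 1:3, col = 1:3)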
Based on the plot above, the coefficients of the first principal component (PC1) are almost all negative. On closer look, the variability in this principal component is mainly explained by a weighted average of the radiance of spectral bands 35 to 100. Analogously, PC2 mainly represents the variability of a weighted average of the radiance of spectral bands 1 to 34. Further, the fluctuation of the coefficients of PC3 makes it difficult to tell which bands contribute most to its variability. Aside from examining the loadings, another way to see the impact of the PCs is through the impact plot, where the impact curves $\sqrt{\lambda_j}\mathbf{e}_j$ are plotted; I encourage you to explore that.

Moving on, let's investigate the percent of variability in $X_i$ explained by the $j$th principal component, given by the formula \begin{equation}\nonumber \frac{\lambda_j\cdot e_{ij}^2}{s_{ii}}, \end{equation} where $s_{ii}$ is the estimated variance of $X_i$. Below is the percent of explained variability in $X_i$ for the first three principal components, including the cumulative percent variability (the sum of PC1, PC2, and PC3),
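computed with something like the following sketch (the divisor conventions of princomp and var differ slightly, so the percentages are approximate):

lambda <- pca$sdev^2                 # eigenvalues of the covariance matrix
e      <- unclass(pca$loadings)      # eigenvectors, one column per PC
s      <- apply(dat.mat, 2, var)     # estimated variances s_ii of the bands
pct    <- 100 * sapply(1:3, function(j) lambda[j] * e[, j]^2 / s)
matplot(pct, type = "l", lty = 1, col = c("red", "green", "blue"),
        xlab = "Spectral Band", ylab = "Percent of Explained Variability")
lines(rowSums(pct), col = "orange")  # cumulative: PC1 + PC2 + PC3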
For the variability of the first 33 bands, PC2 takes on about 90 percent of the explained variability, as seen in the plot above, and it still contributes substantially to bands 102 to 152. On the other hand, from bands 37 to 100, PC1 explains almost all the variability, with PC2 and PC3 explaining only 0 to 1 percent. The sum of the percentages of explained variability of these principal components is indicated by the orange line in the plot, which is the cumulative percent variability.

To wrap up this section, here is the percentage of the explained variability of the first 10 PCs.

Table 1: Variability Explained by the First Ten Principal Components for the AVIRIS data.

   PC1     PC2    PC3    PC4    PC5    PC6    PC7    PC8    PC9   PC10
82.057  17.176  0.320  0.182  0.094  0.065  0.037  0.029  0.014  0.005

The percentages above were obtained by noting that the variability explained by a principal component is simply the eigenvalue (the square of the sdev) of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$; hence the percentage of variability explained by the $j$th PC is equal to its corresponding eigenvalue $\lambda_j$ divided by the overall variability, the sum of the eigenvalues $\sum_{j=1}^{p}\lambda_j$, as we see in the following code,
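sketched here under the same assumptions as above:

lambda <- pca$sdev^2                          # eigenvalues
round(100 * lambda[1:10] / sum(lambda), 3)    # the percentages in Table 1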

Stopping Rules

Given the list of percentages of variability explained by the PCs in Table 1, how many principal components should we take into account to best represent the variability of the original data? To answer that, we introduce the following stopping rules that will guide us in deciding the number of PCs:
  1. Scree plot;
  2. Simple fair-share;
  3. Broken-stick; and,
  4. Relative broken-stick.
The scree plot is the plot of the variability of the PCs, that is, a plot of the eigenvalues, in which we look for an elbow or sudden drop of the eigenvalues. For our example we have
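a plot along the lines of:

plot(lambda, type = "b", xlab = "Principal Component",
     ylab = "Eigenvalue", main = "Scree Plot")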
Therefore, we retain the first two principal components based on the elbow shape. However, if the eigenvalues differ by orders of magnitude, it is recommended to use a logarithmic scale, which is illustrated below,
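via the log = "y" argument:

plot(lambda, type = "b", log = "y", xlab = "Principal Component",
     ylab = "Eigenvalue (log scale)")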
Unfortunately, sometimes it won't work: as we can see here, it's just difficult to determine where the elbow is. The succeeding discussions of the last three stopping rules are based on (Bajorski, 2012). The simple fair-share stopping rule identifies the largest $k$ such that $\lambda_k$ is larger than its fair share, that is, larger than $(\lambda_1+\lambda_2+\cdots+\lambda_p)/p$. To illustrate this, consider the following:
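(a sketch of the rule; the text below confirms the answer of two)

# Simple fair-share: the largest k such that lambda_k exceeds the
# average eigenvalue (lambda_1 + ... + lambda_p) / p
k <- max(which(lambda > mean(lambda)))
k                                   # 2 for these data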

Thus, we need to stop at the second principal component.

If one were concerned that the above method produces too many principal components, a broken-stick rule could be used. The rule identifies the largest $k$ such that $\lambda_j/(\lambda_1+\lambda_2+\cdots +\lambda_p)>a_j$ for all $j\leq k$, where \begin{equation}\nonumber a_j = \frac{1}{p}\sum_{i=j}^{p}\frac{1}{i},\quad j =1,\cdots, p. \end{equation} Let's try it,
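with a sketch like this:

# Broken-stick rule: largest k with lambda_j / sum(lambda) > a_j
# for all j <= k, where a_j = (1/p) * sum_{i=j}^{p} 1/i
p    <- length(lambda)
a    <- sapply(1:p, function(j) sum(1 / (j:p)) / p)
prop <- lambda / sum(lambda)
max(which(cumprod(prop > a) == 1))  # 2 for these data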

The result above coincides with the first two stopping rules. The drawback of the simple fair-share and broken-stick rules is that they do not work well when the eigenvalues differ by orders of magnitude. In such cases, we use the relative broken-stick rule, where we analyze $\lambda_j$ as the first eigenvalue in the set $\lambda_j\geq \lambda_{j+1}\geq\cdots\geq\lambda_{p}$, where $j < p$. The dimensionality $k$ is chosen as the largest value such that $\lambda_j/(\lambda_j+\cdots +\lambda_p)>b_j$ for all $j\leq k$, where \begin{equation}\nonumber b_j = \frac{1}{p-j+1}\sum_{i=1}^{p-j+1}\frac{1}{i}. \end{equation} Applying this to the data we have,
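as in the following sketch (the answer of 34 is the one reported below):

# Relative broken-stick: compare lambda_j with its fair share within
# the remaining set lambda_j >= ... >= lambda_p
b   <- sapply(1:(p - 1), function(j) { m <- p - j + 1; sum(1 / (1:m)) / m })
rel <- sapply(1:(p - 1), function(j) lambda[j] / sum(lambda[j:p]))
max(which(cumprod(rel > b) == 1))   # 34 for these data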
According to the numerical output, the first 34 principal components are enough to represent the variability of the original data.

Reference

  1. Bajorski, P. (2012). Statistics for Imaging, Optics, and Photonics. John Wiley & Sons, Inc.

by Al-Ahmadgaid Asaad (noreply@blogger.com) at January 23, 2015 09:45 AM

Removed CRANberries

Package nlrwr (with last version 1.1-0) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2011-11-02 1.1-0
2009-03-25 1.0-6
2009-01-07 1.0-5
2008-03-21 1.0

January 23, 2015 07:14 AM

January 22, 2015

CRANberries

New package shinybootstrap2 with initial version 0.2

Package: shinybootstrap2
Title: Bootstrap 2 web components for use with Shiny
Version: 0.2
Authors@R: c( person("Winston", "Chang", role = c("aut", "cre"), email = "winston@rstudio.com"), person(family = "RStudio", role = "cph"), person("Mark", "Otto", role = "ctb", comment = "Bootstrap library"), person("Jacob", "Thornton", role = "ctb", comment = "Bootstrap library"), person(family = "Bootstrap contributors", role = "ctb", comment = "Bootstrap library; authors listed at https://github.com/twbs/bootstrap/graphs/contributors"), person(family = "Twitter, Inc", role = "cph", comment = "Bootstrap library"), person("Brian", "Reavis", role = c("ctb", "cph"), comment = "selectize.js library"), person("Egor", "Khmelev", role = c("ctb", "cph"), comment = "jslider library"), person(family = "SpryMedia Limited", role = c("ctb", "cph"), comment = "DataTables library") )
Description: Provides Bootstrap 2 web components for use with the Shiny package. With versions of Shiny prior to 0.11, these Bootstrap 2 components were included as part of the package. Later versions of Shiny include Bootstrap 3, so the Bootstrap 2 components have been moved into this package for those users who rely on features specific to Bootstrap 2.
Depends: R (>= 3.0.0)
License: GPL-3 | file LICENSE
LazyData: true
Imports: htmltools (>= 0.2.6), jsonlite (>= 0.9.12), shiny
Packaged: 2015-01-22 17:07:03 UTC; winston
Author: Winston Chang [aut, cre], RStudio [cph], Mark Otto [ctb] (Bootstrap library), Jacob Thornton [ctb] (Bootstrap library), Bootstrap contributors [ctb] (Bootstrap library; authors listed at https://github.com/twbs/bootstrap/graphs/contributors), Twitter, Inc [cph] (Bootstrap library), Brian Reavis [ctb, cph] (selectize.js library), Egor Khmelev [ctb, cph] (jslider library), SpryMedia Limited [ctb, cph] (DataTables library)
Maintainer: Winston Chang
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-22 19:16:49

More information about shinybootstrap2 at CRAN

January 22, 2015 07:13 PM

New package lba with initial version 1.0

Package: lba
Title: Latent Budget Analysis for Compositional Data
Version: 1.0
Date: 2015-01-22
Author: Enio G. Jelihovschi Ivan Bezerra Allaman
Maintainer: Enio G. Jelihovschi
Depends: R (>= 3.1.2), MASS, alabama, plotrix, ca
Description: Latent budget analysis is a method for the analysis of a two-way contingency table with an exploratory variable and a response variable. It is specially designed for compositional data.
Encoding: latin1
License: GPL (>= 2)
Packaged: 2015-01-22 12:13:45 UTC; ivan
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-22 18:42:44

More information about lba at CRAN

January 22, 2015 05:13 PM

New package ESEA with initial version 1.0

Package: ESEA
Version: 1.0
Title: ESEA: Discovering the Dysregulated Pathways based on Edge Set Enrichment Analysis
Author: Junwei Han, Xinrui Shi, Chunquan Li
Maintainer: Xinrui Shi
Description: The package can identify the dysregulated canonical pathways by investigating the changes of biological relationships of pathways in the context of gene expression data. (1) The ESEA package constructs a background set of edges by extracting pathway structure (e.g. interaction, regulation, modification, and binding etc.) from the seven public databases (KEGG; Reactome; Biocarta; NCI; SPIKE; HumanCyc; Panther) and the edge sets of pathways for each of the above databases. (2) The ESEA package can quantify the change of correlation between genes for each edge based on gene expression data with cases and controls. (3) The ESEA package uses the weighted Kolmogorov-Smirnov statistic to calculate an edge enrichment score (EES), which reflects the degree to which a given pathway is associated with the specific phenotype. (4) The ESEA package can provide the visualization of the results.
Depends: R (>= 2.10),igraph,XML,parmigene
Suggests: Matrix,graph
Collate: calEdgeCorScore.R ESEA.Main.R PlotGlobEdgeCorProfile.R PlotPathwayGraph.R PlotRunEnrichment.R SavePathway2File.R getEnvironmentData.R GetExampleData.R GetEdgesBackgrandData.R GetPathwayEdgeData.R
LazyData: Yes
License: GPL (>= 2)
biocViews: Statistics, Pathways, edge, enrichment analysis
Packaged: 2015-01-22 12:50:32 UTC; Administrator
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-22 15:58:44

More information about ESEA at CRAN

January 22, 2015 03:13 PM

New package RcmdrPlugin.NMBU with initial version 1.8.0

Package: RcmdrPlugin.NMBU
Type: Package
Title: R Commander Plug-In for University Level Applied Statistics
Version: 1.8.0
Date: 2015-01-22
Author: Kristian Hovde Liland, Solve Sæbø
Maintainer: Kristian Hovde Liland
Encoding: latin1
Depends: R (>= 3.0.0), mixlm (>= 1.0.8), MASS, pls, xtable
Imports: Rcmdr (>= 2.0-0), tcltk
Suggests: lme4, leaps, mvtnorm, gmodels, abind, lattice, pbkrtest, vcd, multcomp, e1071, nnet
Description: An R Commander "plug-in" extending functionality of linear models and providing an interface to Partial Least Squares Regression and Linear and Quadratic Discriminant analysis. Several statistical summaries are extended, predictions are offered for additional types of analyses, and extra plots, tests and mixed models are available.
License: GPL (>= 2)
LazyLoad: yes
LazyData: yes
RcmdrModels: mvr, lda, qda, prcomp, mer, rsm, glmerMod, lmerMod
NeedsCompilation: no
Packaged: 2015-01-22 08:26:37 UTC; kristian.liland
Repository: CRAN
Date/Publication: 2015-01-22 13:35:54

More information about RcmdrPlugin.NMBU at CRAN

January 22, 2015 01:13 PM

January 21, 2015

CRANberries

New package TRD with initial version 1.0

Package: TRD
Type: Package
Title: Transmission Ratio Distortion
Version: 1.0
Date: 2015-01-21
Author: Lam Opal Huang
Maintainer: Lam Opal Huang
Depends: Rlab (>= 2.14.0)
Description: Transmission Ratio Distortion (TRD) is a genetic phenomenon where two alleles from either parent are not transmitted to the offspring at the Mendelian 1:1 ratio. Occurrence of TRD in general population can lead to false inflation or attenuation of association signals in case populations. Therefore, it is necessary to adjust for TRD in model fitting of case populations. This package uses models such as loglinear model (Weinberg 1998), augmented loglinear model (Huang 2014), and tests such as TDT (Spielman 1993), augmented TDT (Labbe 2013), on simulated or real datasets. This package has a simulation function which generates a population that is under the influence of TRD, and samples a sub-population, which can serve as the dataset to fit a loglinear model or to perform a TDT. Real dataset with the same data structure as described in the help file of the function 'll' can also be used in the loglinear model and the TDT functions. No real dataset is included in this package.
License: GPL (>= 2)
Packaged: 2015-01-21 18:53:44 UTC; Opal - AL
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-22 00:48:35

More information about TRD at CRAN

January 21, 2015 11:13 PM

New package rebus with initial version 0.0-4

Package: rebus
Type: Package
Title: Build Regular Expressions in a Human Readable Way
Version: 0.0-4
Date: 2015-01-20
Author: Richard Cotton [aut, cre]
Maintainer: Richard Cotton
Authors@R: person("Richard", "Cotton", role = c("aut", "cre"), email = "richierocks@gmail.com")
Description: Build regular expressions piece by piece using human readable code.
Depends: R (>= 3.1.0)
Suggests: testthat
License: Unlimited
LazyLoad: yes
LazyData: yes
Acknowledgments: Development of this package was partially funded by the Proteomics Core at Weill Cornell Medical College in Qatar . The Core is supported by 'Biomedical Research Program' funds, a program funded by Qatar Foundation.
Collate: 'alternation.R' 'regex-methods.R' 'backreferences.R' 'capture.R' 'internal.R' 'grouping-and-repetition.R' 'constants.R' 'class-groups.R' 'concatenation.R' 'compound-constants.R' 'escape_special.R' 'datetime.R' 'lookaround.R' 'misc.R' 'number_range.R' 'regex-package.R' 'unicode-groups.R' 'unicode.R' 'zzz.R'
Packaged: 2015-01-20 13:49:39 UTC; rjc2003
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-22 00:46:57

More information about rebus at CRAN

January 21, 2015 11:13 PM

New package compLasso with initial version 0.0-1

Package: compLasso
Type: Package
Title: Implements the Component Lasso Method Functions
Version: 0.0-1
Date: 2014-10-04
Imports: quadprog, Matrix
Author: Nadine Hussami, Robert Tibshirani and Jerome Friedman
Maintainer: Nadine Hussami
Description: Implements the Component lasso method for linear regression using the sample covariance matrix connected-components structure, described in A Component Lasso, by Hussami and Tibshirani (2013)
License: GPL (>= 2)
Packaged: 2015-01-21 17:47:49 UTC; nadine
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2015-01-21 21:28:36

More information about compLasso at CRAN

January 21, 2015 09:12 PM

New package npIntFactRep with initial version 1.0

Package: npIntFactRep
Type: Package
Title: Nonparametric Rank Test for Interaction in Factorial Designs with Repeated Measures
Version: 1.0
Date: 2015-01-15
Author: Jos Feys
Maintainer: Jos Feys
Description: Nonparametric interaction tests on data sets in 'wide' format with repeated measures. Choice between 1) Aligned Regular ranks, 2) Friedman ranks, and 3) Koch ranks.
License: GPL (>= 2)
Packaged: 2015-01-21 13:10:38 UTC; Jos
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-21 17:14:54

More information about npIntFactRep at CRAN

January 21, 2015 05:13 PM

New package mixlm with initial version 1.0.8

Package: mixlm
Type: Package
Title: Mixed Model ANOVA and Statistics for Education
Version: 1.0.8
Date: 2015-01-21
Authors@R: c(person("Kristian Hovde", "Liland", role = c("aut","cre"), email="kristian.liland@nmbu.no"), person("Solve", "Sćbř", role=c("ctb")), person(family="R-Core", role="ctb"))
Maintainer: Kristian Hovde Liland
Encoding: latin1
Description: The main functions perform mixed models analysis by least squares or REML by adding the function r() to formulas of lm and glm. A collection of text-book statistics for higher education is also included, e.g. modifications of the functions lm, glm and associated summaries from the package stats.
Depends: multcomp, pls, pracma
Imports: leaps, lme4, car
License: GPL (>= 2)
LazyLoad: yes
LazyData: yes
NeedsCompilation: no
Packaged: 2015-01-21 07:50:17 UTC; kristian.liland
Author: Kristian Hovde Liland [aut, cre], Solve Sæbø [ctb], R-Core [ctb]
X-CRAN-Comment: Earlier versions were removed on 2015-01-20 for copyright violation.
Repository: CRAN
Date/Publication: 2015-01-21 12:57:12

More information about mixlm at CRAN

January 21, 2015 11:13 AM

Journal of Statistical Software

ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data

Vol. 62, Issue 7, Jan 2015

Abstract:

There are many different ways in which change point analysis can be performed, from purely parametric methods to those that are distribution free. The ecp package is designed to perform multiple change point analysis while making as few assumptions as possible. While many other change point methods are applicable only for univariate data, this R package is suitable for both univariate and multivariate observations. Hierarchical estimation can be based upon either a divisive or agglomerative algorithm. Divisive estimation sequentially identifies change points via a bisection algorithm. The agglomerative algorithm estimates change point locations by determining an optimal segmentation. Both approaches are able to detect any type of distributional change within the data. This provides an advantage over many existing change point algorithms which are only able to detect changes within the marginal distributions.

January 21, 2015 08:00 AM

Multilevel Modeling Using R

Vol. 62, Book Review 1, Jan 2015

Multilevel Modeling Using R
William Holmes Finch, Jocelyn E. Bolin, Ken Kelley
Chapman and Hall/CRC, 2014
ISBN: 978-1-4665-1585-7

January 21, 2015 08:00 AM

vSMC: Parallel Sequential Monte Carlo in C++

Vol. 62, Issue 9, Jan 2015

Abstract:

Sequential Monte Carlo is a family of algorithms for sampling from a sequence of distributions. Some of these algorithms, such as particle filters, are widely used in physics and signal processing research. More recent developments have established their application in more general inference problems such as Bayesian modeling.
These algorithms have attracted considerable attention in recent years not only because they have desired statistical properties, but also because they admit natural and scalable parallelization. However, they are perceived to be difficult to implement. In addition, parallel programming is often unfamiliar to many researchers though conceptually appealing.
A C++ template library is presented for the purpose of implementing generic sequential Monte Carlo algorithms on parallel hardware. Two examples are presented: a simple particle filter and a classic Bayesian modeling problem.

January 21, 2015 08:00 AM

envlp: A MATLAB Toolbox for Computing Envelope Estimators in Multivariate Analysis

Vol. 62, Issue 8, Jan 2015

Abstract:

Envelope models and methods represent new constructions that can lead to substantial increases in estimation efficiency in multivariate analyses. The envlp toolbox implements a variety of envelope estimators under the framework of multivariate linear regression, including the envelope model, partial envelope model, heteroscedastic envelope model, inner envelope model, scaled envelope model, and envelope model in the predictor space. The toolbox also implements the envelope model for estimating a multivariate mean. The capabilities of this toolbox include estimation of the model parameters, as well as performing standard multivariate inference in the context of envelope models; for example, prediction and prediction errors, F test for two nested models, the standard errors for contrasts or linear combinations of coefficients, and more. Examples and datasets are contained in the toolbox to illustrate the use of each model. All functions and datasets are documented.

January 21, 2015 08:00 AM

CRANberries

New package gammSlice with initial version 1.3

Package: gammSlice
Type: Package
Title: Generalized additive mixed model analysis via slice sampling
Version: 1.3
Date: 2015-01-21
Author: Tung Pham and Matt Wand
Maintainer: Tung Pham
Description: Uses a slice sampling-based Markov chain Monte Carlo to conduct Bayesian fitting and inference for generalized additive mixed models (GAMM). Generalized linear mixed models and generalized additive models are also handled as special cases of GAMM.
Depends: R(>= 2.13), KernSmooth, lattice, mgcv
License: GPL (>= 2)
Packaged: 2015-01-21 06:46:35 UTC; tungp
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2015-01-21 08:24:52

More information about gammSlice at CRAN

January 21, 2015 07:13 AM

New package conformal with initial version 0.1

Package: conformal
Type: Package
Title: Conformal Prediction for Regression and Classification
Version: 0.1
Date: 04/10/2014
Author: Isidro Cortes
Maintainer: Isidro Cortes
Depends: R (>= 2.12.0), caret, ggplot2, grid, randomForest, e1071, methods
Suggests: kernlab
Description: Implementation of conformal prediction using caret models for classification and regression
License: GPL
LazyLoad: yes
Packaged: 2015-01-20 23:49:25 UTC; icortes
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-21 07:19:33

More information about conformal at CRAN

January 21, 2015 07:13 AM

New package ArfimaMLM with initial version 1.3

Package: ArfimaMLM
Type: Package
Title: Arfima-MLM Estimation For Repeated Cross-Sectional Data
Version: 1.3
Date: 2015-01-20
Authors@R: c(person("Patrick", "Kraft", role = c("aut", "cre"), email = "patrick.kraft@stonybrook.edu"), person("Christopher", "Weber", role = "ctb"))
Description: Functions to facilitate the estimation of Arfima-MLM models for repeated cross-sectional data and pooled cross-sectional time-series data (see Lebo and Weber 2015). The estimation procedure uses double filtering with Arfima methods to account for autocorrelation in repeated cross-sectional data followed by multilevel modeling (MLM) to estimate aggregate as well as individual-level parameters simultaneously.
Depends: R (>= 3.0.0), lme4, fractal
Imports: fracdiff
License: GPL (>= 2)
URL: https://github.com/pwkraft/ArfimaMLM
Packaged: 2015-01-20 23:53:31 UTC; patrick
Author: Patrick Kraft [aut, cre], Christopher Weber [ctb]
Maintainer: Patrick Kraft
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-21 07:09:03

More information about ArfimaMLM at CRAN

January 21, 2015 07:12 AM

January 20, 2015

CRANberries

New package shinythemes with initial version 1.0

Package: shinythemes
Title: Themes for Shiny
Version: 1.0
Authors@R: c( person("Winston", "Chang", role = c("aut", "cre"), email = "winston@rstudio.com"), person(family = "RStudio", role = "cph"), person("Thomas", "Park", role = c("ctb", "cph"), comment = "Bootswatch themes"), person("Lukasz", "Dziedzic", role = c("ctb", "cph"), comment = "Lato font"), person("Nathan", "Willis", role = c("ctb", "cph"), comment = "News Cycle font"), person(family = "Google Corporation", role = c("ctb", "cph"), comment = "Open Sans font"), person("Matt", "McInerney", role = c("ctb", "cph"), comment = "Raleway font"), person(family = "Adobe Systems Incorporated", role = c("ctb", "cph"), comment = "Source Sans Pro font"), person(family = "Canonical Ltd", role = c("ctb", "cph"), comment = "Ubuntu font") )
Description: Themes for use with Shiny. Includes several Bootstrap themes from http://bootswatch.com/, which are packaged for use with Shiny applications.
Depends: R (>= 3.0.0)
Imports: shiny (>= 0.11)
License: GPL-3 | file LICENSE
Packaged: 2015-01-19 04:35:35 UTC; winston
Author: Winston Chang [aut, cre], RStudio [cph], Thomas Park [ctb, cph] (Bootswatch themes), Lukasz Dziedzic [ctb, cph] (Lato font), Nathan Willis [ctb, cph] (News Cycle font), Google Corporation [ctb, cph] (Open Sans font), Matt McInerney [ctb, cph] (Raleway font), Adobe Systems Incorporated [ctb, cph] (Source Sans Pro font), Canonical Ltd [ctb, cph] (Ubuntu font)
Maintainer: Winston Chang
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-20 14:37:41

More information about shinythemes at CRAN

January 20, 2015 01:13 PM

Removed CRANberries

Package mixlm (with last version 1.0.7) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-11-09 1.0.7
2014-09-16 1.0.6
2014-06-29 1.0.5
2014-05-15 1.0.3
2014-03-20 1.0.2
2014-02-20 1.0.0

January 20, 2015 11:13 AM

Package RcmdrPlugin.NMBU (with last version 1.7.6) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-11-10 1.7.6
2014-09-19 1.7.5
2014-09-16 1.7.4
2014-06-30 1.7.3
2014-05-16 1.7.2
2014-03-20 1.7.1
2014-02-21 1.7.0

January 20, 2015 11:13 AM

CRANberries

New package RDS with initial version 0.7

Package: RDS
Type: Package
Title: Respondent-Driven Sampling
Version: 0.7
Date: 2014-11-11
Authors@R: c( person("Mark S.", "Handcock", role=c("aut","cre"), email="handcock@stat.ucla.edu"), person("Krista J.", "Gile", role=c("aut"), email="gile@math.umass.edu"), person("Ian E.", "Fellows", role=c("aut"), email="ian@fellstat.com"), person("W. Whipple", "Neely", role=c("aut"), email="wwneely@stat.washington.edu"))
Maintainer: Mark S. Handcock
Description: This package provides functionality for carrying out estimation with data collected using Respondent-Driven Sampling. This includes Heckathorn's RDS-I and RDS-II estimators as well as Gile's Sequential Sampling estimator.
License: LGPL-2.1
URL: http://www.hpmrg.org
Depends: methods
Suggests: isotone, network, survey, testthat
Imports: gridExtra, ggplot2, Hmisc, igraph, locfit, reshape2, scales
NeedsCompilation: yes
Packaged: 2015-01-19 23:59:13 UTC; handcock
Author: Mark S. Handcock [aut, cre], Krista J. Gile [aut], Ian E. Fellows [aut], W. Whipple Neely [aut]
Repository: CRAN
Date/Publication: 2015-01-20 06:14:09

More information about RDS at CRAN

January 20, 2015 05:13 AM

January 19, 2015

CRANberries

New package nLTT with initial version 1.0

Package: nLTT
Type: Package
Title: Calculate The NLTT Statistic
Version: 1.0
Date: 2014-09-17
Author: Thijs Janzen
Maintainer: Thijs Janzen
Description: Provides functions to calculate the normalised Lineage-Through-Time (nLTT) statistic, given two phylogenetic trees. The nLTT statistic measures the absolute difference between two Lineage-Through-Time curves, where each curve is normalised both in time and in number of lineages.
License: GPL-2
Imports: ape,coda,deSolve
Suggests: TESS
Packaged: 2015-01-19 14:04:52 UTC; janzen
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-19 19:02:14

More information about nLTT at CRAN

January 19, 2015 05:13 PM

New package gsw with initial version 1.0-3

Package: gsw
Version: 1.0-3
Date: 2015-01-11
Title: Gibbs Sea Water Functions
Authors@R: c(person("Dan", "Kelley", role=c("aut","cre","cph"), email="dan.kelley@dal.ca", comment="C wrapper plus R code, tests, and documentation"), person("Clark", "Richards", role=c("aut","cph"), email="clark.richards@gmail.com", comment="C wrapper plus R code, tests, and documentation"), person("WG127", "SCOR/IAPSO", role=c("aut","cph"), comment="Original Matlab and derived code"))
Copyright: Original algorithms and Matlab/C library (c) 2014 WG127 SCOR/IAPSO (Scientific Committee on Oceanic Research / International Association for the Physical Sciences of the Oceans, Working Group 127); C wrapper code and R code (c) 2015 Dan Kelley and Clark Richards
Maintainer: Dan Kelley
Depends: R (>= 2.15)
BugReports: https://github.com/TEOS-10/GSW-R/issues
Description: Provides an interface to the Gibbs SeaWater (TEOS-10) C library, which derives from Matlab and other code written by WG127 (Working Group 127) of SCOR/IAPSO (Scientific Committee on Oceanic Research / International Association for the Physical Sciences of the Oceans).
URL: http://teos-10.github.io/GSW-R/index.html
License: GPL (>= 2) | file LICENSE
LazyData: no
Packaged: 2015-01-19 15:17:55.139 UTC; kelley
Imports: utils
Author: Dan Kelley [aut, cre, cph] (C wrapper plus R code, tests, and documentation), Clark Richards [aut, cph] (C wrapper plus R code, tests, and documentation), WG127 SCOR/IAPSO [aut, cph] (Original Matlab and derived code)
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2015-01-19 18:43:55

More information about gsw at CRAN

January 19, 2015 05:13 PM

Alstatr

R: Canonical Correlation Analysis on Imaging

In imaging, we deal with multivariate data, often in array form with several spectral bands, and trying to interpret the correlations across its dimensions is very challenging, if not impossible. For example, recall the number of spectral bands of the AVIRIS data we used in the previous post. There are 152 bands, so in total there are 152$\cdot$152 = 23104 correlations of pairs of random variables. How will you be able to interpret that huge number of correlations?

To address this, it might be better to group these variables into two sets and study the relationship between them. Such a statistical procedure can be done using canonical correlation analysis (CCA). An example of this in the health sciences (from Reference 2) is variables related to exercise and health. On one hand you have variables associated with exercise: observations such as the climbing rate on a stair stepper, how fast you can run, the amount of weight lifted on bench press, the number of push-ups per minute, etc. On the other hand you have health variables such as blood pressure, cholesterol levels, glucose levels, body mass index, etc. Two types of variables are measured, and the relationships between the exercise variables and the health variables are to be studied.

Methodology

Mathematically we have the following procedures:
  1. Divide the random variables into two groups, and assign these to the following random vectors: \begin{equation}\nonumber \mathbf{X} = [X_1,X_2,\cdots, X_p]^T\;\text{and}\;\mathbf{Y} = [Y_1,Y_2,\cdots, Y_q]^T \end{equation}
  2. Analogous to principal component analysis (PCA), we aim to find linear combinations \begin{equation}\nonumber \begin{aligned} U_1 = &\mathbf{a}_1^T\mathbf{X} = a_{11}X_1 + a_{12}X_2+\cdots + a_{1p}X_p\\ U_2 = &\mathbf{a}_2^T\mathbf{X} = a_{21}X_1 + a_{22}X_2+\cdots + a_{2p}X_p\\ &\qquad\quad\qquad\vdots\qquad\qquad\vdots\\ U_p = &\mathbf{a}_p^T\mathbf{X} = a_{p1}X_1 + a_{p2}X_2+\cdots + a_{pp}X_p \end{aligned} \end{equation} and \begin{equation}\nonumber \begin{aligned} V_1 = &\mathbf{b}_1^T\mathbf{Y}=b_{11}Y_1 + b_{12}Y_2+\cdots + b_{1q}Y_q\\ V_2 = &\mathbf{b}_2^T\mathbf{Y}=b_{21}Y_1 + b_{22}Y_2+\cdots + b_{2q}Y_q\\ &\qquad\quad\qquad\vdots\qquad\qquad\vdots\\ V_q = &\mathbf{b}_q^T\mathbf{Y}=b_{q1}Y_1 + b_{q2}Y_2+\cdots + b_{qq}Y_q\\ \end{aligned} \end{equation} that will maximize the correlation \begin{equation}\nonumber Corr(U_i,V_i)=\frac{Cov(U_i,V_i)}{\sqrt{Var(U_i)}\sqrt{Var(V_i)}},\quad i=1,2,\cdots,n \end{equation} where $n = \min{(p, q)}$.
  3. The first pair of canonical variables is defined by \begin{equation}\nonumber Corr(U_1, V_1)=\rho_1=\sqrt{\rho_1^2}, \end{equation} where $\rho_1$, the first canonical correlation, is the square root of the largest of the eigenvalues $\rho_1^2\geq \rho_2^2\geq \cdots \geq \rho_n^2$ of the matrix $\mathbf{\Sigma}_{XX}^{-1/2}\mathbf{\Sigma}_{XY}\mathbf{\Sigma}_{YY}^{-1}\mathbf{\Sigma}_{XY}^{T}\mathbf{\Sigma}_{XX}^{-1/2}$, where $\mathbf{\Sigma}_{XX}$ is the variance-covariance matrix of $\mathbf{X}$; $\mathbf{\Sigma}_{YY}$ is the variance-covariance matrix of $\mathbf{Y}$; and $\mathbf{\Sigma}_{XY}$ is the covariance matrix of $\mathbf{X}$ and $\mathbf{Y}$. The second pair of canonical variables is given by \begin{equation}\nonumber Corr(U_2, V_2)=\rho_2=\sqrt{\rho_2^2}, \end{equation} and so on.
For a more detailed theory of CCA, please refer to References 1 and 2 below. To continue, let's apply this methodology to an image. We will use the Grass data from (Bajorski, 2012) and do the analysis in R. Below is the proper description of the data.

Data

The Grass data is a spectral image of 64 by 64 pixels showing a grass texture. Each pixel is represented by a spectral reflectance curve in 42 spectral bands, with reflectance given in percent.

Analysis

To begin, let's display the data in an image form:
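A minimal sketch of one way to do it (the object name grass and its 4096-by-42 pixel-by-band layout are assumptions):

# Display the first 12 spectral bands, each reshaped to the 64 x 64 image
par(mfrow = c(3, 4), mar = c(1, 1, 2, 1))
for (b in 1:12) {
  img <- matrix(grass[, b], nrow = 64, ncol = 64)
  image(img, axes = FALSE, col = gray((0:31) / 31), main = paste("Band", b))
}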


The code generates the first 12 spectral bands of the data, where we observe a significant change in brightness of the twelfth band compared to the first band. The signatures of all pixels across these bands are shown below:
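plotted with something like:

# One line per pixel: 4096 reflectance curves across the 42 bands
matplot(t(grass), type = "l", lty = 1, col = rgb(0, 0, 1, alpha = 0.05),
        xlab = "Spectral Band", ylab = "Reflectance (%)")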
Investigating the plot above tells us that almost all bands seem correlated; that is, if the reflectance of a given pixel on the $i$th band increases or decreases, the $j$th band, $i\neq j$, is also expected to increase or decrease, except for bands 30 and 31, where there seems to be no clear pattern. But that's subjective; we cannot tell exactly, because there are 4096 signatures (lines in the plot) that are likely to overlap other important information. So, to see the relationship between all variables properly, here is the correlation matrix of all the spectral bands,
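drawn along these lines (the cm.colors palette runs from cyan to magenta, roughly matching the colours described below):

image(cor(grass), axes = FALSE, col = cm.colors(32),
      main = "Correlations of the 42 Spectral Bands")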
The cyan colour engulfing almost 60 percent of the region indicates higher correlation between the corresponding spectral bands, while the pronounced fuchsia colour tells us the correlation between those bands is low. Now let's divide this data into two groups: from 42 bands we could form two equal sets of variables (each with 21 dimensions), but for purposes of illustration we'll consider unequal sets, say the first 15 bands as the first group and the remaining bands 16 - 42 as the second group, hence $p=15$ and $q=27$. There are then $\min(p,q)=n=15$ pairs of canonical variables. Applying CCA we have,
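as a sketch using the base R function cancor:

X <- grass[, 1:15]      # first group:  bands 1-15  (p = 15)
Y <- grass[, 16:42]     # second group: bands 16-42 (q = 27)
cca <- cancor(X, Y)
cca$cor                 # the n = 15 canonical correlations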

The numerical output above is actually the $n=15$ canonical correlations. As we can see, the first five canonical correlations are very large, implying that the linear combinations obtained in the first five pairs of canonical variables are highly correlated with each other; the subsequent correlations are interpreted in a similar way. Next, we'll examine the coefficients of the first few canonical variables to see which bands are most strongly represented in the above canonical correlations. The cancor function returns the following components:
  1. cor - correlations;
  2. xcoef - estimated coefficients for the x variables;
  3. ycoef - estimated coefficients for the y variables;
  4. xcenter - the values used to adjust the x variables; and,
  5. ycenter - the values used to adjust the y variables.
We are interested in xcoef and ycoef, and so the plot of the coefficients of the first three $U_i$ and $V_i$ random variables is shown below,
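produced with something like:

par(mfrow = c(1, 2))
matplot(cca$xcoef[, 1:3], type = "l", lty = 1:3,
        xlab = "Band (1-15)", ylab = "Coefficient", main = "U")
matplot(cca$ycoef[, 1:3], type = "l", lty = 1:3,
        xlab = "Band (16-42)", ylab = "Coefficient", main = "V")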
A closer look at the plot of the coefficients of the first three $U_i$ random variables shows the loadings fluctuating between negative and positive values, so that $U_1,U_2,$ and $U_3$ are contrasts of the spectral bands. A similar situation is observed in the plot of the coefficients of the first three $V_i$ random variables, and because of that we cannot offer a more specific interpretation of these bands.

Test of Canonical Dimension

The dimension of the canonical variates above is $n = 15$; let's check whether all of these are statistically significant. We'll use the CCP (Significance Tests for Canonical Correlation Analysis) R package, which contains the p.asym function that will do the job for us.
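A sketch of the call (the signature p.asym(rho, N, p, q, tstat) follows the CCP documentation):

library(CCP)
rho   <- cca$cor
n.obs <- nrow(grass)                           # 4096 pixels
p.asym(rho, n.obs, p = 15, q = 27, tstat = "Wilks")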

The output above tells us that at the 0.05 level of significance, only the first 13 of the 15 canonical dimensions are significant.

For more on CCA using R, please check Reference 3. If you want to perform it in SAS, you might want to check Reference 2, and for more on imaging I suggest Reference 1.

Reference

  1. Bajorski, P. (2012). Statistics for Imaging, Optics, and Photonics. John Wiley & Sons, Inc.
  2. Stat 505 - Applied Multivariate Statistical Analysis. Lesson 8: Canonical Correlation Analysis. Eberly College of Science, Pennsylvania State University (Penn State). (accessed January 2, 2015)
  3. R Data Analysis Examples: Canonical Correlation Analysis. UCLA: Statistical Consulting Group. From http://www.ats.ucla.edu/stat/r/dae/canonical.htm (accessed January 4, 2015)

by Al-Ahmadgaid Asaad (noreply@blogger.com) at January 19, 2015 03:17 PM

CRANberries

New package pathological with initial version 0.0-3

Package: pathological
Type: Package
Title: Path Manipulation Utilities
Version: 0.0-3
Date: 2014-12-23
Author: Richard Cotton [aut, cre], Janko Thyson [ctb]
Maintainer: Richard Cotton
Authors@R: c(person("Richard", "Cotton", role = c("aut", "cre"), email = "richierocks@gmail.com"), person("Janko", "Thyson", role = "ctb"))
Description: Utilities for paths, files and directories.
Depends: R (>= 2.15.0)
Imports: assertive, plyr, stringr
Suggests: testthat
License: Unlimited
LazyLoad: yes
Acknowledgments: Development of this package was partially funded by the Proteomics Core at Weill Cornell Medical College in Qatar . The Core is supported by 'Biomedical Research Program' funds, a program funded by Qatar Foundation.
Packaged: 2014-12-23 12:43:11 UTC; rjc2003
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-19 10:25:52

More information about pathological at CRAN

January 19, 2015 09:13 AM

January 18, 2015

Dirk Eddelbuettel

Running UBSAN tests via clang with Rocker

Every now and then we get reports from CRAN about our packages failing a test there. A challenging one concerns UBSAN, or Undefined Behaviour Sanitizer. For background on UBSAN, see this RedHat blog post for gcc and this one from LLVM about clang.

I had written briefly about this before in a blog post introducing the sanitizers package for tests, as well as the corresponding package page for sanitizers, which clearly predates our follow-up Rocker.org repo / project described in this initial announcement and when we became the official R container for Docker.

Rocker had support for SAN testing, but UBSAN was not working yet. So following a recent CRAN report against our RcppAnnoy package, I was unable to replicate the error and asked for help on r-devel in this thread.

Martyn Plummer and Jan van der Laan kindly sent their configurations in the same thread and off-list; Jeff Horner did so too following an initial tweet offering help. None of these worked for me, but further trials eventually led me to the (already mentioned above) RedHat blog post with its mention of -fno-sanitize-recover to actually have an error abort a test. That, coupled with the settings used by Martyn, was what worked for me: clang-3.5 -fsanitize=undefined -fno-sanitize=float-divide-by-zero,vptr,function -fno-sanitize-recover.

This is now part of the updated Dockerfile of the R-devel-SAN-Clang repo behind the r-devel-ubsan-clang container. It contains these settings, as well as a new support script check.r for littler---which enables testing right out of the box.

Here is a complete example:

docker                              # run Docker (any recent version, I use 1.2.0)
  run                               # launch a container 
    --rm                            # remove Docker temporary objects when done
    -ti                             # use a terminal and interactive mode 
    -v $(pwd):/mnt                  # mount the current directory as /mnt in the container
    rocker/r-devel-ubsan-clang      # using the rocker/r-devel-ubsan-clang container
  check.r                           # launch the check.r command from littler (in the container)
    --setwd /mnt                    # with a setwd() to the /mnt directory
    --install-deps                  # installing all package dependencies before the test
    RcppAnnoy_0.0.5.tar.gz          # and test this tarball

I know. It is a mouthful. But it really is merely the standard practice of running Docker to launch a single command. And while I frequently make this the /bin/bash command (hence the -ti options I always use) to work and explore interactively, here we do one better thanks to the (pretty useful so far) check.r script I wrote over the last two days.

check.r does about the same as R CMD check. If you look inside check.r you will see a call to a (non-exported) function from the (R base-internal) tools package. We call the same function here. But to make things more interesting, we also first install the package we test to really ensure we have all build-dependencies from CRAN met. (And we plan to extend check.r to support additional apt-get calls in case other libraries etc. are needed.) We use the dependencies=TRUE option to have R smartly install Suggests: as well, but only one level deep (see help(install.packages) for details). With that prerequisite out of the way, the test can proceed as if we had done R CMD check (and an additional R CMD INSTALL as well). The result for this (known-bad) package:

edd@max:~/git$ docker run --rm -ti -v $(pwd):/mnt rocker/r-devel-ubsan-clang check.r --setwd /mnt --install-deps RcppAnnoy_0.0.5.tar.gz 
also installing the dependencies ‘Rcpp’, ‘BH’, ‘RUnit’

trying URL 'http://cran.rstudio.com/src/contrib/Rcpp_0.11.3.tar.gz'
Content type 'application/x-gzip' length 2169583 bytes (2.1 MB)
opened URL
==================================================
downloaded 2.1 MB

trying URL 'http://cran.rstudio.com/src/contrib/BH_1.55.0-3.tar.gz'
Content type 'application/x-gzip' length 7860141 bytes (7.5 MB)
opened URL
==================================================
downloaded 7.5 MB

trying URL 'http://cran.rstudio.com/src/contrib/RUnit_0.4.28.tar.gz'
Content type 'application/x-gzip' length 322486 bytes (314 KB)
opened URL
==================================================
downloaded 314 KB

trying URL 'http://cran.rstudio.com/src/contrib/RcppAnnoy_0.0.4.tar.gz'
Content type 'application/x-gzip' length 25777 bytes (25 KB)
opened URL
==================================================
downloaded 25 KB

* installing *source* package ‘Rcpp’ ...
** package ‘Rcpp’ successfully unpacked and MD5 sums checked
** libs
clang++-3.5 -fsanitize=undefined -fno-sanitize=float-divide-by-zero,vptr,function -fno-sanitize-recover -I/usr/local/lib/R/include -DNDEBUG -I../inst/include/ -I/usr/local/include    -fpic  -pipe -Wall -pedantic -g  -c Date.cpp -o Date.o

[...]
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘runUnitTests.R’
 ERROR
Running the tests in ‘tests/runUnitTests.R’ failed.
Last 13 lines of output:
  +     if (getErrors(tests)$nFail > 0) {
  +         stop("TEST FAILED!")
  +     }
  +     if (getErrors(tests)$nErr > 0) {
  +         stop("TEST HAD ERRORS!")
  +     }
  +     if (getErrors(tests)$nTestFunc < 1) {
  +         stop("NO TEST FUNCTIONS RUN!")
  +     }
  + }
  
  
  Executing test function test01getNNsByVector  ... ../inst/include/annoylib.h:532:40: runtime error: index 3 out of bounds for type 'int const[2]'
* checking PDF version of manual ... OK
* DONE

Status: 1 ERROR, 2 WARNINGs, 1 NOTE
See /tmp/RcppAnnoy/..Rcheck/00check.log for details.
root@a7687c014e55:/tmp/RcppAnnoy# 

The log shows that thanks to check.r, we first download and then install the required packages Rcpp, BH, RUnit and RcppAnnoy itself (in the CRAN release). Rcpp is installed first; we then cut out the middle until we get to ... the failure we set out to confirm.

Now that we have a tool to confirm the error, we can work on improved code.

One such fix, currently under inspection in the non-release version 0.0.5.1, then passes with the exact same invocation (but pointing at RcppAnnoy_0.0.5.1.tar.gz):

edd@max:~/git$ docker run --rm -ti -v $(pwd):/mnt rocker/r-devel-ubsan-clang check.r --setwd /mnt --install-deps RcppAnnoy_0.0.5.1.tar.gz
also installing the dependencies ‘Rcpp’, ‘BH’, ‘RUnit’
[...]
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘runUnitTests.R’
 OK
* checking PDF version of manual ... OK
* DONE

Status: 1 WARNING
See /mnt/RcppAnnoy.Rcheck/00check.log for details.

edd@max:~/git$

This proceeds the same way, from the same pristine, clean container for testing. It first installs the four required packages, and then proceeds to test the new and improved tarball, which passes the test that failed above with no issues. Good.

So we now have an "appliance" container anybody can download for free from Docker Hub, and deploy as we did here in order to have a fully automated, one-command setup for testing for UBSAN errors.

UBSAN is a very powerful tool. We are only beginning to deploy it. There are many more useful configuration settings. I would love to hear from anyone who would like to work on building this out via the R-devel-SAN-Clang GitHub repo. Improvements to the littler scripts are similarly welcome (and I plan on releasing an updated littler package "soon").

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

January 18, 2015 08:12 PM

CRANberries

New package hddtools with initial version 0.2.4

Package: hddtools
Type: Package
Version: 0.2.4
Date: 2014-11-21
Title: Hydrological Data Discovery Tools
Description: Facilitates discovery and handling of hydrological data, non-programmatic access to catalogues and databases.
Authors@R: c(person("Claudia", "Vitolo", role = c("aut", "cre"), email = "cvitolodev@gmail.com"), person("Simon", "Moulds", role = "aut", email = "simonmdev@riseup.net"))
Author: Claudia Vitolo [aut, cre], Simon Moulds [aut]
Maintainer: Claudia Vitolo
URL: http://cvitolo.github.io/r_hddtools/
BugReports: https://github.com/cvitolo/r_hddtools/issues
Depends: R (>= 3.0.2)
Imports: sp, rgdal, raster, RCurl, XML, zoo
License: GPL-3
LazyData: true
Repository: CRAN
Keywords: hydrology, hydrological modelling, hydrologic modeling, time series, environmental data, web technologies and services
Packaged: 2015-01-17 18:23:35 UTC; claudia
NeedsCompilation: no
Date/Publication: 2015-01-18 06:44:12

More information about hddtools at CRAN

January 18, 2015 05:13 AM

January 17, 2015

CRANberries

New package onls with initial version 0.1-0

Package: onls
Type: Package
LazyLoad: no
LazyData: yes
Title: Orthogonal Nonlinear Least-Squares Regression
Version: 0.1-0
Date: 2015-01-17
Author: Andrej-Nikolai Spiess
Maintainer: Andrej-Nikolai Spiess
Description: Orthogonal Nonlinear Least-Squares Regression using Levenberg-Marquardt minimization.
License: GPL (>= 2)
Depends: R (>= 2.13.0), minpack.lm
NeedsCompilation: no
Packaged: 2015-01-17 18:53:48 UTC; aspiess
Repository: CRAN
Date/Publication: 2015-01-17 21:10:48

More information about onls at CRAN

January 17, 2015 07:12 PM

New package ALTopt with initial version 0.1.0

Package: ALTopt
Title: Optimal Experimental Designs for Accelerated Life Testing
Version: 0.1.0
Authors@R: c(person("Kangwon", "Seo", role = c("aut", "cre"), email = "kseo7@asu.edu"), person("Rong", "Pan", role = "aut", email = "rong.pan@asu.edu"))
Description: This package creates optimal (D, U and I) designs for accelerated life testing with right censoring or interval censoring. It uses a generalized linear model (GLM) approach to derive the asymptotic variance-covariance matrix of the regression coefficients. The failure time distribution is assumed to follow a Weibull distribution with a known shape parameter, and log-linear link functions are used to model the relationship between failure time parameters and stress variables. Any number of stress factors can be used, but no more than 3 are recommended due to the computational time. The ALTopt package also provides several plotting functions, including contour plots, Fraction of Use Space (FUS) plots and Variance Dispersion graphs of Use Space (VDUS) plots.
Depends: R (>= 3.0.0)
License: GPL-3
LazyData: true
Imports: cubature (>= 1.0), lattice (>= 0.20)
Packaged: 2015-01-17 00:15:29 UTC; kseo7
Author: Kangwon Seo [aut, cre], Rong Pan [aut]
Maintainer: Kangwon Seo
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-17 07:22:11

More information about ALTopt at CRAN

January 17, 2015 07:12 AM

January 16, 2015

CRANberries

New package caretEnsemble with initial version 1.0.0

Package: caretEnsemble
Type: Package
Title: Ensembles of Caret Models
Version: 1.0.0
Date: 2015-01-14
Authors@R: c(person(c("Zachary", "A."), "Mayer", role = c("aut", "cre"), email = "zach.mayer@gmail.com"), person(c("Jared", "E."), "Knowles", role=c("aut"), email="jknowles@gmail.com"))
URL: https://github.com/zachmayer/caretEnsemble
BugReports: https://github.com/zachmayer/caretEnsemble/issues
Description: Functions for creating ensembles of caret models: caretList, caretEnsemble, and caretStack. caretList is a convenience function for fitting multiple caret::train models to the same dataset. caretEnsemble will make a linear combination of these models using greedy forward selection, and caretStack will make linear or non-linear combinations of these models, using a caret::train model as a meta-model.
Depends: R (>= 3.1.0), caret
Suggests: testthat, randomForest, rpart, kernlab, nnet, e1071, ipred, pROC, knitr, mlbench, MASS, gbm, klaR
Imports: caTools, pbapply, ggplot2, digest, plyr, grid, lattice, gridExtra
License: MIT + file LICENSE
VignetteBuilder: knitr
Packaged: 2015-01-14 20:16:22 UTC; zach
Author: Zachary A. Mayer [aut, cre], Jared E. Knowles [aut]
Maintainer: Zachary A. Mayer
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-16 22:19:29

More information about caretEnsemble at CRAN

January 16, 2015 09:12 PM

New package anfis with initial version 0.99.1

Package: anfis
Type: Package
Title: Adaptive Neuro Fuzzy Inference System in R
Version: 0.99.1
Date: 2015-01-16
Author: Cristobal Fresno, Andrea S. Llera and Elmer A. Fernandez
Maintainer: Cristobal Fresno
Description: The package implements ANFIS Type 3 Takagi and Sugeno's fuzzy if-then rule network with the following features: (1) Independent number of membership functions (MF) for each input, and also different MF extensible types. (2) Type 3 Takagi and Sugeno's fuzzy if-then rule. (3) Full rule combinations, e.g. 2 inputs 2 membership functions -> 4 fuzzy rules. (4) Hybrid learning, i.e. Descent Gradient for precedents and Least Squares Estimation for consequents. (5) Multiple outputs.
URL: http://www.bdmg.com.ar
License: GPL (>= 2)
Depends: R (>= 3.0), methods, parallel
Collate: 'MembershipFunction.R' 'MembershipFunction-show.R' 'BellMF.R' 'GaussianMF.R' 'NormalizedGaussianMF.R' 'MembershipFunction-evaluateMF.R' 'Anfis.R' 'Anfis-initialize.R' 'Anfis-getters.R' 'Anfis-printshow.R' 'Anfis-metrics.R' 'Anfis-package.R' 'Anfis-plotMF.R' 'Anfis-plot.R' 'Anfis-predict.R' 'Anfis-training.R' 'Anfis-trainSet.R' 'Anfis3-example.R' 'MembershipFunction-derivateMF.R' 'MembershipFunction-getset.R' 'MembershipFunction-print.R'
NeedsCompilation: no
Packaged: 2015-01-16 14:45:54 UTC; cristobal
Repository: CRAN
Date/Publication: 2015-01-16 16:22:46

More information about anfis at CRAN

January 16, 2015 03:12 PM

Modern Toolmaking

caretEnsemble

My package caretEnsemble, for making ensembles of caret models, is now on CRAN.

Check it out, and let me know what you think! (Submit bug reports and feature requests to the issue tracker)

by Zachary Deane-Mayer (noreply@blogger.com) at January 16, 2015 02:22 PM

Alstatr

New Toy: SAS® University Edition

So I started using SAS® University Edition, which is a FREE version of SAS® software. Again, it's FREE, and that's the main reason why I want to relearn the language. The software was announced on March 24, 2014, and the download became available in May of that year. And for that, I salute Dr. Jim Goodnight. At least we can learn SAS® without paying the expensive price tag, especially for a single user like me.

The software requires a 64-bit processor and a virtual machine, on top of which it runs. To install, just follow the instructions in this video. Although the installation in the video is done on Windows, it also works on Mac. Below is a screenshot of my SAS® Studio running in Safari.

What's in the box?

The software includes the following libraries:
  1. Base SAS® - Make programming fast and easy with the SAS® programming language, ODS graphics and reporting procedure;
  2. SAS/STAT® - Trust SAS® proven reliability with a wide variety of statistical methods and techniques;
  3. SAS/IML® - Use this matrix programming language for more specialized analyses and data exploration;
  4. SAS Studio - Reduce your programming time with autocomplete for hundreds of SAS® statements and procedures, as well as built-in syntax help;
  5. SAS/ACCESS® - Seamlessly connect with your data, no matter where it resides.
For more about SAS® University Edition please refer to the fact sheet.

If you've been following this blog, you know I have been promoting free software (R, Python, and C/C++) for analysis, and the introduction of SAS® University Edition can only mean one thing: a new topic to discuss in succeeding posts. So let's welcome this software by doing some analysis with it.

Analysis

Our goal here is to cover the basics needed to proceed with an analysis, namely: 1. Importing and transforming the data; 2. Descriptive statistics; 3. Hypothesis testing: one-sample t test; 4. Creating a function; and 5. Visualization.

Data

We'll again use the Volume of Palay Production (1994 to 2013, quarterly) from the Cordillera Administrative Region (CAR), Philippines. To reproduce this article, please click here to download the data.
  1. Importing and transforming the data
    Working in SAS® Studio requires you to upload your data into it. To do this, go to the sidebar, click on the Folders tab, and there you will find the "up arrow" for uploading. See the picture below.
    You are now set to import the data using the following code. In my case, the location of the uploaded data, as seen in the photo above, is "/folders/myfolders/palay.csv".
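    A minimal sketch of that import step (the dbms = csv and replace options are my assumptions; the rest follows the description below):

    proc import datafile = "/folders/myfolders/palay.csv"
        out = work.palay
        dbms = csv        /* assumption: the uploaded file is a CSV */
        replace;          /* assumption: overwrite work.palay if it exists */
        getnames = yes;   /* take variable names from the first record */
        datarow = 2;      /* start reading data at row 2 */
    run;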

    In SAS®, proc refers to a procedure; in this case we perform the import procedure. out is the path where the SAS® data set is saved; here we save it in the "Work" folder with the filename "palay". getnames determines whether to generate SAS® variable names from the data values in the first record of the imported file. Finally, datarow starts reading data from the specified row number in the delimited text file.

    I want to emphasize that the description of the arguments of the statements and procedures above is available in the software itself; thanks to SAS® Studio, autocomplete for hundreds of SAS® statements and procedures is very handy. So in the code that follows, we will describe selected statements only. Below is the autocomplete feature of SAS® Studio seen in action.
    Now that we have the data in our workspace, let's do some transformation on it. In R, we always start by viewing the head of the data, or the first few observations, coded as head(data). In that spirit, here's how to do it in SAS®, in this case for the first five observations:
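    Something like this short proc print sketch does it, using the obs = data set option:

    proc print data = work.palay (obs = 5);   /* print only the first 5 observations */
    run;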

    Obs     Abra   Apayao   Benguet   Ifugao   Kalinga   Mt_Province
      1     1243     2934       148     3300     10553          2675
      2     4158     9235      4287     8063     35257          1920
      3     1787     1922      1955    10744      5446           955
      4    17152    14501      3536    19607     31687          2715
      5     1266     2385      2530     3315      8520          2601
    If you want to start and end on specific rows, you can do the following, in this case from the 5th row to the 10th row:
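    A sketch combining the firstobs = and obs = data set options:

    proc print data = work.palay (firstobs = 5 obs = 10);   /* rows 5 through 10 */
    run;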

    Obs     Abra   Apayao   Benguet   Ifugao   Kalinga   Mt_Province
      5     1266     2385      2530     3315      8520          2601
      6     5576     7452       771    13134     28252          1242
      7      927     1099      2796     5134      3106          9145
      8    21540    17038      2463    14226     36238          2465
      9     1039     1382      2592     6842      4973          2624
     10     5424    10588      1064    13828     40140          1237
    Now, what about playing with the variables of the data? Say we want to view a specific column only, for example observations from row 15 to 20 of the Benguet variable. How is that done? Well, I humbly present to you the following code:
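    Roughly, the row window from before plus a var statement:

    proc print data = work.palay (firstobs = 15 obs = 20);
        var Benguet;   /* print this column only */
    run;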

    Obs   Benguet
     15      2847
     16      2942
     17      2119
     18       734
     19      2302
     20      2598
    For viewing multiple columns, simply enumerate the names of the variables using either keep, which keeps the listed variables in the output, or drop, which excludes the listed variables from the printing.
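    For instance, a sketch that should reproduce the table below (drop = is one way; keep = with the five remaining names spelled out works just as well):

    proc print data = work.palay (firstobs = 15 obs = 20 drop = Mt_Province);
    run;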

    Obs     Abra   Apayao   Benguet   Ifugao   Kalinga
     15     1048     1427      2847     5526      4402
     16    25679    15661      2942    14452     33717
     17     1055     2191      2119     5882      7352
     18     5437     6461       734    10477     24494
     19     1029     1183      2302     6438      3316
     20    23710    12222      2598     8446     26659
    I think the above are enough demonstrations of data transformation.
  2. Perform descriptive statistics
    And as always, the next step is to look at the descriptive statistics of the data; here's how to do it:
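    A minimal sketch; by default, proc means reports exactly the five statistics shown below:

    proc means data = work.palay;
    run;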

    Variable        N        Mean     Std Dev    Minimum    Maximum
    Abra           79    12874.38    16746.47     927.00   60303.00
    Apayao         79    16860.65    15448.15     401.00   54625.00
    Benguet        79     3237.39     1588.54     148.00    8813.00
    Ifugao         79    12414.62     5034.28    1074.00   21031.00
    Kalinga        79    30446.42    22245.71    2346.00   68663.00
    Mt_Province    79     4506.20     3815.71     382.00   13038.00
    In case you want to view fewer or more statistics, you can try
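    a sketch such as this (the particular statistic keywords and maxdec = are my choice for illustration):

    proc means data = work.palay n mean median std var min max maxdec = 2;
    run;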

    We'll end this section with the following scatter plot matrix:
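    One way to produce it is the matrix statement of proc sgscatter (a sketch; styling options omitted):

    proc sgscatter data = work.palay;
        matrix Abra Apayao Benguet Ifugao Kalinga Mt_Province;   /* all six provinces */
    run;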
    A quick reading: based on the above scatter plot matrix, we see a strong positive relationship between Kalinga and Apayao, and a relationship between Ifugao and Benguet.
  3. Hypothesis testing: One-sample t test
    Let's perform a simple hypothesis test, the one-sample t test. Using a 0.05 level of significance, we'll test whether the true mean of Abra is not equal to 15000.
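    In SAS®, the hypothesized mean and significance level can go straight into proc ttest; a minimal sketch:

    proc ttest data = work.palay h0 = 15000 alpha = 0.05;
        var Abra;   /* test the mean of Abra against 15000 */
    run;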

     N       Mean    Std Dev    Std Err    Minimum    Maximum
    79    12874.4    16746.5     1884.1      927.0    60303.0

       Mean       95% CL Mean     Std Dev    95% CL Std Dev
    12874.4    9123.4  16625.4    16746.5   14480.9  19859.1

    DF    t Value    Pr > |t|
    78      -1.13      0.2627
    From the above numerical output, we see that the p-value = 0.2627 is greater than $\alpha = 0.05$; hence there is insufficient evidence to conclude that the average volume of palay production is not equal to 15000. Graphically, the observations of the Abra variable are not normally distributed based on the Q-Q plot; that judgment is subjective, but the points clearly deviate from the line.
  4. Creating a function
    Let's create a function using the fcmp procedure. For illustration purposes, consider the standard normal density, $$ \phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^2}{2}\right\} $$ In SAS® we code it as follows:
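    A sketch of the definition (the outlib location work.funcs.stat is an arbitrary choice of mine; the cmplib option then makes the function visible to data steps):

    proc fcmp outlib = work.funcs.stat;
        function stdnorm(x);
            return (exp(-(x ** 2) / 2) / sqrt(2 * constant('pi')));
        endsub;
    run;

    options cmplib = work.funcs;   /* point data steps at the compiled function */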

    To generate data from this function using a do loop, consider the following:
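    For instance (the start of -5 and the step of 0.1 match the output below; the right endpoint of 5 is my assumption):

    data sn_data;
        do x = -5 to 5 by 0.1;
            y = stdnorm(x);   /* evaluate the function defined above */
            output;           /* write one row per grid point */
        end;
    run;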

    Obs      x             y
      1   -5.0    .000001487
      2   -4.9    .000002439
      3   -4.8    .000003961
      4   -4.7    .000006370
      5   -4.6    .000010141
    And that's how you create and use a function in SAS®. For me, the function definition procedure fcmp is the best procedure included in SAS® version 9.2, and I'm just lucky to be relearning this language with that feature available, especially since it is FREE in SAS® Studio.
  5. Visualization
    Now it's time for us to create some visual art. And SAS®, being proprietary software, has a lot to offer. We've demonstrated a few outputs above already; this time let's plot the data points of sn_data, generated from the stdnorm function we defined earlier. Here it is:
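    A minimal proc sgplot sketch:

    proc sgplot data = sn_data;
        scatter x = x y = y;   /* plot the generated points */
    run;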
    For other types of plots, simply go to the Snippets tab in the sidebar of SAS® Studio, where you will find template code for different types of plots. See the picture below.
    I will end this section with a histogram and a series plot.
    • Histogram
    • Series plot

Conclusion

In conclusion, it wasn't difficult for me to relearn SAS®, not only because I had used it in a few papers back in college, but also because I have a programming background in R and Python, which I used as a basis for understanding the grammar of the language. Overall, SAS® is a high-level language: as we saw above, a simple statement gives you complete results with graphics, without lengthy code. And although I use R and Python as my primary tools for research, I am happy to add SAS® to them. And despite the popularity of R in analysis, I look forward to seeing more learners, students, researchers, and even more bloggers using SAS®. That way, we can share ideas and techniques between the R, SAS®, and Python communities.

What about you? How's your experience with SAS® University Edition?

Data Source

Reference

  1. SAS® Documentation
  2. r4stats.com: Data Import. From http://r4stats.com/examples/data-import/ (accessed January 15, 2015)
  3. SAS Learning Module: Subsetting data in SAS. From http://www.ats.ucla.edu/stat/sas/modules/subset.htm (accessed January 15, 2015)

by Al-Ahmadgaid Asaad (noreply@blogger.com) at January 16, 2015 09:20 AM

CRANberries

New package jagsUI with initial version 1.3.1

Package: jagsUI
Version: 1.3.1
Date: 2015-1-15
Title: A Wrapper Around rjags to Streamline JAGS Analyses
Author: Ken Kellner
Maintainer: Ken Kellner
Depends: R (>= 2.14.0), lattice
Imports: rjags (>= 3-3), coda (>= 0.13), parallel, methods
SystemRequirements: JAGS (http://mcmc-jags.sourceforge.net)
Description: This package provides a set of wrappers around rjags functions to run Bayesian analyses in JAGS (specifically, via libjags). A single function call can control adaptive, burn-in, and sampling MCMC phases, with MCMC chains run in sequence or in parallel. Posterior distributions are automatically summarized (with the ability to exclude some monitored nodes if desired) and functions are available to generate figures based on the posteriors (e.g., predictive check plots, traceplots). Function inputs, argument syntax, and output format are nearly identical to the R2WinBUGS/R2OpenBUGS packages to allow easy switching between MCMC samplers.
License: GPL-2
URL: https://github.com/kenkellner/jagsUI
NeedsCompilation: no
Packaged: 2015-01-15 21:47:50 UTC; kkellner
Repository: CRAN
Date/Publication: 2015-01-16 06:42:38

More information about jagsUI at CRAN

January 16, 2015 05:13 AM

January 15, 2015

Gregor Gorjanc

cpumemlog: Monitor CPU and RAM usage of a process (and its children)

Long time no see ...

Today I pushed the cpumemlog script to GitHub https://github.com/gregorgorjanc/cpumemlog. Read more about this useful utility at the GitHub site.

by Gregor Gorjanc (noreply@blogger.com) at January 15, 2015 11:16 PM

CRANberries

New package lfl with initial version 1.0

Package: lfl
Type: Package
Title: Linguistic Fuzzy Logic
Version: 1.0
Date: 2015-01-14
Author: Michal Burda
Maintainer: Michal Burda
Description: Various algorithms related to linguistic fuzzy logic: mining for linguistic fuzzy association rules, performing perception-based logical deduction (PbLD), and forecasting time-series using fuzzy rule-based ensemble (FRBE).
License: GPL (>= 3.0)
Suggests: testthat
Depends: R (>= 3.1.1)
Imports: Rcpp (>= 0.11.0), foreach, forecast (>= 5.5), plyr, tseries, e1071, zoo, utils
LinkingTo: Rcpp
NeedsCompilation: yes
SystemRequirements: C++11
Packaged: 2015-01-15 13:27:44 UTC; michal
Repository: CRAN
Date/Publication: 2015-01-15 17:04:00

More information about lfl at CRAN

January 15, 2015 03:13 PM

New package MultiMeta with initial version 0.1

Package: MultiMeta
Type: Package
Title: Meta-analysis of Multivariate Genome Wide Association Studies
Version: 0.1
Date: 2014-08-21
Author: Dragana Vuckovic
Maintainer: Dragana Vuckovic
Description: Allows running a meta-analysis of multivariate Genome Wide Association Studies (GWAS) and easily visualizing results through custom plotting functions. The multivariate setting implies that results for each single nucleotide polymorphism (SNP) include several effect sizes (also known as "beta coefficients", one for each trait), as well as related variance values, but also covariance between the betas. The main goal of the package is to provide combined beta coefficients across different cohorts, together with the combined variance/covariance matrix. The method is inverse-variance based, thus each beta is weighted by the inverse of its variance-covariance matrix, before taking the average across all betas. The default options of the main function multi_meta will work with files obtained from GEMMA multivariate option for GWAS (Zhou & Stephens, 2014). It will work with any other output, as long as columns are formatted to have the corresponding names. The package also provides several plotting functions for QQ-plots, Manhattan Plots and custom summary plots.
License: GPL (>= 2)
Imports: mvtnorm,expm,ggplot2,reshape2
Depends: gtable,grid
Collate: 'betas_plot.R' 'mhplot.R' 'multi_meta.R' 'qqplotter.R' 'Example_file_1.R' 'Example_file_2.R' 'Example_output_file.R'
Packaged: 2015-01-15 08:57:26 UTC; genetica
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-15 11:44:10

More information about MultiMeta at CRAN

January 15, 2015 11:12 AM

Removed CRANberries

Package PoMoS (with last version 1.1.1) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2011-02-11 1.1.1
2010-10-23 1.1
2010-10-13 1.0

January 15, 2015 05:13 AM

Package PenLNM (with last version 1.0) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2012-11-28 1.0

January 15, 2015 05:13 AM

Package convexHaz (with last version 0.2) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2009-01-29 0.2
2008-10-29 0.1
2008-09-10 0.0

January 15, 2015 05:13 AM

Package gemmR (with last version 1.3-2) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-12-13 1.3-2

January 15, 2015 05:13 AM

Package UScensus2000blkgrp (with last version 0.03) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2010-02-24 0.03

January 15, 2015 05:13 AM

Package bark (with last version 0.1-0) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2008-07-16 0.1-0

January 15, 2015 05:13 AM

Package remMap (with last version 0.1-0) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2009-01-17 0.1-0

January 15, 2015 05:13 AM

January 14, 2015

Removed CRANberries

Package partDSA (with last version 0.9.5) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2014-03-05 0.9.5
2013-09-26 0.8.6
2013-02-06 0.8.5
2012-01-25 0.8.4
2010-04-27 0.7.1
2010-04-09 0.7.0
2009-12-30 0.6.0
2009-05-05 0.5.1

January 14, 2015 09:13 PM

CRANberries

New package mztwinreg with initial version 1.0-1

Package: mztwinreg
Title: Regression Models for Monozygotic Twin Data
Description: Linear and logistic regression models for quantitative genetic analysis of data from monozygotic twins.
Version: 1.0-1
Date: 2015-01-13
Author: Aldo Cordova-Palomera
Maintainer: Aldo Cordova-Palomera
Imports: rms, mclogit
Suggests: lme4
License: GPL-3
URL: https://github.com/AldoCP/mztwinreg
Packaged: 2015-01-14 18:20:24 UTC; Juan StepSon
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-14 19:50:57

More information about mztwinreg at CRAN

January 14, 2015 09:12 PM

New package speaq with initial version 1.2.0

Package: speaq
Type: Package
Title: Tools for Nuclear Magnetic Resonance (NMR) spectrum alignment and quantitative analysis.
Version: 1.2.0
Date: 2015-10-01
Author: Trung Nghia Vu, Kris Laukens and Dirk Valkenborg
Maintainer: Trung Nghia Vu
Description: We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. Related publication is available at http://www.biomedcentral.com/1471-2105/12/405/.
Depends: R (>= 3.0.0), MassSpecWavelet
Imports: graphics,stats
License: Apache License 2.0
URL: https://github.com/nghiavtr/speaq
Packaged: 2015-01-14 12:12:43 UTC; trungvu
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-14 19:33:33

More information about speaq at CRAN

January 14, 2015 07:13 PM

New package betas with initial version 0.1.0

Package: betas
Title: Standardized Beta Coefficients
Version: 0.1.0
Authors@R: person("Andrea", "Cantieni", email = "andrea.cantieni@phsz.ch", role = c("aut", "cre"))
Description: Computes standardized beta coefficients and corresponding standard errors for the following models: - linear regression models with numerical covariates only - linear regression models with numerical and factorial covariates - weighted linear regression models - robust linear regression models with numerical covariates only
Depends: R (>= 3.1.1)
Imports: robust
License: GPL-3
LazyData: true
URL: https://github.com/andreaphsz/betas
BugReports: https://github.com/andreaphsz/betas/issues
Packaged: 2015-01-14 12:25:08 UTC; phsz
Author: Andrea Cantieni [aut, cre]
Maintainer: Andrea Cantieni
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-14 14:09:25

More information about betas at CRAN

January 14, 2015 01:13 PM

New package EFDR with initial version 0.1.0

Package: EFDR
Type: Package
Title: Wavelet-Based Enhanced FDR for Signal Detection in Noisy Images
Version: 0.1.0
Date: 2014-10-22
Authors@R: c(person("Andrew", "Zammit-Mangion", role = c("aut", "cre"), email = "andrewzm@gmail.com"), person("Hsin-Cheng", "Huang", role = "aut"))
Suggests: knitr, ggplot2, RCurl, fields, gridExtra, animation
Description: Enhanced False Discovery Rate (EFDR) is a tool to detect anomalies in an image. The image is first transformed into the wavelet domain in order to decorrelate any noise components, following which the coefficients at each resolution are standardised. Statistical tests (in a multiple hypothesis testing setting) are then carried out to find the anomalies. The power of EFDR exceeds that of standard FDR, which would carry out tests on every wavelet coefficient: EFDR chooses which wavelets to test based on a criterion described in Shen et al. (2002). The package also provides elementary tools to interpolate spatially irregular data onto a grid of the required size. The work is based on Shen, X., Huang, H.-C., and Cressie, N. 'Nonparametric hypothesis testing for a spatial signal.' Journal of the American Statistical Association 97.460 (2002): 1122-1140.
Imports: Matrix, foreach (>= 1.4.2), doParallel (>= 1.0.8), waveslim (>= 1.7.5), parallel, gstat (>= 1.0-19), tidyr (>= 0.1.0.9000), dplyr (>= 0.3.0.2), sp (>= 1.0-15)
URL: http://github.com/andrewzm/EFDR
Depends: R (>= 3.0)
VignetteBuilder: knitr
License: GPL (>= 2)
NeedsCompilation: no
Packaged: 2015-01-14 05:30:49 UTC; andrew
Author: Andrew Zammit-Mangion [aut, cre], Hsin-Cheng Huang [aut]
Maintainer: Andrew Zammit-Mangion
Repository: CRAN
Date/Publication: 2015-01-14 07:19:56

More information about EFDR at CRAN

January 14, 2015 07:13 AM

New package WCE with initial version 1.0

Package: WCE
Type: Package
Title: Weighted Cumulative Exposure Models
Version: 1.0
Date: 2015-01-12
Author: Marie-Pierre Sylvestre , Marie-Eve Beauchamp , Ryan Patrick Kyle , Michal Abrahamowicz
Maintainer: Marie-Pierre Sylvestre
Depends: R (>= 2.10)
Imports: plyr, survival, splines
Suggests: R.rsp
VignetteBuilder: R.rsp
Description: WCE implements a flexible method for modeling cumulative effects of time-varying exposures, weighted according to their relative proximity in time, and represented by time-dependent covariates. The current implementation estimates the weight function in the Cox proportional hazards model. The function that assigns weights to doses taken in the past is estimated using cubic regression splines.
License: GPL (>= 2)
LazyData: true
NeedsCompilation: no
Packaged: 2015-01-13 23:52:35 UTC; kyle
Repository: CRAN
Date/Publication: 2015-01-14 01:59:06

More information about WCE at CRAN

January 14, 2015 01:13 AM

New package htmltab with initial version 0.5.0

Package: htmltab
Title: Assemble Data Frames from HTML Tables
Version: 0.5.0
Authors@R: person("Christian", "Rubba", email = "christian.rubba@gmail.com", role = c("aut", "cre"))
Description: htmltab is a package for extracting structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides two major advantages. First, the package automatically expands row and column spans in the header and body cells. Second, users are given more control over the identification of header and body rows which will end up in the R table. Additionally, the function preprocesses table code, removes unneeded parts and so helps to alleviate the need for tedious post-processing.
Depends: R (>= 3.1.0)
Imports: XML (>= 3.98.1.1)
License: MIT + file LICENSE
LazyData: true
Suggests: testthat, knitr, magrittr (>= 1.5), tidyr
URL: http://github.com/crubba/htmltab
BugReports: https://github.com/crubba/htmltab/issues
VignetteBuilder: knitr
Packaged: 2015-01-13 23:17:15 UTC; christian
Author: Christian Rubba [aut, cre]
Maintainer: Christian Rubba
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-01-14 01:51:29

More information about htmltab at CRAN

January 14, 2015 01:13 AM

New package bayesDccGarch with initial version 1.0

Package: bayesDccGarch
Type: Package
Title: The Bayesian Dynamic Conditional Correlation GARCH Model
Version: 1.0
Date: 2014-11-30
Author: Jose A Fioruci , Ricardo S Ehlers , Francisco Louzada
Maintainer: Jose A Fioruci
Depends: R (>= 2.14), coda
Description: Bayesian estimation of DCC-GARCH(1,1) Model
License: GPL (>= 2.14)
URL: http://arxiv.org/abs/1412.2967
Packaged: 2015-01-13 19:07:11 UTC; JAF
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2015-01-14 01:55:33

More information about bayesDccGarch at CRAN

January 14, 2015 01:12 AM