Planet R

May 17, 2012

Revolutions

Orbitz: R has become the data-mining tool of choice

Sameer Chopra, vice president of Advanced Analytics at Orbitz Worldwide, wrote recently in Analytics magazine about the changing landscape of processes, software and systems for statistical modelers. In a section on "Big Data and Open Source Analytics", Chopra lays out the reasons why the R language "has become the data-mining tool of choice for machine learners":

  • R has very good integration with Hadoop, an area where established commercial statistical tools have frankly been playing catch-up over the past year. (Note: At the time of this writing, some established statistical solution providers were announcing an access interface to Hadoop.)
  • Many startups and smaller firms do not have deep pockets and are embracing open source tools such as the R programming language and NoSQL database systems such as MongoDB.
  • R is a leading language for developing new statistical methods, and it is a platform for statistical innovation and collaboration across both the corporate world and academia. In my opinion, for the first time in years, the stronghold of established commercial players seems to be potentially threatened; open source tools are better suited for Big Data and will slowly but surely continue to take share away from commercialized statistical packages. In fact, traditional statistical vendors have recognized that R is a force to be reckoned with. In response, many of these vendors have developed hooks into R so users can interface with the R language.
  • Based on the resumes I’ve been reading, the next generation of data miners is flocking to R as their go-to tool. Professors in general are comfortable with R; they tend to use R and Excel as part of their curriculum.
  • In short, open-source analytics tools and platforms have arrived.

Chopra says that the usage of R in the commercial sector is growing "as firms such as Revolution Analytics focus on the enterprise capabilities for R" (for example, Revolution R Enterprise's Hadoop support and enterprise deployment).

Chopra also has some interesting perspectives on statistical modeling vs machine learning which you can find in the full article linked below.

Analytics magazine: The times they are a changin’ for advanced analytics

by David Smith at May 17, 2012 09:30 PM

Where's Waldo? Image Analysis in R

R user Arthur Charpentier attempts to use the raster library and R functions to find Waldo in a "Where's Waldo" image.

Sadly, it turned out that Waldo was a bit too tricky to spot using these techniques. But Arthur did have more success identifying the US flag in a shot from the Apollo mission, and identifying answers in the form for a multiple-choice test. All of the R code is provided at the link below, so that's a great place to start if you're looking to do some image analysis in R yourself.

Freakonometrics: Finding Waldo, a flag on the moon and multiple choice tests, with R

by David Smith at May 17, 2012 08:55 PM

CRANberries

New package spartan with initial version 1.0

Package: spartan
Type: Package
Title: Spartan (Simulation Parameter Analysis R Toolkit ApplicatioN)
Version: 1.0
Date: 2012-04-12
Author: Kieran Alden, Mark Read, Paul Andrews, Jon Timmis, Henrique Veiga-Fernandes, Mark Coles
Maintainer: Kieran Alden
Description: Computer simulations are becoming a popular technique to use in attempts to further our understanding of complex systems. This package provides code for four techniques described in available literature which aid the analysis of simulation results, at both single and multiple timepoints in the simulation run. The first technique addresses aleatory uncertainty in the system caused through inherent stochasticity, and determines the number of replicate runs necessary to generate a representative result. The second examines how robust a simulation is to parameter perturbation, through the use of a one-at-a-time parameter analysis technique. Thirdly, a Latin hypercube based sensitivity analysis technique is included which can elucidate non-linear effects between parameters and indicate implications of epistemic uncertainty with reference to the system being modelled. Finally, a further sensitivity analysis technique, the extended Fourier Amplitude Sampling Test (eFAST) has been included to partition the variance in simulation results between input parameters, to determine the parameters which have a significant effect on simulation behaviour.
Suggests: lhs, gplots
License: GPL-2
Packaged: 2012-05-17 14:47:02 UTC; kieran
Repository: CRAN
Date/Publication: 2012-05-17 15:52:39

More information about spartan at CRAN

May 17, 2012 05:51 PM

New package DescribeDisplay with initial version 0.2.3

Package: DescribeDisplay
Version: 0.2.3
Title: R interface to DescribeDisplay (GGobi plugin)
Author: Hadley Wickham, Di Cook, Andreas Buja, Barret Schloerke
Depends: proto
Imports: grid, reshape2, ggplot2 (>= 0.9.1), plyr, scales, grid
Maintainer: Hadley Wickham
Description: Produce publication quality graphics from output of GGobi's describe display plugin
License: BSD
URL: http://www.ggobi.org/describe-display/
Collate: 'axis-geom.r' 'axis-grob.r' 'data.r' 'DescribeDisplay-package.r' 'ggplot-barchart.r' 'ggplot-parcoords.r' 'ggplot-scatmat.r' 'ggplot-timeseries.r' 'ggplot.r' 'plots.r' 'utils.r'
Packaged: 2012-05-17 15:12:15 UTC; hadley
Repository: CRAN
Date/Publication: 2012-05-17 16:03:21

More information about DescribeDisplay at CRAN

May 17, 2012 05:51 PM

Removed CRANberries

Package inlinedocs (with last version 1.8) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2011-10-24 1.8
2011-05-31 1.6
2010-10-20 1.4
2010-04-12 1.1
2009-09-30 1.0

May 17, 2012 05:51 PM

Package gdistance (with last version 1.1-2) was removed from CRAN

Previous versions (as known to CRANberries) which should be available via the Archive link are:

2011-09-30 1.1-2
2011-01-09 1.1-1

May 17, 2012 05:51 PM

CRANberries

New package googlePublicData with initial version 0.12.05

Package: googlePublicData
Type: Package
Title: An R library to build Google's Public Data Explorer DSPL Metadata files
Version: 0.12.05
Date: 2012-05-15
Author: George Vega Yon
Maintainer: George Vega Yon
Description: package provides a collection of functions to set up Google Public Data Explorer data visualization tool with your own data, building automatically the corresponding DSPL (XML) metadata file jointly with the CSV files. All zipped up and ready to be published at Public Data Explorer.
Depends: R (>= 2.11.0), XML, utils, XLConnect
License: GPL (>= 3)
URL: http://code.google.com/p/rdspl/
LazyLoad: yes
Packaged: 2012-05-16 18:07:12 UTC; George
Repository: CRAN
Date/Publication: 2012-05-17 12:55:21

More information about googlePublicData at CRAN

May 17, 2012 01:51 PM

New package simSummary with initial version 0.1.0

Package: simSummary
Title: Simulation summary
Description: simSummary is a small utility package which eases the process of summarizing simulation results. Simulations often produce intermediate results - some focal statistics that need to be summarized over several scenarios and many replications. This step is in principle easy, but tedious. The package simSummary fills this niche by providing a generic way of summarizing the focal statistics of simulations. The useR must provide properly structured input, holding focal statistics, and then the summary step can be performed with one line of code, calling the simSummary function.
Author: Gregor Gorjanc
Maintainer: Gregor Gorjanc
License: GPL (>= 2)
Version: 0.1.0
Depends: abind (>= 1.4-0), svUnit (>= 0.7-5)
Imports: gdata (>= 2.8.0)
Date: Check NEWS file for changes: news(package='simSummary')
Packaged: 2012-05-16 21:33:40 UTC; ggorjan
Repository: CRAN
Date/Publication: 2012-05-17 11:36:45

More information about simSummary at CRAN
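
The summarizing step the description calls "easy, but tedious" looks roughly like the sketch below. It is written in Python and is not the simSummary interface; the nested-dictionary input layout, the function name `summarise` and the numbers are all assumptions for illustration.

```python
from statistics import mean, stdev

def summarise(results):
    """results maps scenario -> list of replicates, each replicate a dict
    of focal statistics. Returns scenario -> statistic -> (mean, sd)."""
    summary = {}
    for scenario, replicates in results.items():
        stats = {}
        for name in replicates[0]:
            values = [rep[name] for rep in replicates]
            stats[name] = (mean(values), stdev(values))
        summary[scenario] = stats
    return summary

# Made-up focal statistics: two scenarios, three replicates each
results = {
    "low": [{"gain": 1.0}, {"gain": 2.0}, {"gain": 3.0}],
    "high": [{"gain": 4.0}, {"gain": 6.0}, {"gain": 8.0}],
}
summary = summarise(results)
for scenario, stats in summary.items():
    print(scenario, stats)
```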

May 17, 2012 11:51 AM

New package fanc with initial version 0.17

Package: fanc
Type: Package
Title: Penalized likelihood factor analysis via non-concave penalty
Version: 0.17
Date: 2012-05-17
Author: Kei Hirose, Michio Yamamoto
Maintainer: Kei Hirose
Suggests: RGtk2
Description: This package computes the penalized maximum likelihood estimates of factor loadings and unique variances for various tuning parameters. The pathwise coordinate descent along with EM algorithm is used.
License: GPL (>= 2)
URL: http://www.keihirose.com/.
Packaged: 2012-05-17 01:21:56 UTC; hirosekei
Repository: CRAN
Date/Publication: 2012-05-17 07:55:01

More information about fanc at CRAN

May 17, 2012 09:51 AM

New package MissingDataGUI with initial version 0.1-3

Package: MissingDataGUI
Type: Package
Title: A GUI for Missing Data Exploration
Version: 0.1-3
Date: 2012-05-15
Author: Xiaoyue Cheng, Dianne Cook, Heike Hofmann
Maintainer: Xiaoyue Cheng
Description: This package provides numeric and graphical summaries for the missing values from both discrete and continuous variables. A variety of imputation methods are applied, including univariate imputations like fixed or random values, multiple imputations based on other packages, and imputations conditioned on a categorical variable.
Depends: gWidgetsRGtk2, Hmisc, norm, GGally, ggplot2
Imports: cairoDevice, grid, plyr, reshape2, reshape
License: GPL
Collate: 'MissingDataGUI-package.r' 'MissingDataGUI.r' 'WatchMissingValues.r' 'imputation.r' 'SingleImputation.r' 'zzz.r'
Packaged: 2012-05-17 03:11:05 UTC; xiaoyue
Repository: CRAN
Date/Publication: 2012-05-17 07:00:23

More information about MissingDataGUI at CRAN

May 17, 2012 07:51 AM

New package CUMP with initial version 1.0

Package: CUMP
Type: Package
Title: Analyze Multivariate Phenotypes by Combining Univariate results
Version: 1.0
Date: 2012-03-08
Author: Xuan Liu and Qiong Yang
Maintainer: Xuan Liu
Description: Combining Univariate Association Test Results of Multiple Phenotypes for Detecting Pleiotropy
License: GPL (>= 2)
LazyLoad: yes
Packaged: 2012-05-16 18:49:39 UTC; liuxuan
Repository: CRAN
Date/Publication: 2012-05-17 06:23:23

More information about CUMP at CRAN

May 17, 2012 07:51 AM

New package bisectr with initial version 0.0.2

Package: bisectr
Title: Tools to find bad commits with git bisect
Version: 0.0.2
Author: Winston Chang
Maintainer: Winston Chang
Description: Tools to find bad commits with git bisect
Depends: R (>= 2.14)
Imports: devtools
License: GPL-2
Collate: 'bisect.r'
Packaged: 2012-05-16 19:06:34 UTC; winston
Repository: CRAN
Date/Publication: 2012-05-17 06:18:53

More information about bisectr at CRAN
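
For readers unfamiliar with the idea behind git bisect: it is a binary search over the commit history. The Python sketch below shows only that search logic and makes no claims about how bisectr or git implement it. The commit names, the `is_bad` test and the assumption that every commit after the first bad one is also bad are all hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.
    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad every later commit is bad too."""
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return bad

# Hypothetical history in which the bug was introduced at "c5"
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
broken_from = 5
tested = []

def is_bad(commit):
    tested.append(commit)  # record which commits had to be tested
    return commits.index(commit) >= broken_from

culprit = commits[first_bad(commits, is_bad)]
print(culprit, "found after testing", tested)
```

The payoff is the number of tests: the search needs about log2(n) of them rather than one per commit.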

May 17, 2012 07:51 AM

Journal of Statistical Software

bayesclust: An R Package for Testing and Searching for Significant Clusters

Vol. 47, Issue 14, May 2012

Abstract:

The detection and determination of clusters has been of special interest among researchers from different fields for a long time. In particular, assessing whether the clusters are significant is a question that has been asked by a number of experimenters. In Fuentes and Casella (2009), the authors put forth a new methodology for analyzing clusters. It tests the hypothesis H0 : κ = 1 versus H1 : κ = k in a Bayesian setting, where κ denotes the number of clusters in a population. The bayesclust package implements this approach in R. Here we give an overview of the algorithm and a detailed description of the functions available in the package. The routines in bayesclust allow the user to test for the existence of clusters, and then pick out optimal partitionings of the data. We demonstrate the testing procedure with simulated datasets.

May 17, 2012 07:00 AM

An Exact Algorithm for Weighted-Mean Trimmed Regions in Any Dimension

Vol. 47, Issue 13, May 2012

Abstract:

Trimmed regions are a powerful tool of multivariate data analysis. They describe a probability distribution in Euclidean d-space regarding location, dispersion, and shape, and they order multivariate data with respect to their centrality. Dyckerhoff and Mosler (2011) have introduced the class of weighted-mean trimmed regions, which possess attractive properties regarding continuity, subadditivity, and monotonicity.
We present an exact algorithm to compute the weighted-mean trimmed regions of a given data cloud in arbitrary dimension d. These trimmed regions are convex polytopes in R^d. To calculate them, the algorithm builds on methods from computational geometry. A characterization of a region’s facets is used, and information about the adjacency of the facets is extracted from the data. A key problem consists in ordering the facets. It is solved by the introduction of a tree-based order, by which the whole surface can be traversed efficiently with the minimal number of computations. The algorithm has been programmed in C++ and is available as the R package WMTregions.

May 17, 2012 07:00 AM

tclust: An R Package for a Trimming Approach to Cluster Analysis

Vol. 47, Issue 12, May 2012

Abstract:

Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to “fit” noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.

May 17, 2012 07:00 AM

Causal Inference Using Graphical Models with the R Package pcalg

Vol. 47, Issue 11, May 2012

Abstract:

The pcalg package for R can be used for the following two purposes: Causal structure learning and estimation of causal effects from observational data. In this document, we give a brief overview of the methodology, and demonstrate the package’s functionality in both toy examples and applications.

May 17, 2012 07:00 AM

Classification Trees for Ordinal Responses in R: The rpartScore Package

Vol. 47, Issue 10, May 2012

Abstract:

This paper introduces rpartScore (Galimberti, Soffritti, and Di Maso 2012), a new R package for building classification trees for ordinal responses, that can be employed whenever a set of scores is assigned to the ordered categories of the response. This package has been created to overcome some problems that produced unexpected results from the package rpartOrdinal (Archer 2010). Explanations for the causes of these unexpected results are provided. The main functionalities of rpartScore are described, and its use is illustrated through some examples.

May 17, 2012 07:00 AM

May 16, 2012

Dirk Eddelbuettel

RProtoBuf 0.2.4

A new release 0.2.4 of RProtoBuf is now on CRAN. RProtoBuf provides GNU R bindings for the Google Protobuf data encoding library used and released by Google.

This release once again contains a number of patches kindly contributed by Murray Stokely, as well as an added header file needed to build with the g++ 4.7 version which has become the build standard on CRAN.

The NEWS file entry follows below:

0.2.4   2012-05-15

    o   Applied several patches kindly supplied by Murray Stokely to
         - properly work with repeated strings 
         - correct C++ function naming in a few instances
         - add an example of ascii export/import of messages

    o   Support g++-4.7 and stricter #include file checking by adding unistd

    o   Made small improvements to the startup code

CRANberries also provides a diff to the previous release 0.2.3. More information is at the RProtoBuf page which has a draft package vignette, a 'quick' overview vignette and a unit test summary vignette. Questions, comments etc should go to the rprotobuf mailing list off the RProtoBuf page at R-Forge. Updated to show NEWS rather than ChangeLog

May 16, 2012 05:00 PM

Revolutions

Revolution Newsletter: May 2012

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full May edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email.

New R Training Courses Announced. Three new R courses from leading R experts are now available for registration:

  • An Introduction to R for SAS, SPSS, and Stata Users will be presented by Bob Muenchen (author of R for SAS and SPSS users) June 26-29. This is an on-line workshop with live instruction that you can attend from your desk - no travel necessary!
  • Visualization in R with ggplot2 is also a new web-based course with live instruction from Garrett Grolemund & Dr. Winston Chang of Rice University, presented online June 19-20.
  • R Development Master Class will be presented in-person by R package author and professor Hadley Wickham, June 21-22 in NYC and June 28-29 in Redwood City, CA.

Sign up to these courses now to secure your seat -- attendance is limited!

On the hunt for a cure for MS. Researchers at SUNY Buffalo are using big-data analytics and genetic data to research a cure for Multiple Sclerosis. Read in Forbes how Revolution R and IBM Netezza speed up their research, or join the #IBMDataChat discussion on Twitter at noon ET on Thursday, May 10 to learn more.

More free R webinars. Revolution Analytics' Spring webinar series continues, with upcoming presentations on Getting Up to Speed with R, spatial statistics with R, data mining with R, Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR, and a look at the next version of Revolution R Enterprise.

Catch our archived webinar replays. Missed any of the recent presentations on Big Data with R and Hadoop, integrating R with BI tools and MS Office, or How Big Data is Changing Retail Marketing Analytics? Check our webinar archives for streaming replays and presentation downloads.

Putting the R in Analytics. A feature article in Information Age reveals how new graduates are driving adoption of R in industry.

Upcoming Conferences for R Users. Revolution Analytics is proud to sponsor: R/Finance 2012 (the premier conference for R users in the finance industry) on May 11-12 in Chicago; and the Interface 2012 conference on the future of Statistical Computing, May 16-18 in Houston.

R user conference deadlines approaching. If you're planning to attend useR! 2012 in Nashville but want to avoid late registration fees, the deadline for regular registration is May 12. 

Revolution Analytics: Newsletter Archive

by David Smith at May 16, 2012 04:38 PM

CRANberries

New package igraphdata with initial version 0.1

Package: igraphdata
Version: 0.1
Date: 2012-05-11
Title: A collection of network data sets for the igraph package
Author: Gabor Csardi
Maintainer: Gabor Csardi
Depends: R (>= 2.10)
Suggests: igraph0, igraph
Description: A small collection of various network data sets, to use with the igraph package. They also work with the igraph0 package.
License: GPL (>= 2) + file LICENSE
URL: http://igraph.sourceforge.net
BugReports: http://bugs.launchpad.net/igraph
Packaged: 2012-05-15 04:00:29 UTC; gaborcsardi
Repository: CRAN
Date/Publication: 2012-05-16 13:43:43

More information about igraphdata at CRAN

May 16, 2012 01:51 PM

New package primer with initial version 1.0

Package: primer
Type: Package
Title: Functions and data for A Primer of Ecology with R
Version: 1.0
Date: 2012-05-16
Author: M Henry H Stevens
Maintainer: Hank Stevens
Depends: deSolve, lattice
Suggests: bbmle, gdata, nlme, vegan
Description: Functions are primarily functions for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data are for examples.
License: GPL
LazyLoad: yes
Packaged: 2012-05-16 11:17:29 UTC; mhhs
Repository: CRAN
Date/Publication: 2012-05-16 11:49:09

More information about primer at CRAN

May 16, 2012 11:51 AM

New package fishmove with initial version 0.0-1

Package: fishmove
Type: Package
Title: Prediction of Fish Movement Parameters
Version: 0.0-1
Packaged: 2012-05-16 08:12:10 UTC; Johannes Radinger
Date/Publication: 2012-05-16 10:48:44
Author: Johannes Radinger
Maintainer: Johannes Radinger
Description: Functions to predict fish movement parameters based on multiple regression and plotting leptokurtic fish dispersal kernels
License: GPL (>= 2)
Depends: ggplot2, plyr
LazyLoad: yes
LazyData: yes
URL:
Repository: CRAN

More information about fishmove at CRAN

May 16, 2012 11:51 AM

May 15, 2012

Revolutions

How long before R overtakes SAS and SPSS?

Based on an analysis of Google Scholar data on usage of statistical software, Bob Muenchen makes a forecast: R will overtake SAS and SPSS in 2015. Forecasting is extrapolation — always a tricky business — so Bob also provides these qualitative reasons why R will continue to grow at the expense of SAS and SPSS:

  • The continued rapid growth in add-on packages (Figure 10)
  • The attraction of R’s powerful language
  • The near monopoly R has on the latest analytic methods
  • Its free price
  • The freedom to teach with real-world examples from outside organizations, which is forbidden to academics by SAS and SPSS licenses (it benefits those organizations, so the vendors say they should have their own software license).

See how Bob comes up with this forecast (using R, of course!) at the link below.

r4stats.com: Will 2015 be the Beginning of the End for SAS and SPSS?

by David Smith at May 15, 2012 10:37 PM

Bioconductor Project Working Papers

A Systematic Selection Method for the Development of Cancer Staging Systems

The tumor-node-metastasis (TNM) staging system has been the anchor of cancer diagnosis, treatment, and prognosis for many years. For meaningful clinical use, an orderly, progressive condensation of the T and N categories into an overall staging system needs to be defined, usually with respect to a time-to-event outcome. This can be considered as a cutpoint selection problem for a censored response partitioned with respect to two ordered categorical covariates and their interaction. The aim is to select the best grouping of the TN categories. A novel bootstrap cutpoint/model selection method is proposed for this task by maximizing bootstrap estimates of the chosen statistical criteria. The criteria are based on prognostic ability including a landmark measure of the explained variation, the area under the ROC curve, and a concordance probability generalized from Harrell's c-index. We illustrate the utility of our method by applying it to the staging of colorectal cancer.

by Yunzhi Lin et al. at May 15, 2012 05:14 PM
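
One of the criteria named in the abstract builds on Harrell's c-index. As a reminder of what that index measures, here is a simplified Python sketch of the basic version for right-censored data. It is not the generalized concordance probability the authors propose, pairs with tied times are skipped for brevity, and the data are made up.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored data.
    A pair is usable when the subject with the shorter time had an event;
    it is concordant when that subject also has the higher risk score.
    Tied risk scores count as half; tied times are skipped in this sketch."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable

# Made-up data: follow-up time, event indicator (0 = censored), risk score
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.7, 0.1]
c = harrell_c(times, events, risks)
print(c)
```

A value of 0.5 corresponds to random ordering and 1.0 to a risk score that ranks every usable pair correctly.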

CRANberries

New package MetaDE with initial version 1.0

Package: MetaDE
Type: Package
Title: Meta analysis of multiple microarray data
Version: 1.0
Date: 2012-03-27
Author: Jia Li and Xingbin Wang
Maintainer: Jia Li and Xingbin Wang
Depends: survival, impute, Biobase, combinat, tools
Description: A collection of functions for conducting genomic meta-analysis in R.
License: GPL-2
LazyLoad: yes
Packaged: 2012-05-12 13:22:33 UTC; xingbin
Repository: CRAN
Date/Publication: 2012-05-15 15:35:38

More information about MetaDE at CRAN

May 15, 2012 03:51 PM

Dirk Eddelbuettel

RcppSMC 0.1.1

CRAN now tests packages against g++-4.7 (as this version has become the default on Debian's testing variant). This compiler switch once again triggered a set of build failures, mostly from include files now deemed missing. For RcppSMC, it came down to a five-character patch of explicitly stating one max() call as std::max().

No other changes were made at this point. The NEWS entry is below:

0.1.1   2012-05-14

    o   Version 0.1.1 

    o   Minor g++-4.7 build fix of using std::max() explicitly

Courtesy of CRANberries, there is also a diffstat report for 0.1.1 relative to 0.1.0. As always, more detailed information is on the RcppSMC page.

May 15, 2012 01:10 PM

CRANberries

New package ri with initial version 0.9

Package: ri
Type: Package
Title: ri: R package for performing randomization-based inference for experiments
Version: 0.9
Date: 2012-05-10
Author: Peter M. Aronow and Cyrus Samii
Maintainer: Cyrus Samii
Description: This package provides a set of tools for conducting exact or approximate randomization-based inference for experiments of arbitrary design. The primary functionality of the package is in the generation, manipulation and use of permutation matrices implied by given experimental designs. Among other features, the package facilitates estimation of average treatment effects, constant effects variance estimation, randomization inference for significance testing against sharp null hypotheses and visualization of data and results.
License: GPL (>= 2)
Packaged: 2012-05-14 16:15:27 UTC; peteraronow
Repository: CRAN
Date/Publication: 2012-05-15 06:44:09

More information about ri at CRAN
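
The core idea of randomization inference against a sharp null can be shown in a few lines. The sketch below is plain Python and does not use the ri package. It assumes complete randomization with a fixed number of treated units and uses the absolute difference in means as the test statistic. The function name and the data are made up.

```python
from itertools import combinations

def randomization_p_value(outcomes, treated):
    """Exact two-sided p-value for the sharp null of no effect on any unit,
    under complete randomization with a fixed number of treated units."""
    n = len(outcomes)
    k = len(treated)

    def diff_in_means(treated_idx):
        t = set(treated_idx)
        y1 = [outcomes[i] for i in range(n) if i in t]
        y0 = [outcomes[i] for i in range(n) if i not in t]
        return sum(y1) / len(y1) - sum(y0) / len(y0)

    observed = abs(diff_in_means(treated))
    # Under the sharp null the outcomes are fixed; only the assignment varies
    draws = [abs(diff_in_means(a)) for a in combinations(range(n), k)]
    extreme = sum(1 for d in draws if d >= observed - 1e-12)
    return extreme / len(draws)

# Made-up experiment: six units, the last two were treated
outcomes = [1, 2, 3, 4, 10, 11]
treated = (4, 5)
p = randomization_p_value(outcomes, treated)
print(p)
```

Enumerating every possible assignment is only feasible for small experiments. For larger designs one would sample assignments instead, which gives the "approximate" inference the description mentions.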

May 15, 2012 07:52 AM

New package lava.tobit with initial version 0.4-6

Package: lava.tobit
Type: Package
Title: LVM with censored and binary outcomes
Version: 0.4-6
Date: 2012-04-24
Author: Klaus K. Holst
Maintainer: Klaus K. Holst
Description: lava plugin allowing combinations of left and right censored and binary outcomes
Keywords: twin model, structural equation model
Depends: R (>= 2.8.0), lava (>= 0.9-15), mvtnorm, survival
License: GPL-3
LazyLoad: yes
Packaged: 2012-05-13 13:52:05 UTC; klaus
Repository: CRAN
Date/Publication: 2012-05-15 06:38:12

More information about lava.tobit at CRAN

May 15, 2012 07:51 AM

New package dma with initial version 1.1

Package: dma
Type: Package
Title: Dynamic model averaging
Version: 1.1
Date: 2012-05-13
Author: Tyler H. McCormick, Adrian Raftery, David Madigan
Maintainer: Tyler H. McCormick
Description: Dynamic model averaging for binary and continuous outcomes.
Suggests: MASS, mnormt
License: GPL-2
LazyLoad: yes
Packaged: 2012-05-15 04:27:28 UTC; tylermccormick
Repository: CRAN
Date/Publication: 2012-05-15 06:38:01

More information about dma at CRAN

May 15, 2012 07:51 AM

May 14, 2012

Revolutions

Multiple Sclerosis Tweet-Chat: Review

We had a great Twitter conversation last Thursday on the use of big-data analytics, Revolution R Enterprise, and IBM Netezza in the search for a cure for MS. Many thanks to the other panelists: Murali Ramanathan (SUNY Buffalo), Tim Coetzee (National MS Society) and moderator Shawn Dolley (IBM) for fielding and answering questions from interested parties following #IBMDataChat. As you can see from this twitteR analysis, it was a lively discussion, with more than 300 tweets during the designated hour.

IBM's James Kobielus has a summary of the chat, highlighting some of the key nuggets of information. For example, Dr Ramanathan revealed that this research is helping to understand interactions between genetic factors and environment that could help identify lifestyle and diet changes to manage MS. 

Also, in this interview with IBM's Mike Kearney I explain why the R language is uniquely suited for research like this, and how Revolution R works with the massive data volumes required for this study.

Thinking Inside the Box: Are you ready for big data? R is.

by David Smith at May 14, 2012 10:57 PM

New courses from R gurus

Looking to learn R, or to expand your R skills for data visualization or package development? Here are some R courses presented by the experts you may be interested in:

June 19-20: Visualization in R with ggplot2. This web-based course with live presentation is taught by Garrett Grolemund & Dr. Winston Chang of Rice University. It provides instruction on data visualization with R, including data transformation, visualization of Big Data and polishing graphics for presentation.

June 21-22 (in New York City) and June 28-29 (in Redwood City, CA): R Development Master Class is an in-person, in-depth R course presented by R package author and professor Hadley Wickham. This two-day course offers expert instruction in R programming and package development, and is ideal for anyone looking to hone their R development skills.

June 26-29: An Introduction to R for SAS, SPSS, and Stata Users. This course will be presented by Bob Muenchen, author of R for SAS and SPSS users and R for Stata Users. This course will let you build on your existing statistical software skills to learn the R language -- all without leaving your desk. The course features live presentation and student/teacher interaction via a web-based course delivery system.

For more information about these courses (including pricing and registration), please follow the link below.

Revolution Analytics: Public Training Courses

by David Smith at May 14, 2012 10:27 PM

CRANberries

New package mets with initial version 0.1-8

Package: mets
Type: Package
Title: Analysis of Multivariate Event Times
Version: 0.1-8
Date: 2012-05-13
Author: Klaus K. Holst and Thomas Scheike
Maintainer: Klaus K. Holst
Description: Implementation of various statistical models for multivariate event history data. Including multivariate cumulative incidence models, and bivariate random effects probit models (Liability models)
License: GPL (>= 2)
LazyLoad: yes
URL: http://r-forge.r-project.org/projects/lava/
Depends: Rcpp (>= 0.9.2), RcppArmadillo (>= 0.2.17), lava, mvtnorm, numDeriv, splines, timereg, ucminf, prodlim
LinkingTo: Rcpp, RcppArmadillo
Collate: 'biprobit.R' 'biprobit.strata.R' 'bpnd.R' 'bptwin.R' 'claytonakes.R' 'coef.biprobit.R' 'cumh.R' 'Dbvn.R' 'fastapprox.R' 'ipw.R' 'logLik.biprobit.R' 'mets-packages.R' 'npc.R' 'onload.R' 'plot.bptwin.R' 'plotcr.R' 'print.biprobit.R' 'print.summary.biprobit.R' 'score.biprobit.R' 'sim.bptwin.R' 'sim.clayton.oakes.R' 'summary.biprobit.R' 'summary.bptwin.R' 'uniprobit.R' 'utils.R' 'vcov.biprobit.R' 'bicomprisk.R' 'event.R' 'procformula.R' 'concordance-scripts.R' 'cor.R' 'twinlm.R' 'twinsim.R' 'methodstwinlm.R' 'sim-nordic-twin.R'
Packaged: 2012-05-14 16:33:21 UTC; klaus
Repository: CRAN
Date/Publication: 2012-05-14 18:20:32

More information about mets at CRAN

May 14, 2012 07:51 PM

New package lava with initial version 1.0-5

Package: lava
Type: Package
Title: Linear Latent Variable Models
Version: 1.0-5
Date: 2012-05-13
Author: Klaus K. Holst
Maintainer: Klaus K. Holst
Description: Estimation and simulation of latent variable models
Keywords: latent variable model, structural equation model, likelihood inference
Depends: R (>= 2.10.0), graph, mvtnorm, numDeriv
Suggests: Rgraphviz, igraph, Matrix, gof (>= 0.8), foreach, survival
License: GPL-3
LazyLoad: yes
Collate: 'addattr.R' 'addvar.R' 'baptize.R' 'bootstrap.R' 'cancel.R' 'children.R' 'cluster.hook.R' 'coef.R' 'compare.R' 'confint.R' 'constrain.R' 'copy.R' 'correlation.R' 'covariance.R' 'deriv.R' 'distribution.R' 'effects.R' 'endogenous.R' 'estimate.multigroup.R' 'estimate.R' 'exogenous.R' 'finalize.R' 'fix.R' 'formula.R' 'functional.R' 'glmest.R' 'gof.R' 'graph2lvm.R' 'graph.R' 'heavytail.R' 'hooks.R' 'index.sem.R' 'information.R' 'iv.R' 'kill.R' 'labels.R' 'latent.R' 'lava-package.R' 'lisrel.R' 'logLik.R' 'logo.R' 'lvm.R' 'makemissing.R' 'manifest.R' 'matrices.R' 'measurement.R' 'merge.R' 'missingMLE.R' 'model.frame.R' 'modelPar.R' 'model.R' 'modelsearch.R' 'modelVar.R' 'moments.R' 'multigroup.R' 'nested.R' 'nodecolor.R' 'Objective.R' 'onload.R' 'optims.R' 'parameter.R' 'parpos.R' 'pars.R' 'partialcor.R' 'path.R' 'plot.R' 'predict.R' 'print.R' 'profile.R' 'randomslope.R' 'regression.R' 'reorder.R' 'residuals.R' 'score.R' 'sim.R' 'startvalues.R' 'subgraph.R' 'subset.R' 'summary.R' 'survival.R' 'utils.R' 'variances.R' 'vars.R' 'vcov.R' 'weight.R' 'equivalence.R'
Packaged: 2012-05-14 16:33:18 UTC; klaus
Repository: CRAN
Date/Publication: 2012-05-14 18:20:30

More information about lava at CRAN

May 14, 2012 07:51 PM

New package SPSL with initial version 0.1-5

Package: SPSL
Type: Package
Version: 0.1-5
Date: 2012-05-14
Title: Site Percolation on Square Lattice (SPSL)
Author: Pavel V. Moskalev
Maintainer: Pavel V. Moskalev
Description: The SPSL package provides functionality for labeling percolation clusters on 2D & 3D square lattices with various lattice sizes, relative fractions of accessible sites (occupation probability), iso- & anisotropy, and von Neumann & Moore d-neighborhoods
Depends: R (>= 2.14.0)
Suggests: lattice
License: GPL-3
LazyLoad: yes
URL: http://www.r-project.org
Packaged: 2012-05-14 06:14:21 UTC; paule
Repository: CRAN
Date/Publication: 2012-05-14 15:58:38

More information about SPSL at CRAN
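The cluster labeling that the SPSL description mentions can be illustrated with a short sketch. This is not the package's code or API: it is a plain breadth-first search in Python, restricted to the 2D isotropic case with the von Neumann (4-site) neighbourhood, and the function names `random_lattice` and `label_cluster` are hypothetical.

```python
import random
from collections import deque


def random_lattice(n, p, seed=0):
    """Return an n x n lattice; each site is accessible with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


def label_cluster(accessible, seeds):
    """Mark every accessible site connected to a seed site.

    Connectivity uses the von Neumann neighbourhood (up, down, left, right).
    Returns a lattice of booleans with the same shape as `accessible`.
    """
    rows, cols = len(accessible), len(accessible[0])
    labelled = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r, c in seeds:
        if accessible[r][c] and not labelled[r][c]:
            labelled[r][c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and accessible[nr][nc] and not labelled[nr][nc]:
                labelled[nr][nc] = True
                queue.append((nr, nc))
    return labelled
```

For example, `label_cluster(random_lattice(50, 0.6), [(0, c) for c in range(50)])` labels the cluster grown from the top row. A Moore neighbourhood would add the four diagonal offsets to the inner loop.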

May 14, 2012 05:51 PM

New package cumplyr with initial version 0.1-1

Package: cumplyr
Type: Package
Title: Extends ddply to allow calculation of cumulative quantities.
Version: 0.1-1
Date: 2012-05-02
Author: John Myles White
Maintainer: John Myles White
Description: Extends ddply to allow calculation of cumulative quantities.
License: MIT
Packaged: 2012-05-13 02:46:31 UTC; johnmyleswhite
Repository: CRAN
Date/Publication: 2012-05-14 15:58:39

More information about cumplyr at CRAN

May 14, 2012 05:51 PM

New package tm.plugin.factiva with initial version 1.0

Package: tm.plugin.factiva
Type: Package
Title: A plug-in for the tm text mining framework to import articles from Factiva
Version: 1.0
Date: 2012-05-14
Author@R: person("Milan", "Bouchet-Valat", email="nalimilan@club.fr", role=c("aut", "cre"))
Author: Milan Bouchet-Valat
Maintainer: Milan Bouchet-Valat
Enhances: tm (>= 0.5)
Imports: tm (>= 0.5), XML
Description: This package provides a tm Source to create corpora from articles exported from the Dow Jones Factiva content provider as XML files.
License: GPL (>= 2)
URL: https://r-forge.r-project.org/projects/rcmdr-tms/
BugReports: https://r-forge.r-project.org/tracker/?group_id=1179
Packaged: 2012-05-14 13:54:33 UTC; milan
Repository: CRAN
Date/Publication: 2012-05-14 15:04:26

More information about tm.plugin.factiva at CRAN

May 14, 2012 03:51 PM

New package pgnorm with initial version 1.1

Package: pgnorm
Type: Package
Title: The p-generalized normal distribution
Version: 1.1
Date: 2012-05-10
Author: Steve Kalke
Maintainer:
Description: Evaluation of the pdf and the cdf of the univariate p-generalized normal distribution. Sampling from the p-generalized normal distribution using either the p-generalized polar method, the p-generalized rejecting polar method, the Monty Python method, the Ziggurat method or the method of Nardon and Pianca. The package also includes routines for the simulation of the bivariate, p-generalized uniform distribution on the p-generalized unit circle and the simulation of the corresponding angular distribution.
License: GPL (>= 2)
LazyLoad: yes
Packaged: 2012-05-14 07:14:44 UTC; skadmin
Repository: CRAN
Date/Publication: 2012-05-14 09:43:11

More information about pgnorm at CRAN
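For readers unfamiliar with the distribution, here is a minimal Python sketch of a p-generalized normal density. The parametrisation is an assumption (density proportional to exp(-|x - mean|^p / (p * sigma^p))), chosen because it reduces to the standard normal at p = 2 and to the Laplace distribution at p = 1. The function name `pgnorm_pdf` is hypothetical and is not the package's interface.

```python
import math


def pgnorm_pdf(x, p, mean=0.0, sigma=1.0):
    """Density of a p-generalized normal distribution (assumed parametrisation).

    f(x) = p^(1 - 1/p) / (2 * sigma * Gamma(1/p)) * exp(-|x - mean|^p / (p * sigma^p))
    """
    if p <= 0 or sigma <= 0:
        raise ValueError("p and sigma must be positive")
    const = p ** (1.0 - 1.0 / p) / (2.0 * sigma * math.gamma(1.0 / p))
    return const * math.exp(-abs(x - mean) ** p / (p * sigma ** p))
```

Larger values of p flatten the peak and thin the tails, while values below 2 give a sharper peak and heavier tails.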

May 14, 2012 09:51 AM

May 13, 2012

CRANberries

New package BCEA with initial version 1.0

Package: BCEA
Type: Package
Title: Bayesian Cost Effectiveness Analysis
Version: 1.0
Date: 2012-04-27
Author: Gianluca Baio
Maintainer: Gianluca Baio
Suggests: MASS
Description: Produces an economic evaluation of a Bayesian model in the form of MCMC simulations. Given suitable variables of cost and effectiveness / utility for two or more interventions, BCEA computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis
License: GPL (>= 2)
Packaged: 2012-05-13 10:23:37 UTC; gianluca
Repository: CRAN
Date/Publication: 2012-05-13 14:49:30

More information about BCEA at CRAN

May 13, 2012 03:51 PM

New package TextRegression with initial version 0.1-3

Package: TextRegression
Type: Package
Title: Predict continuous valued outputs associated with text documents.
Version: 0.1-3
Date: 2012-05-12
Author: John Myles White
Maintainer: John Myles White
Description: Predict continuous valued outputs associated with text documents. The input corpus of text documents is transformed into a document-term matrix (DTM) and then a regularized linear regression is fit that uses this matrix as predictors to predict the continuous valued output. The corpus's terms, coefficients for all terms and an estimate of the model's predictive power are returned in a list.
License: Artistic-2.0
LazyLoad: yes
Suggests: testthat
Depends: tm, Matrix, glmnet, plyr
Collate: 'dtm.to.Matrix.R' 'help.R' 'regress.text.R'
Packaged: 2012-05-13 03:24:03 UTC; johnmyleswhite
Repository: CRAN
Date/Publication: 2012-05-13 08:35:21

More information about TextRegression at CRAN

May 13, 2012 09:51 AM

May 12, 2012

CRANberries

New package SightabilityModel with initial version 1.0

Package: SightabilityModel
Type: Package
Title: Wildlife Sightability Modeling
Version: 1.0
Date: 2012-05-12
Author: John Fieberg
Maintainer: John Fieberg
Description: Uses logistic regression to model the probability of detection as a function of covariates. This model is then used with observational survey data to estimate population size, while accounting for uncertain detection. See Steinhorst and Samuel (1989).
License: GPL-2
LazyLoad: yes
Packaged: 2012-05-11 19:03:23 UTC; Jofieber
Repository: CRAN
Date/Publication: 2012-05-12 15:13:56

More information about SightabilityModel at CRAN
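The idea in the description (model detection with logistic regression, then correct the survey counts for missed animals) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's method: the coefficients are taken as already fitted, the correction is a simple Horvitz-Thompson-style inflation of each observed group by the inverse of its detection probability, and variance estimation is ignored. All names are hypothetical.

```python
import math


def detection_probability(covariates, coefficients):
    """Logistic model; coefficients[0] is the intercept."""
    eta = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], covariates))
    return 1.0 / (1.0 + math.exp(-eta))


def estimate_population(groups, coefficients, sampling_fraction=1.0):
    """Inflate each observed group by 1 / (detection probability * sampling fraction).

    `groups` is a list of (group_size, covariates) pairs for the groups that
    were actually seen during the survey.
    """
    total = 0.0
    for size, covariates in groups:
        p = detection_probability(covariates, coefficients)
        total += size / (p * sampling_fraction)
    return total
```

A group seen under conditions where only half of such groups are detected counts twice, which is how the estimate accounts for uncertain detection.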

May 12, 2012 03:51 PM

New package polywog with initial version 0.1-0

Package: polywog
Title: Bootstrapped Basis Regression with Oracle Model Selection
Version: 0.1-0
Date: 2012-05-11
Author: Brenton Kenkel and Curtis S. Signorino
Maintainer: Brenton Kenkel
Description: Routines for flexible functional form estimation via basis regression, with model selection via the adaptive LASSO or SCAD to prevent overfitting.
License: GPL (>= 2)
Depends: glmnet (>= 1.5.1), ncvreg, Formula, Matrix
Imports: stringr, games, car
Suggests: foreach, lattice, rgl
Collate: 'helpers.r' 'fn_pred.r' 'fn_plot.r' 'fn_poly.r' 'polywog.r'
Packaged: 2012-05-11 20:58:30 UTC; brenton
Repository: CRAN
Date/Publication: 2012-05-12 15:38:30

More information about polywog at CRAN

May 12, 2012 03:51 PM

Simon Jackman

Thought of the day re Political Science at NSF

The fight to save the American Community Survey crowds out the voices trying to keep NSF funding for political science…

by jackman at May 12, 2012 12:05 AM

May 11, 2012

Revolutions

Because it's Friday: Australian PSAs from the 80s

When I was a kid growing up in Australia, it seemed like every commercial break during the Saturday morning cartoons or after-school shows was punctuated by some PSA encouraging us to lead a healthier life. These "community service announcements" were government-sponsored, and often paired low-budget animation with a catchy jingle. Strangely enough, lots of Australians (me included) remember them fondly, and can still recite the songs on demand. Here are a few of my favourites:

"Slip Slop Slap" made avoiding skin cancer fun (an important lesson in the Sunburnt Country):

 

There was a whole series of these "Life. Be in it." ads, encouraging us all to get out of the house and do something lest we end up like Norm. (Seems like these ads are undergoing a revival.) I'm pretty sure there was one in the series encouraging Australians to have "one alcohol free day a week", but I'm either misremembering or it's not on YouTube.

 

But by a long shot, my absolute favourite was the Vitamins Song:

 

I always hoped there'd be a sequel where the Slip Slop Slap Magpie and Vitamin D resolved their differences in a Battle Royale, but sadly it never came to be. Enjoy your weekend!

 

by David Smith at May 11, 2012 08:13 PM

Mariano Rivera’s baseball prowess, illustrated with R

Kevin Quealy, graphics editor at the New York Times, has published another fascinating behind-the-scenes look at how the Times creates data visualizations for print and online. In his latest post, he looks at how the Times created a visualization comparing the performance of the Yankees' Mariano Rivera with that of other Major League Baseball pitchers. (Detail below, click for the full image.)

Rivera-NYT-detail
The infographic began its life as a hand-drawn sketch, which begat a line chart created using R (based on data scraped from the Web). The R chart was then cleaned up and annotated using Adobe Illustrator for publication. One interesting detail of the process: the source R graph is deliberately created using garish colours (purples, greens, etc.) to make the colour-selection process easier in Illustrator.

Check out the ChartsNThings archive for other great studies of the process of data journalism. Many of the case studies involve the use of R code, such as these visualizations of visitors to the White House, Santorum's primary support, Santorum/Romney exit poll data, NFL players mentioned on ESPN, the defense budget and the richest 1%.

ChartsNThings: Sketches: How Mariano Rivera Compares to Baseball’s Best Closers

by David Smith at May 11, 2012 06:58 PM

CRANberries

New package SKAT with initial version 0.75

Package: SKAT
Type: Package
Title: SNP-set (Sequence) Kernel Association Test
Version: 0.75
Date: 2011-05-11
Author: Seunggeun Lee, Larisa Miropolsky and Micheal Wu
Maintainer: Seunggeun (Shawn) Lee
Description: Kernel based SNP set test
License: GPL (>= 2)
Depends: R (>= 2.13.0)
Packaged: 2012-05-11 15:13:23 UTC; seunggeun
Repository: CRAN
Date/Publication: 2012-05-11 17:45:50

More information about SKAT at CRAN

May 11, 2012 05:51 PM

Revolutions

CRANberries

New package RTDAmeritrade with initial version 0.0.1

Package: RTDAmeritrade
Version: 0.0.1
Date: 2011-06-4
Title: RTDAmeritrade
Author: Theodore Van Rooy
Maintainer: Theodore Van Rooy
Depends: R (>= 2.9.1), xts, zoo, XML, RCurl, sfsmisc
Suggests: snow
Description: The package contains functions that can be used to interface with the TDAmeritrade API.
License: GPL (>= 2)
Packaged: 2012-05-10 22:31:23 UTC; greentheo
Repository: CRAN
Date/Publication: 2012-05-11 05:13:46

More information about RTDAmeritrade at CRAN

May 11, 2012 05:51 AM

May 10, 2012

Revolutions

In case you missed it: April 2012 Roundup

In case you missed them, here are some articles from April of particular interest to R users.

Information Age published a feature article on R, describing how new graduates are driving adoption of R in industry.

Bob Muenchen has updated his list of R package equivalents to SAS and SPSS procedures.

A history of Data Science, including Bill Cleveland's 2001 paper.

Researchers at SUNY Buffalo are using Revolution R and IBM Netezza with genetic data to research a cure for Multiple Sclerosis, and the story has been reported in Forbes, eWeek and other media.

Pairach Piboonrungroj has compiled a list of 20 free R tutorials from around the world.

The annual Rmetrics financial engineering workshop takes place in Switzerland, June 24-28.

An elegant solution to a pairs-of-squares sequence puzzle, based on graph theory.

An example of using R to build a recommendation engine, and ranking the most popular movies from the million row movie dataset.

When is Big Data useful for statistical analysis? Norman Nie provides five examples in the Sybase Capital Markets Guide.

Revolution Analytics' Spring webinar series is underway, with presentations on Big Data with R and Hadoop, integrating R with MS Office, spatial statistics with R, data mining with R and retail marketing analytics.

The US National Oceanic and Atmospheric Administration uses R to forecast river flooding events.

R continues its growth in academia (as measured by Google Scholar citations); SPSS and SAS see steep declines.

A fantastic animation of 18th-century sailing ship voyages, created with R.

R and other open source tools used at the Consumer Financial Protection Bureau.

An introduction to the new Julia language, and a comparison with R.

SAP's HANA in-memory datastore provides integration with R.

Saraj Gupta has written an in-depth article on how the internals of R's name lookup mechanism work.

LityxIQ uses the R function glm and the packages MASS, rpart, nnet and rjson for their online marketing analytics and optimization application.

Google can graph 2-variable and 3-variable equations.

Other non-R-related stories in the past month included: an animation of world ocean currents, how English sounds to Italians, creating the Pharaoh's Serpent effect with mercury thiocyanate, and a unique performance of "Somebody that I Used to Know".

There are new R user groups in Milan and Cologne. Meeting times for local R user groups can be found on the updated R Community Calendar.

As always, thanks for the comments and please send any suggestions to me at david@revolutionanalytics.com. Don't forget you can follow the blog using an RSS reader like Google Reader, or by following me on Twitter (I'm @revodavid). You can find roundups of previous months here.

by David Smith at May 10, 2012 11:36 PM

EU court's SAS ruling conflicts with Oracle v Google

In a blow to SAS's efforts to litigate competitor and low-cost SAS clone WPS out of existence, the European Union High Court has ruled that programming languages can't be copyrighted. SAS Institute (Cary, NC) had claimed that the WPS software — which allows users to process SAS data files and SAS "data step" scripts without SAS software — breached SAS's copyright in its re-implementation of SAS functionality. But given that WPS did not study SAS source code, and merely reimplemented its interfaces and behaviour based on observation, the court ruled that this was not a violation. In a press release following the decision, the court stated:

 ... neither the functionality of a computer program nor the programming language and the format of data files used in a computer program in order to exploit certain of its functions constitute a form of expression. Accordingly, they do not enjoy copyright protection.  

(Emphasis in press release.) SAS has issued no comment on the ruling to date.

The case has implications beyond SAS and WPS. In Europe, at least, it implies that developers of both proprietary and open-source software have the right to duplicate the functionality and exposed interfaces (in this case, the structure of SAS procedure calls) of a proprietary language provided they don't copy the source code of the implementation itself.

The EU ruling conflicts with that of the jury in the Google v Oracle trial, where it was decided that Google's re-implementation of Java APIs in the Android OS did infringe Oracle's copyright. Implementing a SAS procedure and a Java API are similar programming efforts, and the EFF has argued convincingly that preventing programmers from re-implementing APIs harms innovation. However, the US-based jury did not decide whether Google's Android implementation was a fair use of the copyrighted material; if that question were to be resolved in Google's favour, the outcome would have a practical effect similar to that of the EU decision.

JDSupra: European Union Court Rules that Software Functions Cannot Be Copyrighted 

by David Smith at May 10, 2012 10:32 PM

CRANberries

New package rolasized with initial version 1.0

Package: rolasized
Type: Package
Title: Solarized colours in R
Version: 1.0
Date: 2012-05-03
Author: Christian Zang
Maintainer: Christian Zang
Depends: base
Description: Ethan Schoonover's solarized colour scheme for R (http://ethanschoonover.com/solarized)
License: MIT
LazyLoad: yes
Collate: 'rolasized.R'
Packaged: 2012-05-10 06:50:48 UTC; christian
Repository: CRAN
Date/Publication: 2012-05-10 09:48:37

More information about rolasized at CRAN

May 10, 2012 09:51 AM

New package p2distance with initial version 1.0.1

Package: p2distance
Type: Package
Title: Welfare's Synthetic Indicator
Version: 1.0.1
Date: 2012-05-2
Author: A.J. Perez-Luque; R. Moreno; R. Perez-Perez and F.J. Bonet
Maintainer: A.J. Perez-Luque , R. Perez-Perez
Description: The welfare's synthetic indicator provides an ideal tool for measuring multi-dimensional concepts such as welfare, development, living standards, etc. It enables information from the various indicators to be aggregated into a single synthetic measure.
License: GPL
LazyLoad: yes
Packaged: 2012-05-09 17:28:56 UTC; rperez
Repository: CRAN
Date/Publication: 2012-05-10 09:48:36

More information about p2distance at CRAN

May 10, 2012 09:51 AM

Simon Jackman

House votes to cut NSF political science funding

At 11.57pm tonight the House passed an amendment to HR 5326 cutting NSF funding to political science, 218-208. 5 Dems voted Aye. 27 Reps voted Nay.

Of 3 amendments moved today by Jeff Flake on HR 5326, this one got up. We just learned a lot about preferences for $9M/yr of political science funding vs cutting billions from NSF funding in the aggregate.

At this point we’re hoping the Senate and conference turn this around. Write your Senators…

See my earlier post today on this.

Oh: and they freaking well wiped out the American Community Survey too with this rollcall. Amendment of Daniel Webster (Rep FL-8).

by jackman at May 10, 2012 06:45 AM

CRANberries

New package marqLevAlg with initial version 1.0

Package: marqLevAlg
Type: Package
Title: An algorithm for least-squares curve fitting
Version: 1.0
Date: 2011-09-29
Author: D. Commenges , M. Prague and Amadou Diakite
Maintainer: Melanie Prague
Depends: R (>= 2.0.0)
LazyLoad: yes
Description: This algorithm provides a numerical solution to the problem of minimizing a function. This is more efficient than the Gauss-Newton-like algorithm when starting from points very far from the final minimum. A new convergence test is implemented (RDM) in addition to the usual stopping criterion: the stopping rule is when the gradients are small enough in the parameters metric (GH-1G).
License: GPL (>= 2.0)
URL: http://www.r-project.org
Encoding: latin1
Repository: CRAN
Packaged: 2012-05-09 10:10:12 UTC; ad6
Date/Publication: 2012-05-10 04:31:21

More information about marqLevAlg at CRAN
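The general idea behind a Marquardt-style minimizer can be sketched for a function of one variable. The Python below is not the marqLevAlg implementation: it uses finite-difference derivatives, inflates the curvature by a damping term `lam` whenever a step fails to decrease the function, and stops when the gradient measured in the curvature metric (grad * grad / hess, the one-dimensional analogue of the GH-1G quantity in the description) falls below a tolerance. The function name and all default values are assumptions.

```python
def marquardt_minimize(f, x0, lam=1e-2, tol=1e-12, max_iter=200, h=1e-5):
    """Damped Newton (Marquardt-style) minimization of a scalar function."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        # Central finite differences for the first and second derivative.
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        hess = (f(x + h) - 2.0 * fx + f(x - h)) / (h * h)
        # Stop when the gradient is small in the metric of the curvature.
        if hess > 0.0 and grad * grad / hess < tol:
            break
        while True:
            # Inflate the curvature; a large lam gives a short gradient step.
            step = -grad / (max(hess, 0.0) + lam)
            f_new = f(x + step)
            if f_new < fx:
                lam = max(lam / 10.0, 1e-12)  # success: trust the curvature more
                break
            lam *= 10.0  # failure: damp harder and retry
            if lam > 1e12:
                return x  # no further decrease can be found
        x, fx = x + step, f_new
    return x
```

Far from the minimum the damping keeps steps cautious, and close to it the method behaves like Newton's method, which is the trade-off the description alludes to.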

May 10, 2012 05:51 AM

New package BVS with initial version 4.12.0

Package: BVS
Type: Package
Title: Bayesian Variant Selection: Bayesian Model Uncertainty Techniques for Genetic Association Studies
Author: Melanie Quintana
Maintainer: Melanie Quintana
Description: The functions in this package focus on analyzing case-control association studies involving a group of genetic variants. In particular, we are interested in modeling the outcome variable as a function of a multivariate genetic profile using Bayesian model uncertainty and variable selection techniques. The package incorporates functions to analyze data sets involving common variants as well as extensions to model rare variants via the Bayesian Risk Index (BRI) as well as haplotypes. Finally, the package also allows the incorporation of external biological information to inform the marginal inclusion probabilities via the iBMU.
Version: 4.12.0
License: Unlimited
Depends: MASS, msm, haplo.stats, R (>= 2.14.0)
Packaged: 2012-05-09 16:00:55 UTC; maw27
Repository: CRAN
Date/Publication: 2012-05-10 04:14:52

More information about BVS at CRAN

May 10, 2012 05:51 AM

New package bigml with initial version 0.1-1

Package: bigml
Type: Package
Title: R bindings for the BigML API
Version: 0.1-1
Date: 2012-04-30
Authors@R: c(person("Justin", "Donaldson", role=c("aut","cre"), email = "donaldson@bigml.com"))
Description: The bigml package contains bindings for the BigML API. The package includes methods that provide straightforward access to basic API functionality, as well as methods that accommodate idiomatic R datatypes and concepts.
License: LGPL-3
Depends: RJSONIO, RCurl, plyr
Collate: 'bigml-internal.R' 'formEncodeURL.R' 'bigml-package.R' 'createDataset.R' 'createModel.R' 'createPrediction.R' 'createSource.R' 'getDataset.R' 'getModel.R' 'getPrediction.R' 'getSource.R' 'listDatasets.R' 'listModels.R' 'listSources.R' 'quickDataset.R' 'quickModel.R' 'quickPrediction.R' 'quickSource.R' 'setCredentials.R' 'deleteResource.R'
Packaged: 2012-05-09 19:09:29 UTC; justindonaldson
Author: Justin Donaldson [aut, cre]
Maintainer: Justin Donaldson
Repository: CRAN
Date/Publication: 2012-05-10 04:14:54

More information about bigml at CRAN

May 10, 2012 05:51 AM

May 09, 2012

Revolutions

See R integrated with QlikView, Jaspersoft, Excel, and mobile apps

In yesterday's webinar, Revolution Analytics CTO David Champagne demonstrated how to integrate statistical graphics and analytic computations created using R software with a variety of third-party applications. In each case Revolution R Enterprise Server is running as a compute server to the client application, with R scripts launched on each user interaction via the RevoDeployR Web Services API. David demonstrated five examples of such integration (watch the demo by clicking on the links below, and switch to full-screen for easier viewing).

David showed three different ways of delivering interactive sales forecasts in client applications:

  • An HTML 5 application that offers sales forecasts based on a user-selected history from product sales data. This type of interface would be suited to mobile devices like an Android smartphone or iPad. 
  • An enhanced QlikView report, including a sales forecast from patio furniture sales data, with options for the type of forecast model and with data visualization from both R and QlikView.

David also showed two examples where the end-user has even more control over the parameters of the analytic computation:

  • A custom interactive web application written in JavaScript, to perform market basket analysis on retail transaction data. The distribution of purchases is rendered on an interactive map, and products commonly purchased together (within a selected sub-region) are displayed as a tree chart and as association rules.
  • A Microsoft Excel spreadsheet, with a custom toolbar button that displays a regression dialog, with the regression results (from R) embedded directly in the same spreadsheet the source data was taken from.

The great thing about integrating R into client applications in this way is that:

  • The R computation is run on-demand, and with the results presented in context to the end user;
  • The end user doesn't need to know R (in fact, they probably don't even know R is involved), yet still has some control over the computation done in R.
  • R doesn't need to be installed on the client device. In fact, Revolution R Enterprise will most likely be running on a dedicated remote server, in a data center or even in the cloud.

You can see David's entire presentation (including details about the client/server architecture and the Web Services API that makes all of this possible) in the webinar replay below, or by downloading the webinar slides.

 

Revolution Analytics webinars: Calling All Data Scientists and Web Developers! Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply Their Value

by David Smith at May 09, 2012 11:42 PM

Simon Jackman

another threat to NSF…

Email arrived overnight with news that the National Science Foundation’s support of political science research might be under threat. Specifically,

APSA [the American Political Science Association] has learned that Representative Jeff Flake (AZ) may imminently introduce an amendment to the NSF appropriations bill now on the House floor (HR 5326: the Commerce, Justice, Science, and Related Agencies Appropriations Act, 2013) to defund the political science program at the NSF.

Flake is running for the AZ Senate seat opening up with Kyl’s retirement. The Republican primary in AZ is in September. Flake is drawing some opposition on his right flank, it would seem. Draw your own conclusions, perhaps sprinkle in the fact that Republican Senate aspirants find themselves in a “post-Lugar” environment, etc.

For what it is worth, Flake has a slightly more conservative voting history than we’d expect, even given that he’s in a seat that went for McCain over Obama by better than 60-40. Take out the McCain home-state effect and the “normal vote” for AZ-6 slides a little more Democratic, meaning that Flake is even a little conservative again, given the district’s presidential vote split. When you look at Flake in relation to other AZ House members (solid dots on the graph above), he’s roughly middle of the pack. This lends a little bit of credence to the “conservative, but conservative enough for the statewide Republican primary constituency?” hypothesis, and, in turn, to why we’ve got some position-taking like what we’re seeing.

Yesterday Flake got a vote up on the floor of the House, an amendment that he described as taking the NSF budget back to pre-stimulus levels (Congressional Record). It failed, with all Dems voting against it, but with Republicans splitting 121-112 in favor.

The roll call split Republicans pretty cleanly, with legislators’ ideal points a reasonable but not great predictor of Republican votes (area under the ROC curve is 0.945 for everyone, down to 0.858 for Republicans). The next graph shows the item-characteristic curve for the roll call, with ideal points and actual Yeas and Nays superimposed. There are plenty of mis-classifications: 22 Ayes predicted to vote Nay (prob < .5) and 25 Nays predicted to vote Yea (prob > .5), for a total of 20% mis-classified. So there is a bit more going on here than “ideology” among Republicans.
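For readers who want to reproduce this kind of summary, here is a minimal Python sketch of the two statistics quoted above: the area under the ROC curve and the misclassification rate at a 0.5 threshold. It is not Jackman's code. The function names are hypothetical, and the inputs are assumed to be predicted probabilities of an Aye vote together with the observed votes coded 1 (Aye) and 0 (Nay).

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one vote of each kind")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def misclassification_rate(scores, labels, threshold=0.5):
    """Share of votes whose predicted side (score > threshold) is wrong."""
    wrong = sum(1 for s, y in zip(scores, labels) if (s > threshold) != (y == 1))
    return wrong / len(labels)
```

As a check on the figures in the post: 22 + 25 = 47 misclassified votes out of the 121 + 112 = 233 Republicans who voted gives 47 / 233, or about 20%. Taking the Republican votes as the denominator is an inference from the numbers, not something the post states.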

Note also that Coburn moved to kill political science funding at NSF back in 2009. I blogged on that at the time. See here (n.b., 5 Senate Dems supported Coburn’s amendment).

by jackman at May 09, 2012 05:50 PM

CRANberries

New package qtutils with initial version 0.1-2

Package: qtutils
Version: 0.1-2
Title: Miscellaneous Qt-based utilities
Author: Deepayan Sarkar
Depends: R (>= 2.14.0), qtbase
Suggests: xtable
Imports: qtbase
LinkingTo: qtbase
Maintainer: Deepayan Sarkar
Description: Miscellaneous Qt-based tools for R
URL: http://qtinterfaces.r-forge.r-project.org
License: LGPL (>= 2)
Packaged: 2012-05-09 13:15:56 UTC; deepayan
Repository: CRAN
Date/Publication: 2012-05-09 13:25:44

More information about qtutils at CRAN

May 09, 2012 01:52 PM

New package TCC with initial version 0.2

Package: TCC
Version: 0.2
Date: 2012-05-07
Title: TCC: tag count comparison package
Author: Koji Kadota, Tomoaki Nishiyama, Kentaro Shimizu
Maintainer: Tomoaki Nishiyama
Description: This package provides a normalization method for tag count data by the TMM-bayseq-TMM pipeline. Expected applications include RNA-seq, SAGE, ChIP-seq and the like.
Depends: R (>= 2.14.0), edgeR (>= 2.4.6), baySeq, NBPSeq
License: GPL-2
Copyright: Authors listed above
Packaged: 2012-05-08 14:12:28 UTC; tomoaki
Repository: CRAN
Date/Publication: 2012-05-09 04:06:52

More information about TCC at CRAN

May 09, 2012 07:51 AM

New package RCassandra with initial version 0.1-0

Package: RCassandra
Version: 0.1-0
Title: R/Cassandra interface
Author: Simon Urbanek
Maintainer: Simon Urbanek
Description: This package provides a direct interface (without the use of Java) to the most basic functionality of Apache Cassandra such as login, updates and queries.
License: GPL-2
URL: http://www.rforge.net/RCassandra
Packaged: 2012-05-05 14:20:29 UTC; svnuser
Repository: CRAN
Date/Publication: 2012-05-09 04:06:50

More information about RCassandra at CRAN

May 09, 2012 07:51 AM