SPSS Statistics and R

By Archive User posted Tue January 13, 2009 06:56 PM

  


On January 6, the open-source statistical language R made the New York Times:

http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=1&scp=2&sq=R&st=cse.



The Times said,



For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.



And,



“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”




SPSS has a different attitude.




Starting with Version 16, SPSS offers a free plug-in that lets users run R code within SPSS. That code has full access to the active SPSS Statistics data and can write its output to the SPSS Statistics Viewer. With Version 17, we began creating dialog box interfaces and SPSS-style syntax for R packages that we thought would interest SPSS users. You can see the current list in the SPSS Developer Central Downloads section. We build these with the same tools that we make available to any user, so the R connection is completely open.



 



We see R as more complementary to SPSS Statistics than competitive with it. R is a powerful and flexible programming language for statistics, and it offers a great many packages of functions for statistics and graphics. But it is not an easy language to learn. Knowing the C language helps, but learning R is still a substantial effort. (Bob Muenchen's new book, R for SAS and SPSS Users, from Springer, is a good text.)



We see the SPSS-R connection as a way for users to take advantage of the large number of R packages without the pain of learning R. And with the ability to create SPSS pivot tables in the Viewer from R programs, you can get good-looking output. Since R is limited to in-memory data, SPSS can select just the data needed for an R analysis and thereby reduce the memory requirement.



As a programming language, R takes a different attitude toward communicating with the user than SPSS does. Running a certain package in R often produces this error message:



Error in optim(rho, f, control = control, hessian = TRUE, method = "BFGS") :

initial value in 'vmmin' is not finite



That is exactly what a programmer needs to know, but it leaves the user clueless. Here is what you see in SPSS using our integration of that package:




Error: SPSSINC HETCOR command was unable to compute the correlations due to data conditions.

This is usually due to some variables being too far from a bivariate normal distribution.




This may leave the programmer clueless, but it's a big hint for the user.
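The contrast between the two messages above can be sketched in plain R. This is only an illustration of the general pattern, not the actual SPSSINC HETCOR implementation: the names friendly_fit and failing_fit are hypothetical, and failing_fit is a stand-in that raises the same text as the optimizer error instead of calling a real package.

```r
# Hypothetical helper: run a fitting function and translate a low-level
# optimizer failure into a message aimed at the end user.
friendly_fit <- function(fit_fun, ...) {
  tryCatch(
    fit_fun(...),
    error = function(e) {
      msg <- conditionMessage(e)
      # Assumption: the optimizer failure is recognised by its message text.
      if (grepl("not finite", msg, fixed = TRUE)) {
        stop(
          "Unable to compute the correlations due to data conditions.\n",
          "This is usually due to some variables being too far from ",
          "a bivariate normal distribution.",
          call. = FALSE
        )
      }
      stop(e)  # re-raise any error we do not recognise, unchanged
    }
  )
}

# Stand-in for a package function that fails inside optim(); illustrative only.
failing_fit <- function() {
  stop("initial value in 'vmmin' is not finite")
}

# Capture the translated message instead of aborting the script.
result <- tryCatch(
  friendly_fit(failing_fit),
  error = function(e) conditionMessage(e)
)
cat(result, "\n")
```

Unrecognised errors pass through untouched, so the programmer-level detail is still available whenever the wrapper has no better explanation to offer.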



By bringing R and SPSS together, you get the best of both worlds: a large collection of statistical and graphical tools from R with the ease of use, data handling, and output presentation of SPSS Statistics. And, by the way, there are already some user contributions of R package integrations with SPSS available on Developer Central. If you create others of general interest, we invite you to contribute them, too.







