Free, Cheap, and Already-Paid-For Statistics Software

Working at a university with a large number of researchers, I am fortunate. My institution has little choice but to license professional-quality statistical software, which can also be used for teaching. However, this is expensive: a site license for enough copies to equip a thirty-student class can cost thousands of dollars, and this may be an annual fee, with the software set to turn itself off if the annual rent is not paid. Schools and colleges, amateurs, and universities in developing countries may not be able to afford these fees. What are the alternatives?

What I have in mind is a statistical package with the following properties:

I am not at the moment sure whether any package meets these specifications completely. Here are my comments after a brief examination of some candidates. I got a lot of pointers from this page, but the comments there are brief and do not discuss the utility of the packages for lower-level educational purposes. This is another good free stats software page with lots of links.

Please mail me if you have comments on anything I've mentioned here, or a suggestion for a package that I have missed.

Ordinary scientific calculators are almost useless for statistical calculation. While they will in theory perform mean, standard deviation, and regression, they do nothing to help with exploratory data analysis and there is no way to check for keypunching errors. If (as may happen) you have to teach without at least a graphing calculator on each desk, it would probably be better to keep the arithmetic very simple and have students work by hand.
A graphing calculator has various statistics functions on it, suitable for many elementary purposes, and can typically do scatterplots, histograms, and boxplots. The TI-83 does not permit data sets with missing values to be entered, and it does not indicate the scales of statistical (or other) plots. In theory it can be programmed to perform simulations (for instance, to determine the robustness of a t-test for exponentially-distributed data), but in practice it's a rare user who can do this. There is also dubious pedagogical value in running, on the student's own calculator, a piece of code so opaque that the student must take it completely on faith that it does what it is claimed to do.
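To make the kind of simulation mentioned above concrete, here is a minimal sketch in Python (my choice for illustration; none of the packages reviewed here is assumed). It draws many samples of size 10 from an exponential distribution whose true mean is 1, runs a two-sided one-sample t-test of the true null hypothesis on each, and counts how often the test rejects. The function names, the sample size, the seed, and the number of repetitions are all assumptions of the sketch; the critical value 2.262 is the standard table value for 9 degrees of freedom.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom
# (standard table value; it only suits samples of size 10).
T_CRIT_DF9 = 2.262


def t_test_rejects(sample, mu0, t_crit):
    """Return True if a two-sided one-sample t-test rejects H0: mean == mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return abs(t_stat) > t_crit


def rejection_rate(n_reps=20000, n=10, seed=1):
    """Estimate how often the test rejects a TRUE null for exponential data."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        # Exponential data with rate 1, so the true mean really is 1.0.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if t_test_rejects(sample, mu0=1.0, t_crit=T_CRIT_DF9):
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print(f"Nominal level 0.05, estimated actual level {rejection_rate():.3f}")
```

If the t-test were fully robust here, the estimated level would sit close to the nominal 0.05; a noticeably different figure shows how much the skewness of the data matters at this sample size. A student who can read every line of a script like this is not taking the result on faith.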
I'm going to discuss Microsoft's Excel and (the spreadsheet of) Sun's OpenOffice together.
  • OpenOffice is free (and available for Linux as well as Windows). If you have Windows in a school setting then the chances are good that you also have Excel, already paid for.
  • Both are part of closely-knit office packages that make cutting and pasting into a report easy. However, almost any other software supports cut-and-paste too. You don't need to live on the same block to send a letter.
  • Both have a moderately long list of statistical functions; neither does them very well. In particular, good statistical software should handle missing values gracefully though not invisibly. Excel has a reputation for attempting regression with missing values and getting it wrong; OpenOffice does not attempt the task and throws an error when there are missing data.
  • Both have a range of graphics designed more for the sales department than for research. OpenOffice has fewer 3D Technicolor Flying-Pyramid Plots With Cheese, but still too many "chart-junk" plots. Neither (astonishingly) does boxplots.
  • Excel is a workplace standard and OpenOffice has a near-identical look and feel (and can be substituted for training purposes - if you can drive a Toyota you can drive a Ford). This is a good thing for some applications, but not for statistics. Indeed, I fear that the illusion Excel gives some instructors and their students, that it is a usable stats package, could result in new generations of Pointy-Haired Bosses who tell their workforce "No, we won't buy MINITAB/Statistica/SPSS/Stata, my B-school stats instructor showed me how to do all that with Excel."
Bottom line - these are both excellent spreadsheets. Neither was designed as a stats package.
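What "gracefully though not invisibly" might mean for regression can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of how any package mentioned here works: the function name `fit_line`, the use of `None` to mark a missing value, and the example data are all assumptions of the sketch. Incomplete cases are dropped, the line is fitted to the complete ones, and the number of dropped cases is returned along with the fit so that it cannot go unnoticed.

```python
def fit_line(xs, ys):
    """Least-squares line through the complete (x, y) pairs.

    Missing values are represented by None (an assumption of this sketch).
    Returns (slope, intercept, n_used, n_dropped) so that the caller always
    sees how many cases were left out.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Keep only the cases where both variables were recorded.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n_dropped = len(xs) - len(pairs)
    if len(pairs) < 2:
        raise ValueError("need at least two complete cases")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, n, n_dropped


if __name__ == "__main__":
    # Made-up data with one missing x and one missing y.
    x = [1.0, 2.0, None, 4.0, 5.0]
    y = [2.0, 4.0, 6.0, None, 10.0]
    slope, intercept, used, dropped = fit_line(x, y)
    print(f"y = {intercept:.2f} + {slope:.2f} x "
          f"({used} cases used, {dropped} dropped for missing values)")
```

The design choice is that the count of dropped cases is part of the return value rather than a warning that can be ignored: the fit proceeds, but the user is told what it was based on.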

HOWEVER, there is an add-on option, SSC-Stat, available from the University of Reading (at no charge to educational/nonprofit users) that seems as if it might be a good option for the die-hard Excel user. I do not know if it will work for OpenOffice.

INSTAT+, also available from the University of Reading, is probably one of your best bets. It has a "statsheet" data interface like most commercial stats packages. It is free to academic/noncommercial users, and does most things that one might want to at this level, EXCEPT that its graphics are (as of March 2005) still not all converted to vector graphics: some are "ASCII art". (Users of pre-Windows versions of MINITAB will get all nostalgic!) I don't know if subsequent versions will change this or not. One possibility would be to use this in combination with another package with strong graphics.

It has special features for use in climatology, but these don't get in the way.

R. This is NOT, on its own, a package for beginners, or even for the moderately experienced nonprogrammer. R is a computer language (based closely on "S", a language whose earlier implementations were not freeware) and while it is phenomenally powerful, you really do have to program in it to get most of that power. That said, it is a professional-quality tool.

HOWEVER, there is now a wrapper called R Commander that allows a beginner to use many features of R via a graphical user interface. It should cover just about everything (except perhaps for some simulations) in a high school or first-year university stats course via the GUI. It appears to do a good job, also, of discouraging "silly" actions that will produce meaningless results. The user who knows enough to want, correctly, to do such things -- or who has any other more unusual needs -- can work with the script window. A detailed description may be found here.

R Commander will open files from MINITAB, SPSS, Excel, and other common packages. The R graphics that it gives easy access to are varied, professional-looking and can be exported in various formats.

Dr Robert Knodt's MODSTAT is not free, but is very cheap ($22 US for an individual license, $67 for an educational site license). It has a huge number of functions including plotting, linear algebra and financial mathematics. It does not appear to have a scripting capability, limiting its utility for simulations.
"Smith's Statistical Package", SSP, is a good little package that does most (possibly not all) of what one might want in a modern introductory course, and is extremely easy to use. It enforces a cases x variables structure in the classical "statsheet" format and will do basic arithmetic on entire columns.
Winstats is a small, clean package that enforces a cases x variables "statsheet" structure. It has an excellent simulations module. The regression functions are a bit hard to find, but are done (OLS, MAD, or robust) as a subfunction of the scatterplot. Sadly, it does not appear to do confidence intervals, though it has a demonstration of how they work. This could be a good classroom package for some, but not all, purposes. Disponible en français.
EstaPlus does most things that one would want to do at the high school level except perhaps for simulations. It's small and neat - worth looking into. Disponible en français.
Statcrunch is a nice, free, Java-based package that can be used online. It's made available by the National Science Foundation's Course Curriculum and Laboratory Improvement Program. It has most of the features one could want except perhaps for simulations, and is clear and easy to use. Its EDA features are strong. Computers must be Web-enabled to use it.
DATAPLOT is "vintage" freeware distributed by the US National Institute of Standards and Technology. It has a rather old-fashioned user interface - rather like command-line BASIC - but is very powerful and not hard to use. (Apparently if you install the Tcl/Tk scripting language you can run it through a GUI as well. I haven't tried this.) A huge number of graphical commands (including the use of color) are supported, but no interactive features like plot brushing or spinning. It has excellent, clear, and professionally-written manuals.
Interesting extras include mathematical functions, matrix manipulation (including eigenvalues, determinants, etc.), some IFS fractal drawing capacity, and primitives for electronic circuit diagrams! While this could in principle be used in high schools, it feels more suited to university physics and engineering students.

OpenStat4 (v6 rev 7). This package looks promising - it is naturally organized into "variables" crossed with "cases", has a lot of good elementary and advanced features, and a sensible choice of graphics. However, I was unable to get it to run reliably; it crashed frequently and the data spreadsheet grew and shrank unpredictably, sometimes hiding already-entered data, and at other times creating unwanted new columns. The help files did not seem to be there. Maybe I've been unlucky with this one; if it does work on your system it could be a very nice piece of kit indeed. Let me know.

ViSta is a wrapper (as described above) on the very powerful but VERY hairy LispStat. It (unlike the core package, in my opinion!) could be used by beginners with a bit of guidance, but the instructor would have to learn a lot to be a reliable guide. It is very visual, very interactive, and encapsulates a whole philosophy of statistical practice, whether you like it or not. I don't think it is very easy for simulations, experimentation, or pottering around, but it's a very fast route to serious exploratory data analysis.

A particularly intriguing idea is the "workmap". This is a graphical representation of the way in which the data have been organized, analyzed, and examined. It would not surprise me to see this become a standard feature in pedagogical stats software in the 21st century.

The number of different analyses ViSta will do at once can be a bit bewildering and might tempt beginners into "test shopping". Disponible en français.

IDAMS, distributed by UNESCO, does rather a good job of plotting data, including matrixplots and spinning 3D plots. However, its handling of hypothesis testing is limited (which I can live with), and its handling of confidence intervals seems to be almost nonexistent (not so good). Disponible en français.
Datamology's Visicube is a powerful, easy-to-use visualization program. It does scatterplots, boxplot arrays, slicing, brushing, and other modern EDA/visualization techniques. It does not support any "conclusive" statistical techniques such as hypothesis testing or confidence intervals. It enforces a cases x variables "statsheet" structure.
KyPlot 2.0 is a freeware version of a commercial product (the freeware version probably won't be updated any more). It has a modern spreadsheet-like look-and-feel, with many of the corresponding strengths and weaknesses. On the plus side, it is easy to enter, update, clean, and correct data. On the other hand, the organization into columns is very weak (for instance, it will cheerfully paste a descriptive statistics table into the spreadsheet so that it occupies the empty space at the end of a data column, breaking the "one column, one purpose" rule). A good selection of basic statistical methods is available.
The selection of graphics is less tacky than that of Excel. However (like Excel and OpenOffice - this seems to be a common vice of spreadsheets!), the package does not do boxplots. (The web page for the commercial version does show a boxplot though it is still not listed as a basic graph type.) Another extraordinary near-omission is that of confidence intervals, which are given only as a pair of numbers on the last two lines of the report on the corresponding hypothesis test. Any instructor who agrees with me that primary emphasis should be put onto interval estimation rather than hypothesis testing will consider this a grave pedagogical problem.
All in all, this is not an ideal classroom package, though it could be used.
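For contrast, here is a minimal Python sketch of a report that leads with the interval estimate and mentions the test statistic only afterwards. It is an illustration of the emphasis argued for above, not a reconstruction of any package's output. The function name `interval_then_test`, the made-up data, and the hypothesized mean of 5.0 are assumptions, and the critical value 2.262 (the standard two-sided 95% table value for 9 degrees of freedom) is hard-coded, so the sketch only handles samples of exactly ten values.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
# Hard-coded to keep the sketch dependency-free; it only suits samples of 10.
T_CRIT_DF9 = 2.262


def interval_then_test(sample, mu0):
    """Return (lower, upper, t_stat) for a sample of exactly 10 values."""
    n = len(sample)
    if n != 10:
        raise ValueError("this sketch hard-codes the critical value for n = 10")
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    margin = T_CRIT_DF9 * se
    t_stat = (mean - mu0) / se
    return mean - margin, mean + margin, t_stat


if __name__ == "__main__":
    # Made-up illustrative measurements.
    data = [4.1, 5.2, 4.8, 5.0, 4.6, 5.3, 4.9, 5.1, 4.7, 5.3]
    lower, upper, t_stat = interval_then_test(data, mu0=5.0)
    print(f"95% confidence interval for the mean: ({lower:.2f}, {upper:.2f})")
    print(f"(t statistic for H0: mean = 5.0 is {t_stat:.2f})")
```

The two-sided test at the 5% level rejects exactly when the hypothesized mean falls outside this interval, so nothing is lost by printing the interval first.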

PSPP is an open-source implementation of the SPSS language. It is under construction by a team of volunteers, and does not currently support graphics, ANOVA, regression, or various other useful functions. Nonetheless, when it's finished, it ought to be terrific.


Return to Robert Dawson's home page