Here’s a succinct yet comprehensive summary of machine learning algorithms produced by Jason Brownlee of Machine Learning Mastery. It includes a visual of the algorithms, though downloading the graphic requires handing over an email address, which has a bit of a flavour of phishing for email addresses. The version below can be found without providing an email address, but it is not as good quality.

A summary of the annual survey of tools and attitudes around data science conducted by Karl Rexer was released at Predictive Analytics World in Boston recently. The full report is expected to be available in the next couple of months.

[Screenshot: Rexer Data Science Survey Highlights, Sep 2015]

For primary tool usage:

#1 - 36.2% - R
#2 - 7.0% - SAS
#3 - 6.6% - IBM SPSS Modeler
#4 - 6.5% - KNIME (free version)
#5 (tie) - 5.1% - IBM SPSS Statistics
#5 (tie) - 5.1% - STATISTICA
#7 - 3.1% - SAS Enterprise Miner
#8 - 2.8% - RapidMiner (free version)
#9 - 2.7% - Weka
#10 - 2.3% - MATLAB

I saw a demo of raptR, a package for Rapid and Pretty Things in R, earlier in the year when it was a work in progress. It is now live on GitHub (but not yet on CRAN). It lets you very quickly visualise data in R, using a Shiny GUI that generates ggplot2 plots underneath. A nice app for some visual analytics.


[Screenshot: raptR]

This is a nice example of the power of multiple APIs working together to deliver a solution.

The app uses R’s Shiny to control a map built with Leaflet, the open source JavaScript mapping library. It displays public data over map tiles generated by Stamen Design from OpenStreetMap data.

Thanks to colleague and R guru Hugh Parsonage for pointing to this one on Twitter.



I’ve had a few enquiries lately on the relationship between Rattle and the WebFOCUS predictive analytics component called RStat from Information Builders (WebFOCUS is their widely used Business Intelligence suite).

I even had an approach recently from an Australian company offering to provide a demonstration of RStat to see if it might be something we could use for our Data Mining.

Yes, RStat is a fork of Rattle. We developed the initial fork through Togaware back in 2008 or thereabouts. It is an extension to Rattle that allows direct integration into WebFOCUS. Those familiar with Rattle will recognise the following plots generated from RStat.

A WebFOCUS Business Intelligence user can launch RStat as an integrated component of the suite. That is a nice option and users can seamlessly deploy the power of R to do their predictive analytics.

One of the first benefits of the integration is that it allows data to be imported directly and seamlessly into RStat from WebFOCUS. The significance of this is that WebFOCUS has an admirable collection of data import modules and so data from pretty much any source can be integrated into WebFOCUS and thus into RStat (Rattle).

The other significant addition found in RStat is the ability to directly export R models as C code into WebFOCUS. These then become compilable C code modules and hence are first class WebFOCUS objects. These objects are then deployed within the WebFOCUS environment. Anyone familiar with the challenge of deploying R models (or Python models) into a production environment will recognise the significance of this functionality – models are automatically deployable into production without the recoding requirements we as Data Scientists often face.
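To give a feel for what exporting a model as C code can look like, here is a minimal sketch in Python. It is not RStat's actual export mechanism or output format. The toy tree, the nested-dictionary layout, and the function names `tree_to_c`, `export_model` and `predict` are all hypothetical, made up purely for illustration. The idea is simply that a fitted decision tree is a set of nested threshold tests, which map naturally onto nested C `if`/`else` statements.

```python
# Illustrative sketch only: the tree layout and function names are assumptions,
# not RStat's (or Rattle's) real export format.

def tree_to_c(node, indent=1):
    """Recursively render a decision-tree node as nested C if/else statements."""
    pad = "    " * indent
    if "leaf" in node:
        # A leaf simply returns its predicted class label.
        return f"{pad}return {node['leaf']};\n"
    cond = f"x[{node['feature']}] <= {node['threshold']}"
    return (
        f"{pad}if ({cond}) {{\n"
        + tree_to_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def export_model(node, name="predict"):
    """Wrap the rendered tree in a C function taking an array of features."""
    return f"int {name}(const double *x) {{\n" + tree_to_c(node) + "}\n"


def predict(node, x):
    """Reference evaluation in Python, for checking the generated logic."""
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


# Hypothetical hand-written tree standing in for a fitted model.
toy_tree = {
    "feature": 0,
    "threshold": 2.5,
    "left": {"leaf": 0},
    "right": {
        "feature": 1,
        "threshold": 1.0,
        "left": {"leaf": 1},
        "right": {"leaf": 2},
    },
}

c_source = export_model(toy_tree)
print(c_source)
```

The printed text is a self-contained C function that a production system could compile and call without needing R or Python at scoring time, which is the appeal of this style of deployment.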

Information Builders now maintain RStat. Also have a look at the RStat Fact Sheet for details.