SoylentNews is people

posted by janrinok on Saturday August 27 2016, @10:43PM
from the something-to-do-this-weekend dept.

First of all, the question is 'Why use R?' One source answers it thus:

R is the leading tool for statistics, data analysis, and machine learning. It is more than a statistical package; it's a programming language, so you can create your own objects, functions, and packages.

Speaking of packages, there are over 2,000 cutting-edge, user-contributed packages available on CRAN (not to mention Bioconductor and Omegahat). Many packages are submitted by prominent members of their respective fields.

[More....]

For beginners in R, here is a 15-page, example-based tutorial that covers the basics of R.

  1. Starting R – Trivial tutorial on how to start R for those just wondering what to do next after downloading R.
  2. Assignment Operator – Two important assignment operators in R are <- and =.
  3. Listing Objects – All entities in R are called objects. They can be arrays, numbers, strings, functions. This tutorial will cover topics such as listing all objects, listing objects from a specific environment and listing objects that match a particular pattern.
  4. Sourcing R File – R code can also be written in a file, and the file can then be run from R with source().
  5. Basic Data Structures in R – Understanding data structures is probably the most important part of learning R. This tutorial covers vectors and lists. It also covers subsetting.
  6. Data Structures in R, Matrix and Array – Covers matrices and arrays. An array is a vector with two additional attributes: dim, which stores the dimensions of the array, and dimnames, which stores the names of the dimensions. A matrix is a two-dimensional array. Head to the tutorial for examples of both.
  7. Data Structures in R, Factors and Data Frame – Data frames are probably the most widely used data structure. It would help to just go through the examples and practice them. The tutorial covers important operations on data frames and factors as well as subsetting data frames.
  8. Data Structures in R, Data Frame Operations – Covers some more operations on the data frame, including stack, attach, with, within, transform, subset, reshape and merge.
  9. Control Structures in R – The basics of any programming language. Control loops allow looping through data structures. The tutorial covers if, if-else, for, while, next, break, repeat and switch.
  10. Control Structures in R – apply – To make looping more convenient, R provides a family of ‘apply’ functions. For example, the apply function can be used to apply a function over the rows or columns of an array (or matrix). The tutorial covers lapply, sapply, apply, tapply.
  11. Control Structures in R – apply 2 – We continue with some more apply functions – mapply and by.
  12. Functions in R – The nuts and bolts of any programming language. This tutorial not only explains the concept of functions using examples but also covers various scenarios such as anonymous functions or passing functions around.
  13. Printing on Console in R – Printing to the console can come in very handy. The tutorial covers the print and cat functions as well as printing data frames.
  14. Pretty printing using Format function in R – This tutorial looks at how to use the formatting functions for pretty printing.
  15. Reshape and Reshape2 Package – Once you start working on real-life problems in R, a lot of time will be spent on manipulating data. The Reshape and Reshape2 packages will prove very powerful in converting data to the format required by other libraries. This tutorial has detailed examples to explain the packages.
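
As a small illustration of item 6 above, the following sketch shows that a matrix is simply a vector carrying a dim attribute. It is not taken from the tutorial itself; the variable names and the row and column labels are made up for this example.

```r
# A plain vector of six numbers.
v <- 1:6

# Giving the vector a dim attribute turns it into a 2 x 3 matrix
# (values are filled column by column).
m <- v
dim(m) <- c(2, 3)

# dimnames labels the rows and columns; these labels are illustrative only.
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))

is.matrix(m)      # TRUE: a matrix is a two-dimensional array
is.array(m)       # TRUE
m["r2", "c"]      # 6

# The same idea with three dimensions gives an array that is not a matrix.
a <- array(1:24, dim = c(2, 3, 4))
is.matrix(a)      # FALSE
dim(a)            # 2 3 4
```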

These tutorials are designed for beginners in R, but they can also be used by experienced programmers as a refresher course or as a reference. Explicit loops in R can be slow, so the apply family of functions and the reshape packages can make code more concise and, in many cases, faster.
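
To make the loop-versus-apply comparison concrete, here is a minimal sketch (not from the tutorial; all variable names are illustrative) that computes the same result with an explicit loop, with sapply, and with plain vectorized arithmetic. How much faster one form is than another depends on the task, so treat this as a comparison of style rather than a benchmark.

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# 1. Explicit for loop, with the result vector preallocated.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# 2. sapply: calls the function on each element and simplifies to a vector.
squares_sapply <- sapply(x, function(v) v^2)

# 3. Vectorized arithmetic: no explicit iteration in R code at all.
squares_vec <- x^2

# apply works over the margins of a matrix: 1 = rows, 2 = columns.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)   # 9 12
col_sums <- apply(m, 2, sum)   # 3 7 11
```

For simple element-wise arithmetic like this, the vectorized form is usually the shortest to write; the apply functions tend to be most useful when the per-element work is an arbitrary function.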



 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Interesting) by physicsmajor (1471) on Saturday August 27 2016, @10:57PM (#394057)

    Don't get me wrong, R is a fantastic tool. However, their claims are more than a little overblown.

    R is probably the leading tool for statistics. No argument there.

    Data analysis, though? Strongly disagree. Python is equivalent or superior to R from the standpoint of data structures (Pandas DataFrames for those who like R, NumPy arrays for those who don't, Dask variants of both of these for truly Big Data, xarray for those who like N-dimensional typed data), which are essentially equivalent to, or more general and easier to handle than, R's. The surrounding general purpose scientific tools in the ecosystem are also far ahead on the Python side.

    Machine learning? No. Just no. For classical machine learning there isn't any coherent argument that R is superior to Python with scikit-learn; everyone is using scikit-learn today. For Deep Learning, I don't think R even has a framework to interface with TensorFlow. The only one I could find went through Python.

    I do get that the author is excited about R, and wants to evangelize, and I encourage this. R is a great tool! But it isn't the perfect general purpose science tool that the opening headline might make someone think it is. R is a fantastic statistics package with some general science added on. For general science, Python is just better (and can directly interface with R for esoteric statistics when needed). Maybe I'm being overly pedantic, but I do think we should demand honesty up front.

  • (Score: 0) by Anonymous Coward on Sunday August 28 2016, @12:33AM (#394064)

    > R is probably the leading tool for statistics.

    Would you say it's more popular and/or better for statistics than MATLAB?

    > Machine learning? No.

    I read on the Web that GNU Octave is better for machine learning than R. Do you agree?

    • (Score: 3, Informative) by physicsmajor (1471) on Sunday August 28 2016, @12:39AM (#394065)

      Matlab is junk for statistics. Absolutely terrible, you might as well be using Excel. Nothing even approaching a DataFrame; R is light-years ahead of Matlab. IGOR Pro is probably the best (though not all that well known) commercial alternative to R for DataFrame-like analysis. For pure stats think JMP or SAS which are full-featured but a pain in the ass to use compared to R.

      Not sure about Octave.

      • (Score: 0) by Anonymous Coward on Sunday August 28 2016, @08:31AM (#394136)

        Octave is a clone of Matlab. Its selling point is that it is completely free. So if Matlab sucks, then Octave will suck.

      • (Score: 1) by kanweg (4737) on Sunday August 28 2016, @10:24AM (#394153)

        I used IGOR many years ago and it was really great. I can only assume that it has not gotten worse.
        Fun thing (don't know if it is still in): you could change the history of what you'd done. To confirm, you got a dialog box asking "Do you want to change history?" with two buttons: Da and Njet.

        Bert

    • (Score: 3, Informative) by cellocgw (4190) on Sunday August 28 2016, @04:40PM (#394250)

      > R is probably the leading tool for statistics.

      > Would you say it's more popular and/or better for statistics than MATLAB?

      Having used both R and MATLAB quite extensively, I can tell you that R wins hands-down. Here are some of my top reasons (leaving aside the absurd cost of MATLAB):

      -- R's function argument syntax is miles better, more flexible, and more intuitive than MATLAB's.
      -- R understands NAMESPACEs and ENVIRONMENTs; MATLAB is clueless.
      -- To load functions in R, one sources them. Everything in a sourced file is available at the command line. None of this "only the top function is available" crap as in MATLAB.
      -- ggplot. 'Nuff said.
      -- Perfectly legal in R: x = func(y,z)(a,b)[[3]][5:9] . That is, func returns a function, which acts on a and b, returning a structure (list) of which we take the 3rd element and then elements 5 through 9 of that. MATLAB won't even let you do x = sin(y)(3:5) .

      -- R allows you to access and modify portions of functions (actually, closures) in real-time, sort of like Mathematica. MATLAB doesn't even let you define a function from the command line.

      R's source code is openly available. Neither MATLAB's engine nor the "builtin" functions' codes are available.
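
For readers who want to try the chained-call expression from the list above, here is a minimal sketch. The body of func is entirely hypothetical, invented only so that the expression has something to act on; the point is the chaining of a call, a second call, list extraction with [[ ]], and vector indexing.

```r
# Hypothetical function: returns another function (a closure) that remembers y and z.
func <- function(y, z) {
  function(a, b) {
    # The inner function returns a list; its third element is a numeric vector.
    list(y + z, a * b, seq(from = a, by = b, length.out = 10))
  }
}

# Call func, call the function it returns, take list element 3,
# then take elements 5 through 9 of that vector.
x <- func(1, 2)(10, 5)[[3]][5:9]
x   # 30 35 40 45 50
```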

      --
      Physicist, cellist, former OTTer (1190) resume: https://app.box.com/witthoftresume