SoylentNews is people

posted by janrinok on Saturday August 27 2016, @10:43PM
from the something-to-do-this-weekend dept.

First of all, the question is 'Why Use R'. One source answers that question thus:

R is the leading tool for statistics, data analysis, and machine learning. It is more than a statistical package; it's a programming language, so you can create your own objects, functions, and packages.

Speaking of packages, there are over 2,000 cutting-edge, user-contributed packages available on CRAN (not to mention Bioconductor and Omegahat). Many packages are submitted by prominent members of their respective fields.

For beginners in R, here is a 15-page, example-based tutorial that covers the basics of R.

  1. Starting R – Trivial tutorial on how to start R for those just wondering what to do next after downloading R.
  2. Assignment Operator – Two important assignment operators in R are <- and =.
  3. Listing Objects – All entities in R are called objects. They can be arrays, numbers, strings, functions. This tutorial will cover topics such as listing all objects, listing objects from a specific environment and listing objects that satisfy a particular pattern.
  4. Sourcing R File – R code can also be written in a file, and that file can then be sourced from other R code.
  5. Basic Datastructures in R – Understanding data structures is probably the most important part of learning R. This tutorial covers vector and list. It also covers subsetting.
  6. Data Structures in R, Matrix and Array – Covers matrices and arrays. An array is a vector with two additional attributes: dim, which stores the dimensions of the array, and dimnames, which stores the names of the dimensions. A matrix is a 2-dimensional array. Head to the tutorial for examples of both.
  7. Data Structures in R, factors and Data Frame – DataFrames are probably the most widely used data structure. It would help to just go through the examples and practice them. The tutorial covers important operations on the data frame and factors as well as subsetting data frames.
  8. Data Structures in R, Data Frame Operations – Covers some more operations on the data frame, including stack, attach, with, within, transform, subset, reshape and merge.
  9. Control Structures in R – The basics of any programming language. Control loops allow looping through data structures. The tutorial covers if, if-else, for, while, next, break, repeat and switch.
  10. Control Structures in R – apply – To make looping more efficient, R provides a family of ‘apply’ functions. For example, the apply function can be used to apply a function over the margins (for example, the rows or columns) of an array or matrix. The tutorial covers lapply, sapply, apply, tapply.
  11. Control Structures in R – apply 2 – We continue with some more apply functions – mapply and by.
  12. Functions in R – The nuts and bolts of any programming language. This tutorial not only explains the concept of functions using examples but also covers various scenarios such as anonymous functions or passing functions around.
  13. Printing on Console in R – Printing to the console can come in very handy. The tutorial covers the print and cat functions as well as printing data frames.
  14. Pretty printing using Format function in R – This tutorial looks at how to use the formatting functions for pretty printing.
  15. Reshape and Reshape2 Package – Once you start working on real-life problems in R, a lot of time will be spent manipulating data. The Reshape and Reshape2 packages will prove very powerful in converting data to the format required by other libraries. This tutorial has detailed examples to explain the packages.

These tutorials are designed for beginners in R, but they can also be used by experienced programmers as a refresher course or as a reference. Explicit loops in R can be slow, so the apply group of functions and the reshape package can often improve the performance of the code.
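
As a rough illustration of the loop-versus-apply point, here is a minimal sketch in base R. The data and variable names (scores, m, and so on) are made up for this example and do not come from the tutorials. It only shows that the two styles compute the same result; whether the apply version is actually faster depends on the workload.

```r
# Illustrative data only: a small list of numeric vectors (made up for this sketch).
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Explicit for loop: preallocate the result, then fill it in one element at a time.
loop_means <- numeric(length(scores))
names(loop_means) <- names(scores)
for (i in seq_along(scores)) {
  loop_means[i] <- mean(scores[[i]])
}

# sapply: apply mean() to every element and simplify the result to a named vector.
apply_means <- sapply(scores, mean)

# lapply: the same computation, but the result stays a list.
list_means <- lapply(scores, mean)

# apply: work over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

print(apply_means)
print(row_sums)
print(col_sums)
```

The sapply call replaces the preallocation and indexing bookkeeping of the loop with a single expression, which is usually the more immediate benefit.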



This discussion has been archived. No new comments can be posted.
  • (Score: 5, Interesting) by physicsmajor on Saturday August 27 2016, @10:57PM

    by physicsmajor (1471) on Saturday August 27 2016, @10:57PM (#394057)

    Don't get me wrong, R is a fantastic tool. However, their claims are more than a little overblown.

    R is probably the leading tool for statistics. No argument there.

    Data analysis, though? Strongly disagree. Python is equivalent or superior to R from the standpoint of data structures (Pandas DataFrames for those who like R, NumPy arrays for those who don't, Dask variants of both of these for truly Big Data, XRay for those who like N-dimensional typed data) which are essentially equivalent or more general and easier to handle than R's. The surrounding general purpose scientific tools in the ecosystem are also far ahead on the Python side.

    Machine learning? No. Just no. For classical machine learning there isn't any coherent argument that R is superior to Python with scikit-learn; everyone is using scikit-learn today. For Deep Learning, I don't think R even has a framework to interface with TensorFlow. The only one I could find went through Python.

    I do get that the author is excited about R, and wants to evangelize, and I encourage this. R is a great tool! But it isn't the perfect general purpose science tool that the opening headline might make someone think it is. R is a fantastic statistics package with some general science added on. For general science, Python is just better (and can directly interface with R for esoteric statistics when needed). Maybe I'm being overly pedantic, but I do think we should demand honesty up front.

    • (Score: 0) by Anonymous Coward on Sunday August 28 2016, @12:33AM

      by Anonymous Coward on Sunday August 28 2016, @12:33AM (#394064)

      > R is probably the leading tool for statistics.

      Would you say it's more popular and/or better for statistics than MATLAB?

      > Machine learning? No.

      I read on the Web that GNU Octave is better for machine learning than R. Do you agree?

      • (Score: 3, Informative) by physicsmajor on Sunday August 28 2016, @12:39AM

        by physicsmajor (1471) on Sunday August 28 2016, @12:39AM (#394065)

        Matlab is junk for statistics. Absolutely terrible, you might as well be using Excel. Nothing even approaching a DataFrame; R is light-years ahead of Matlab. IGOR Pro is probably the best (though not all that well known) commercial alternative to R for DataFrame-like analysis. For pure stats think JMP or SAS which are full-featured but a pain in the ass to use compared to R.

        Not sure about Octave.

        • (Score: 0) by Anonymous Coward on Sunday August 28 2016, @08:31AM

          by Anonymous Coward on Sunday August 28 2016, @08:31AM (#394136)

          Octave is a clone of Matlab. Its selling point is that it is completely free. So if Matlab sucks, then Octave will suck.

        • (Score: 1) by kanweg on Sunday August 28 2016, @10:24AM

          by kanweg (4737) on Sunday August 28 2016, @10:24AM (#394153)

          I used IGOR many years ago and it was really great. I can only assume that it has not gotten worse.
          Fun thing (don't know if it is still in), you could change the history of what you'd done. To confirm, you got a dialog box saying: Do you want to change history, with two buttons: Da and Njet.

          Bert

      • (Score: 3, Informative) by cellocgw on Sunday August 28 2016, @04:40PM

        by cellocgw (4190) on Sunday August 28 2016, @04:40PM (#394250)

        > R is probably the leading tool for statistics.

        Would you say it's more popular and/or better for statistics than MATLAB?

        Having used both R and MATLAB quite extensively, I can tell you that R wins hands-down. Here are some of my top reasons (leaving aside the absurd cost of MATLAB)

        -- R's function argument syntax is miles better, more flexible, and more intuitive than MATLAB's.
        -- R understands NAMESPACEs and ENVIRONMENTS; Matlab is clueless.
        -- To load functions in R, one sources them. Everything in a sourced file is available at the command line. None of this "only the top function is available" crap as in MATLAB.
        -- ggplot. 'nuff said
        -- Perfectly legal in R: x = func(y,z)(a,b)[[3]][5:9]. That is, func returns a function, which acts on a & b, returning a structure (list) of which we took the 3rd element and took the 5:9'th elements of it. Matlab won't even let you do x = sin(y)(3:5).

        -- R allows you to access and modify portions of functions (actually, closures) in real-time, sort of like Mathematica. MATLAB doesn't even let you define a function from the command line.

        R's source code is openly available. Neither MATLAB's engine nor the "builtin" functions' codes are available.
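
        A minimal sketch of the chained call described above. The bodies of func and the values of y, z, a and b are hypothetical, invented only so that the one-liner can run; the names themselves come from the example in this comment.

        ```r
        # Hypothetical definitions so that the one-liner from the comment can run.
        func <- function(y, z) {
          # Return a new function (a closure) that remembers y and z.
          function(a, b) {
            # Return a list whose 3rd element is a vector long enough to index with 5:9.
            list(y + a, z + b, seq(from = y * a, by = z * b, length.out = 10))
          }
        }

        y <- 1; z <- 2; a <- 3; b <- 4

        # Call func, call the function it returns, take list element 3, then elements 5 to 9.
        x <- func(y, z)(a, b)[[3]][5:9]
        print(x)
        ```

        Each step of the chain operates directly on the value returned by the previous one, so no intermediate variables are needed.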

        --
        Physicist, cellist, former OTTer (1190) resume: https://app.box.com/witthoftresume
  • (Score: -1, Redundant) by Anonymous Coward on Sunday August 28 2016, @12:48AM

    by Anonymous Coward on Sunday August 28 2016, @12:48AM (#394066)

    Um, what is "R"?

    [bit early for International Talk Like a Pirate Day, isn't it?]

    • (Score: 3, Informative) by butthurt on Sunday August 28 2016, @01:29AM

      by butthurt (6141) on Sunday August 28 2016, @01:29AM (#394077) Journal

      quoted from Wikipedia: [wikipedia.org]

      R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.

      R is a GNU package. The source code for the R software environment is written primarily in C, Fortran, and R. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. While R has a command line interface, there are several graphical front-ends available.

    • (Score: 0) by Anonymous Coward on Sunday August 28 2016, @05:24AM

      by Anonymous Coward on Sunday August 28 2016, @05:24AM (#394106)

      "R" is a bit like SAS (Statistical Analysis System), which is mega-$$$$$$, but some company execs can't understand 'free' and continue to fork over millions in licensing fees annually for SAS. Consultants look like IBM-ers from the 60's if that gives a clue.

    • (Score: 2) by tangomargarine on Sunday August 28 2016, @07:20AM

      by tangomargarine (667) on Sunday August 28 2016, @07:20AM (#394122)

      It's like C, but +15.

      --
      "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
    • (Score: 2) by janrinok on Sunday August 28 2016, @01:37PM

      by janrinok (52) Subscriber Badge on Sunday August 28 2016, @01:37PM (#394190) Journal

      R is the leading tool for statistics, data analysis, and machine learning. It is more than a statistical package; it's a programming language, so you can create your own objects, functions, and packages.

      I thought I had cleared that up in TFS....

  • (Score: 0) by Anonymous Coward on Sunday August 28 2016, @08:34AM

    by Anonymous Coward on Sunday August 28 2016, @08:34AM (#394137)

    I'm curious why R seems to be being promoted with submissions. It's OK. A bit shitty like all the scripted languages... and all non-scripted languages too. But why is someone trying to pump their favorite niche language here? I know Microsoft has a stake in R, insofar as they bought an R support company. Reason?

    • (Score: 2) by zeigerpuppy on Sunday August 28 2016, @09:32AM

      by zeigerpuppy (1298) on Sunday August 28 2016, @09:32AM (#394149)

      R is the leading language for statistical analysis. It's not exactly niche...
      Having said that, it's only dominant in one sector, 'cos that's what it's designed for.

    • (Score: 2) by janrinok on Sunday August 28 2016, @01:40PM

      by janrinok (52) Subscriber Badge on Sunday August 28 2016, @01:40PM (#394193) Journal

      Please, I invite you to make a submission on something relevant to this community which interests you.

      It could be a language, tool, piece of hardware, something you have done, something that you have achieved, something that you have read about or ..... Well, I think you get my drift.

  • (Score: 4, Funny) by ShadowSystems on Sunday August 28 2016, @09:49AM

    by ShadowSystems (6185) <{ShadowSystems} {at} {Gmail.com}> on Sunday August 28 2016, @09:49AM (#394151)

    Because my screen reader can't distinguish between homophones, the article title made me think it was a tutorial in *Augmented Reality*, commonly read as A.R. or "R". So I had to read the article all the way through before I realized it was NOT about AR but about *R* instead.
    Bah! I wanna learn how to code up AR realms to mess with the minds of you Sighted Folks! Upside down trees, sideways sidewalks, buildings that start inside out & Moebius themselves like an MC Escher nightmare, creatures that look like one thing but sound like another, cars that look like Formula-1 racers but sound like farting monkeys... Blue grass, green sky, orange water, dancing purple hippos, plaid horses, rainbow sparkly chickens that breathe fire & lay grenades, dogs chased by cats that are themselves chased by hummingbirds with laser nipples...
    AR is the way to go, not merely R!
    *Shakes a palsied fist at the article headline*
    Dang you homophones! Get off'n my AR Lawn!
    *Cough*
    =-)p

    I'll go away now, it's time for my medication again & I've got to hide before they find me!
    *Runs away laughing like a crazed chipmunk*