SoylentNews is people

posted by janrinok on Thursday September 19 2019, @03:09PM
from the perl-one-liners dept.

Back in May, writer Jun Wu described on her blog how Perl excels at text manipulation. She often uses it to tidy data sets, since data is often collected with variations and has to be cleaned up before use. She goes through many one-liners that make this easy.

Having old reliables is my key to success. Ever since I learned Perl during the dot com bubble, I knew that I was forever beholden to its powers to transform.

You heard me. Freedom is the word here with Perl.

When I'm coding freely at home on my fun data science project, I rely on it to clean up my data.

In the real world, data is often collected with loads of variations. Unless you are using someone's "clean" dataset, you better learn to clean that data real fast.
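As a rough illustration of the kind of cleanup described above, here is a minimal sketch in Python. It is not taken from Jun Wu's post: the `clean_field` helper and the sample values are hypothetical, and real data sets usually need rules specific to the data.

```python
import re

def clean_field(value: str) -> str:
    """Normalize one free-text field (hypothetical helper for illustration)."""
    value = value.strip()               # drop leading/trailing whitespace
    value = re.sub(r"\s+", " ", value)  # collapse internal whitespace runs
    return value.lower()                # fold case so variants compare equal

# Made-up sample: the same value entered three different ways.
raw = ["  New York ", "new   york", "NEW YORK\t"]
cleaned = [clean_field(v) for v in raw]
print(cleaned)
```

After this pass the three variants compare equal, which is usually what matters before grouping or joining on a column.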



 
  • (Score: 1, Insightful) by Anonymous Coward on Thursday September 19 2019, @03:14PM (10 children)

    by Anonymous Coward on Thursday September 19 2019, @03:14PM (#896107)

    R would be better than this. Probably python too.

  • (Score: 4, Insightful) by Anonymous Coward on Thursday September 19 2019, @03:32PM (8 children)

    by Anonymous Coward on Thursday September 19 2019, @03:32PM (#896119)

    Python is better than being fucked up the ass, but not much else. It's a terrible language designed by people who don't understand programming very well. Syntactic white space, uninformative error messages and dynamic typing being a few transgressions.

    • (Score: 3, Insightful) by Anonymous Coward on Thursday September 19 2019, @07:17PM (7 children)

      by Anonymous Coward on Thursday September 19 2019, @07:17PM (#896215)

      Python is better than being fucked up the ass, unless you're into that sort of thing but not much else.

      There. FTFY.

      • (Score: 0) by Anonymous Coward on Thursday September 19 2019, @09:37PM (6 children)

        by Anonymous Coward on Thursday September 19 2019, @09:37PM (#896269)

        First an eyeroll, but then...yeah. Actually kind of funny! Guess trying to avoid this kind of accidental entendre is an alternative reason to adhere to the "code of conduct"...

        • (Score: 0) by Anonymous Coward on Thursday September 19 2019, @09:46PM (5 children)

          by Anonymous Coward on Thursday September 19 2019, @09:46PM (#896270)

          First an eyeroll, but then...yeah. Actually kind of funny! Guess trying to avoid this kind of accidental entendre is an alternative reason to adhere to the "code of conduct"...

          And to which "code of conduct" might you be referring, friend?

          • (Score: 2) by The Mighty Buzzard on Friday September 20 2019, @01:38AM (4 children)

            by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Friday September 20 2019, @01:38AM (#896338) Homepage Journal

            Certainly not ours [github.com]...

            --
            My rights don't end where your fear begins.
            • (Score: 2) by edIII on Friday September 20 2019, @03:03AM (2 children)

              by edIII (791) on Friday September 20 2019, @03:03AM (#896360)

              Using spaces instead of tabs for indentation.

              Why is this bad? I convert tabs to spaces because of inconsistent treatment of tabs amongst programs, while spaces plus a monospaced font are consistent.

              --
              Technically, lunchtime is at any moment. It's just a wave function.
              • (Score: 4, Funny) by The Mighty Buzzard on Friday September 20 2019, @03:29AM

                by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Friday September 20 2019, @03:29AM (#896373) Homepage Journal

                That you ask has outed you as a witch. Gentlemen, bring the firewood.

                --
                My rights don't end where your fear begins.
              • (Score: 1, Informative) by Anonymous Coward on Friday September 20 2019, @09:19AM

                by Anonymous Coward on Friday September 20 2019, @09:19AM (#896437)

                Why is this bad? I convert tabs to spaces because of inconsistent treatment of tabs amongst programs, while spaces plus a monospaced font are consistent.

                Tabs are for indentation. Spaces are for *formatting*. So, if you have 1 indent, you have 1 tab. If you want more because it's a line continuation or something, then you add spaces. That's why it's nice to have editors with visible whitespace. QtCreator is a nice editor for that. VS Code is nice for that too.

                Why? If you had thought about it, you wouldn't have bothered to ask. But since you didn't ... this lets every developer adjust the indentation to their *preferred* width simply by changing the tab width setting in any sensible editor. If some dev wants 5 spaces per tab, that's fine. If they want 2, that's fine. If they like 8 or whatever, it all works. And it works without changing any line of code and fucking up history.
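The convention described in this comment (one tab per indent level, spaces only for alignment after it) can be sketched as follows. This is an illustration in Python, not something posted in the thread: the `snippet` text and the `render` helper are made up, and `str.expandtabs` stands in for an editor's tab width setting.

```python
# Spaces used to line the continuation up under the first argument.
ALIGN = " " * len("result = compute(")

# One tab per indent level, then spaces for alignment only.
snippet = (
    "def f():\n"
    "\tresult = compute(first_argument,\n"
    "\t" + ALIGN + "second_argument)\n"
)

def render(text: str, tab_width: int) -> str:
    """Show the text the way an editor with this tab width would display it."""
    return text.expandtabs(tab_width)

for width in (2, 4, 8):
    print(f"--- tab width {width} ---")
    print(render(snippet, width))
```

At every width the indentation depth changes but the two arguments stay aligned, and the file itself is never modified.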

            • (Score: 2) by NotSanguine on Friday September 20 2019, @07:06AM

              by NotSanguine (285) <{NotSanguine} {at} {SoylentNews.Org}> on Friday September 20 2019, @07:06AM (#896421) Homepage Journal

              There will be no ed/vi/nano in my house. Emacs is the only OS I use. Now if I only had a text editor...

              --
              No, no, you're not thinking; you're just being logical. --Niels Bohr
  • (Score: 2) by gringer on Friday September 20 2019, @03:04AM

    by gringer (962) on Friday September 20 2019, @03:04AM (#896361)

    R is terrible for text processing. Extracting matches from regular expressions involves compiling the expressions and parsing a list. Some of the newest tidyverse packages are greatly improving the syntax, but the speed for text manipulation still remains an issue.

    --
    Ask me about Sequencing DNA in front of Linus Torvalds [youtube.com]
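
For comparison with the R workflow gringer describes, here is a minimal sketch of regex match extraction in Python, the other language suggested at the top of the thread. The sample lines and the pattern are made up for illustration, and the sketch makes no claim about relative speed.

```python
import re

# Hypothetical input lines; only the first two contain a match.
lines = ["sample_01 length=1520", "sample_02 length=87", "no match here"]

# Compile once, then reuse the pattern for every line.
pattern = re.compile(r"(\w+) length=(\d+)")

records = []
for line in lines:
    m = pattern.search(line)
    if m:
        # Capture groups come back directly from the match object.
        records.append((m.group(1), int(m.group(2))))

print(records)
```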