SoylentNews is people

posted by janrinok on Thursday September 19 2019, @03:09PM
from the perl-one-liners dept.

Back in May, writer Jun Wu described on her blog how Perl excels at text manipulation. She often uses it to tidy data sets, since data is usually collected with variations and has to be cleaned up before use. She goes through many one-liners that make this easy.

Having old reliables is my key to success. Ever since I learned Perl during the dot com bubble, I knew that I was forever beholden to its powers to transform.

You heard me. Freedom is the word here with Perl.

When I'm coding freely at home on my fun data science project, I rely on it to clean up my data.

In the real world, data is often collected with loads of variations. Unless you are using someone's "clean" dataset, you better learn to clean that data real fast.
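
As a rough illustration of the kind of cleanup described above, here is a minimal sketch. It is not taken from Jun Wu's post and is written in Python rather than Perl; the function names and the sample rows are hypothetical, and it assumes the "variations" are only stray whitespace and inconsistent letter case.

```python
import re


def clean_field(value: str) -> str:
    """Trim, collapse internal whitespace, and lowercase one text field."""
    return re.sub(r"\s+", " ", value.strip()).lower()


def clean_rows(rows):
    """Normalize every field and drop rows that become exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(clean_field(field) for field in row)
        if key not in seen:
            seen.add(key)
            cleaned.append(list(key))
    return cleaned


# Hypothetical sample data with spacing and case variations.
raw = [["  New  York ", "NY"], ["new york", "ny "], ["Boston", "MA"]]
print(clean_rows(raw))
```

Real data sets usually need more than this (misspellings, mixed date formats, inconsistent units), which is where a library of saved one-liners or a tool with recorded steps starts to pay off.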



 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Interesting) by Freeman on Thursday September 19 2019, @03:18PM (3 children)

    by Freeman (732) on Thursday September 19 2019, @03:18PM (#896110) Journal

    Open Refine can do quite a bit with regard to text manipulation as well. I would say Open Refine also makes it much easier to see what you've done and what you're doing. There are also cool features, like being able to save a set of procedures so you can replicate the process on another data set.

    OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

    OpenRefine is available in English, Chinese, Spanish, French, Russian, Portuguese (Brazil), German, Japanese, Italian, Hungarian, Hebrew, Filipino, Cebuano, Tagalog

    http://openrefine.org/ [openrefine.org]

    --
    Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
  • (Score: 2, Interesting) by Anonymous Coward on Thursday September 19 2019, @04:45PM (2 children)

    by Anonymous Coward on Thursday September 19 2019, @04:45PM (#896145)

    When you do not know what you are doing, you should not be touching any datasets.
    When you do know, you have plenty of time-tested tools that you can use, without having to learn a Google-invented bastard language not fit for any other purpose.

    • (Score: 2) by Freeman on Thursday September 19 2019, @05:19PM (1 child)

      by Freeman (732) on Thursday September 19 2019, @05:19PM (#896171) Journal

      Or, just use the built-in tools and ignore their bastardized language? This tool isn't an online Google service. It seems like your major complaint is Google touched it at one point, so it must be tainted demon spawn?

      --
      Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
      • (Score: 1, Touché) by Anonymous Coward on Thursday September 19 2019, @09:56PM

        by Anonymous Coward on Thursday September 19 2019, @09:56PM (#896274)

        The reason to play a point-and-click game anew every time, instead of running a long-ago written one-liner, is ???