
SoylentNews is people

posted by janrinok on Thursday September 19 2019, @03:09PM
from the perl-one-liners dept.

Back in May, writer Jun Wu described in her blog how Perl excels at text manipulation. She often uses it to tidy data sets, since data is often collected with variations and has to be cleaned up before use. She goes through many one-liners that make the job easier.

Having old reliables is my key to success. Ever since I learned Perl during the dot com bubble, I knew that I was forever beholden to its powers to transform.

You heard me. Freedom is the word here with Perl.

When I'm coding freely at home on my fun data science project, I rely on it to clean up my data.

In the real world, data is often collected with loads of variations. Unless you are using someone's "clean" dataset, you better learn to clean that data real fast.
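As a rough illustration of the kind of cleanup described above, here is a minimal sketch. It is written in Python rather than Perl, it is not taken from Jun Wu's post, and the sample values, the CANONICAL table and the clean_field helper are all hypothetical names made up for this example.

```python
import re

# Hypothetical table mapping spelling variants to one canonical label.
CANONICAL = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
}

def clean_field(raw: str) -> str:
    """Trim, collapse runs of whitespace, and map known variants to one label."""
    collapsed = re.sub(r"\s+", " ", raw.strip())
    # Unknown values pass through unchanged apart from the whitespace fix.
    return CANONICAL.get(collapsed.lower(), collapsed)

# Made-up sample of the same value collected with variations.
rows = ["  new   york ", "NY", "N.Y.", "Boston"]
cleaned = [clean_field(r) for r in rows]
print(cleaned)
```

A lookup table like this only covers variants you already know about, so in practice it would probably be paired with a quick look at the distinct values left over after cleaning.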



 
  • (Score: 2) by The Mighty Buzzard (18) <themightybuzzard@proton.me> on Friday September 20 2019, @01:35AM (#896337) (1 child)

    Depending on the use case, it can be, sure. If you're trying to serve up thousands of pages a second off that dataset or some such, I'd go with something a little closer to the metal so you can use an optimized function that does only what you need and does it in as few cycles as possible.

    --
    My rights don't end where your fear begins.
  • (Score: 0) by Anonymous Coward on Friday September 20 2019, @08:03PM (#896634)

    We're using Vertica (a column-oriented SQL database) for the high performance queries. But for loading the data into Vertica, we use Perl.