SoylentNews is people

posted by janrinok on Thursday September 19 2019, @03:09PM
from the perl-one-liners dept.

Back in May, writer Jun Wu explained on her blog how Perl excels at text manipulation. She often uses it to tidy data sets, since data is often collected with variations and has to be cleaned up before use. She goes through many one-liners that make this easy.

Having old reliables is my key to success. Ever since I learned Perl during the dot com bubble, I knew that I was forever beholden to its powers to transform.

You heard me. Freedom is the word here with Perl.

When I'm coding freely at home on my fun data science project, I rely on it to clean up my data.

In the real world, data is often collected with loads of variations. Unless you are using someone's "clean" dataset, you better learn to clean that data real fast.
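
To make "collected with loads of variations" concrete, here is a minimal sketch of that kind of cleanup. It is written in Python rather than Perl and is not taken from Jun Wu's post. The field values, the `CANONICAL` table, and the helper names `clean_field` and `clean_line` are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical mapping from variant spellings to one canonical label.
CANONICAL = {
    "ny": "New York",
    "n.y.": "New York",
    "nyc": "New York",
    "new york": "New York",
}


def clean_field(raw: str) -> str:
    """Trim, collapse runs of whitespace, and map known variants to one label."""
    collapsed = re.sub(r"\s+", " ", raw.strip())
    # Unknown values pass through unchanged apart from the whitespace cleanup.
    return CANONICAL.get(collapsed.lower(), collapsed)


def clean_line(line: str, sep: str = ",") -> str:
    """Clean every field of one delimited line (naive split, no quoting)."""
    return sep.join(clean_field(field) for field in line.split(sep))


if __name__ == "__main__":
    for line in ["  ny , 42", "New   York,42", "N.Y.,  42"]:
        print(clean_line(line))  # each of these prints: New York,42
```

The split on the separator is deliberately naive. Data with quoted fields or embedded separators would need a real CSV parser instead.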



 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Interesting) by JoeMerchant on Thursday September 19 2019, @03:27PM (16 children)

    by JoeMerchant (3937) on Thursday September 19 2019, @03:27PM (#896115)

    I live mostly inside my C++ bubble, further subset into the Qt API. In here, QString is pretty damn good at wrangling string issues, and when it's not enough you can always bail out to RegEx (sounds like: retch, for a reason I think.)

    So much is just whatever you are familiar with, Boost, stdlib, whatever... if you know how to use them they've got most of the tools you need pre-coded, and if you find yourself doing the same thing over and over that takes more than 2 lines to accomplish, that sounds like time for a personal library extension...

    --
    🌻🌻 [google.com]
  • (Score: 3, Insightful) by The Mighty Buzzard on Thursday September 19 2019, @03:37PM (5 children)

    by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Thursday September 19 2019, @03:37PM (#896121) Homepage Journal

    Kind of the point. Using perl you can do a whole hell of a lot in one line. I've used a lot of languages over the years and still pick new ones up for fun every year or two and I've never found anything even close to as versatile and efficient with text as perl. Python usually takes a minimum of three times as many lines to accomplish what perl can do legibly in one; five to ten is more common.

    --
    My rights don't end where your fear begins.
    • (Score: 2) by JoeMerchant on Thursday September 19 2019, @03:49PM (4 children)

      by JoeMerchant (3937) on Thursday September 19 2019, @03:49PM (#896126)

      Yeah, one-liners are good - I really appreciate having an easy GUI that I can just throw open a scrolling text box in, append text to it all day long with single line commands, HTML format that text if I feel like it, maybe toss on a few checkboxes to toggle boolean control variables (like command line switches, but changeable at runtime...)

      Again, it's all in what you're used to. Today, I'm appreciating the verbose log files that make it relatively easy to spot what went weird when the testers come up with their 1/10,000 behaviors.

      --
      🌻🌻 [google.com]
      • (Score: 3, Interesting) by The Mighty Buzzard on Thursday September 19 2019, @03:57PM (3 children)

        by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Thursday September 19 2019, @03:57PM (#896130) Homepage Journal

        Honestly, I mostly use grep, awk, and sed for most one-liner type stuff. Perl is overkill for the extremely simple stuff. I mostly use it when I need something at least slightly more complex. The ability to get way more work done per readable line is just as useful in a script as on a command line though.

        --
        My rights don't end where your fear begins.
        • (Score: 0) by Anonymous Coward on Friday September 20 2019, @05:53AM (2 children)

          by Anonymous Coward on Friday September 20 2019, @05:53AM (#896404)

          I find grep and sed incredibly useful and intuitive, while never quite grokking awk. Don't know why.

          • (Score: 0) by Anonymous Coward on Friday September 20 2019, @08:57AM (1 child)

            by Anonymous Coward on Friday September 20 2019, @08:57AM (#896434)

            I find grep and sed incredibly useful and intuitive, while never quite grokking awk. Don't know why.

            Because awk is for parsing stuff, especially column-oriented documents. If you want the 5th column of something, for example. But if you don't deal with column data, then you'd probably never need awk.

            • (Score: 0) by Anonymous Coward on Friday September 20 2019, @11:00AM

              by Anonymous Coward on Friday September 20 2019, @11:00AM (#896452)

              Awk is really good at stuff you would normally have to pipe sed and grep together for; you can use one simple awk statement instead. It also has some formatting capabilities so I like to use it when writing shell functions.

  • (Score: 0) by Anonymous Coward on Thursday September 19 2019, @03:48PM (4 children)

    by Anonymous Coward on Thursday September 19 2019, @03:48PM (#896125)

    Your bubble will burst when touched by Unicode multilanguage data mixed with funny math/geometry/engineering symbols and true emoji.

    • (Score: 2) by The Mighty Buzzard on Thursday September 19 2019, @03:52PM (2 children)

      by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Thursday September 19 2019, @03:52PM (#896128) Homepage Journal

      Perl's kind of shitty at that too unless you know the few simple tricks to make it not shitty at it.

      --
      My rights don't end where your fear begins.
      • (Score: 0) by Anonymous Coward on Friday September 20 2019, @03:20AM (1 child)

        by Anonymous Coward on Friday September 20 2019, @03:20AM (#896367)

        Really? Perl was the first language to manage that properly for me. Yeah, sometimes you need to be explicit in unusual ways, but at least you _can_ without jumping through hoops. I think this gets thrown in the "tricks" category just because it so rarely comes up that you're probably going to need to look it up when it does, cuz otherwise you just pick an encoding and forget about it.

    • (Score: 2) by JoeMerchant on Thursday September 19 2019, @08:55PM

      by JoeMerchant (3937) on Thursday September 19 2019, @08:55PM (#896252)

      Funny thing, QString handles Unicode, UTF-8, etc. conversions pretty much seamlessly, as do the other modern string classes. Thank God for that, because the last thing I want to fool with is conversion between Unicode and UTF-8.

      --
      🌻🌻 [google.com]
  • (Score: 3, Insightful) by legont on Thursday September 19 2019, @07:09PM (4 children)

    by legont (4179) on Thursday September 19 2019, @07:09PM (#896214)

    that sounds like time for a personal library extension...

    Yeah, but once one finishes building it, one discovers that he rebuilt Perl.

    --
    "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
    • (Score: 2) by JoeMerchant on Thursday September 19 2019, @08:59PM (3 children)

      by JoeMerchant (3937) on Thursday September 19 2019, @08:59PM (#896254)

      Yeah, but once one finishes building it, one discovers that he rebuilt Perl.

      If you're that heavy into what Perl does, by all means, use it. For me, it's a bigger PITA to "shell out" to get access to Perl than it is to recode in C++, on the rare occasions it is necessary. If I really loved Perl so much but still needed to be in C++, I'd make a dedicated wrapper for Perl and get full access that way.

      BTW: anyone considering PyQt for anything bigger than a toy project, my recommendation is: don't. But, then, that's pretty much my recommendation for Python all over - sure, there are some pretty impressive "large" things out there mostly based in Python - like trac, which I have happily used for over 10 years now, but... for the most part, unless the coding team is super disciplined, Python degenerates into a ball of snakes much faster than any of the C derivatives I have ever worked with.

      --
      🌻🌻 [google.com]
      • (Score: 2) by legont on Friday September 20 2019, @01:01AM (2 children)

        by legont (4179) on Friday September 20 2019, @01:01AM (#896323)

        My main choice for a long time was C (plain, without ++). For quick and dirty things I'd use AWK, and I am talking here not about one-liners, but full-blown software of a few hundred or even thousands of lines of code. There was even an AWK compiler that one guy wrote and was selling for $99 that did a very good job.

        At some point I discovered Perl, gave it a try, and it replaced AWK, even the compiled version of it, for me. As time passed, I realized that I had pretty much stopped using C except on rare special occasions and that Perl would cover everything for me.

        Management would be forcing at different times Java, dotnet, Python and so on, but so far, in the end, Perl could not be replaced. There is another attempt under way right now and this time they may succeed, but let's see...

        I appreciate your comment about Python, but if you were asked to replace a huge Perl project with something modern that fresh college kids would like, what would you recommend?

        --
        "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
        • (Score: 4, Insightful) by JoeMerchant on Friday September 20 2019, @01:11PM (1 child)

          by JoeMerchant (3937) on Friday September 20 2019, @01:11PM (#896478)

          if you were asked to replace a huge Perl project with something modern that fresh college kids would like, what would you recommend?

          Perl.

          I worked for almost a decade converting fresh college kids' code (Matlab, Python, and strangely: a fair bit of Fortran) to C++ so that their broken toys could be sold to real customers.

          --
          🌻🌻 [google.com]
          • (Score: 2) by legont on Friday September 20 2019, @05:54PM

            by legont (4179) on Friday September 20 2019, @05:54PM (#896583)

            Yes, my thoughts exactly.

            --
            "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.