
SoylentNews is people

posted by janrinok on Wednesday March 27 2024, @08:12PM
from the I-didn't-know-that-... dept.

https://buttondown.email/hillelwayne/archive/why-do-regexes-use-and-as-line-anchors/

Last week I fell into a bit of a rabbit hole: why do regular expressions use $ and ^ as line anchors?

This talk brings up that they first appeared in Ken Thompson's port of the QED text editor. In his manual he writes:

b) "^" is a regular expression which matches the null character at the beginning of a line.

c) "$" is a regular expression which matches the null character before the character <nl> (usually at the end of a line)

QED was the precursor to ed, which was instrumental in popularizing regexes, so a lot of its design choices stuck.
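For readers who want to see the two anchors in action, here is a minimal sketch using Python's `re` module (an assumption of this example; the article itself is about QED and ed). It shows that `^` and `$` match positions rather than characters, and that Python only treats them as per-line anchors when `re.MULTILINE` is set. The sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ and $ anchor to the start and end of the whole string.
whole_start = re.findall(r"^\w+", text)
whole_end = re.findall(r"\w+$", text)

# With MULTILINE, they anchor at the start and end of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# Both anchors are zero-width: the match consumes no characters.
start_span = re.search(r"^", text).span()
end_span = re.search(r"$", text).span()

print(whole_start, whole_end)
print(line_starts, line_ends)
print(start_span, end_span)
```

The zero-width spans are the practical meaning of "anchor": the pattern pins down where a match may begin or end without taking up any of the text.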

Okay, but then why did Ken Thompson choose those characters?



 
This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 2) by ChrisMaple on Thursday March 28 2024, @03:32PM (2 children)


    I use regexes occasionally, and they're a powerful time saver. However, long, complicated regexes are difficult to debug and seldom worth the effort. Reading someone else's regexes is even more difficult; like APL, it's a write-only language.

  • (Score: 2) by VLM on Thursday March 28 2024, @04:32PM


    long, complicated regexes are difficult to debug and seldom worth the effort

    At some point it turns into "let's scrap it and replace it with a simple parser."

    It would be interesting to see a compiler that uses regex instead of a parser design.

    You can regex a simple machine code assembler, but go much beyond the simplest case and you can forget regex design; it's parser time. It's pretty easy to make an assembler that uses regex to assemble a "nop", but "movb r0 (r1)+" (it's MACRO-11) would take a thundering lot of regex. Thought about like C, that instruction means: write the low byte of register r0 to memory using r1 as the pointer, then increment the pointer for later use. It's essentially copying a single char variable into a char array and getting ready to copy the next char, one step of strcpy. C of course is just slightly fancied-up PDP-11 assembly; it's only on inferior processors that C looks like a separate language.
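To make the "nop" versus "movb" contrast concrete, here is a rough sketch of a regex-driven assembler in Python. It is not the commenter's code: the function names are hypothetical, only three addressing modes are handled, and the encodings (NOP as octal 000240, MOVB as octal 11SSDD with a 3-bit mode and 3-bit register per operand) are taken from the standard PDP-11 instruction format rather than from the comment.

```python
import re

REG = r"r([0-7])"

# Operand syntax -> PDP-11 addressing mode number (assumed subset only).
OPERAND_PATTERNS = [
    (re.compile(rf"^{REG}$"), 0),        # rN     register
    (re.compile(rf"^\({REG}\)$"), 1),    # (rN)   register deferred
    (re.compile(rf"^\({REG}\)\+$"), 2),  # (rN)+  autoincrement
]

# Mnemonic plus two optional operands, separated by a comma or by
# whitespace (the comment writes "movb r0 (r1)+" without a comma).
LINE = re.compile(
    r"^\s*(?P<op>[a-z]+)"
    r"(?:\s+(?P<src>[^,\s]+)(?:\s*,\s*|\s+)(?P<dst>\S+))?\s*$"
)


def encode_operand(text: str) -> int:
    """Return the 6-bit mode/register field for one operand."""
    for pattern, mode in OPERAND_PATTERNS:
        m = pattern.match(text)
        if m:
            return (mode << 3) | int(m.group(1))
    raise ValueError(f"unsupported operand: {text!r}")


def assemble(line: str) -> int:
    """Assemble one source line into a 16-bit instruction word."""
    m = LINE.match(line.lower())
    if not m:
        raise ValueError(f"cannot parse: {line!r}")
    op, src, dst = m["op"], m["src"], m["dst"]
    if op == "nop" and src is None:
        return 0o000240
    if op == "movb" and src is not None:
        return 0o110000 | (encode_operand(src) << 6) | encode_operand(dst)
    raise ValueError(f"unsupported instruction: {line!r}")


print(oct(assemble("nop")))
print(oct(assemble("movb r0 (r1)+")))
```

Even this toy needs a separate pattern for every operand form, and it still ignores the remaining addressing modes, labels, expressions, and immediate values. That growth is the point at which a small parser tends to be less work than more regexes.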

  • (Score: 2) by Rosco P. Coltrane on Thursday March 28 2024, @05:19PM


    Reading someone else's regexes is even more difficult; like APL, it's a write-only language.

    That's because programmers somehow forget all rules of readability when they write regexes.

    My regexes are spread over several lines and indented. You can read them just fine even if they're really complicated. I always take the time to make my code readable for everybody else as a common courtesy, and regexes are an integral part of my code, so they get the same treatment for the same purpose.
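The comment doesn't say which language or tooling is involved, so as an illustration only, here is how a regex spread over several indented lines can look with Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. The date pattern and the names `DATE` and `DATE_COMPACT` are invented for this sketch.

```python
import re

# One-line version: correct, but the reader has to parse it mentally.
DATE_COMPACT = re.compile(
    r"^(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12]\d|3[01])$"
)

# Same pattern, laid out and commented with re.VERBOSE.
DATE = re.compile(
    r"""
    ^
    (?P<year>  \d{4} )                    # four-digit year
    -
    (?P<month> 0[1-9] | 1[0-2] )          # 01 through 12
    -
    (?P<day>   0[1-9] | [12]\d | 3[01] )  # 01 through 31
    $
    """,
    re.VERBOSE,
)

match = DATE.match("2024-03-27")
print(match.groupdict())
print(DATE.match("2024-13-01"))
```

Both patterns accept and reject the same strings; only the second one can be reviewed line by line like the rest of the code.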