
SoylentNews is people

posted by janrinok on Wednesday March 27 2024, @08:12PM
from the I-didn't-know-that-... dept.

https://buttondown.email/hillelwayne/archive/why-do-regexes-use-and-as-line-anchors/

Last week I fell into a bit of a rabbit hole: why do regular expressions use $ and ^ as line anchors?

This talk brings up that they first appeared in Ken Thompson's port of the QED text editor. In his manual he writes:

b) "^" is a regular expression which matches character at the beginning of a line.

c) "$" is a regular expression which matches character before the character <nl> (usually at the end of a line)
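For anyone who wants to see the two anchors behave the way the manual describes, here is a minimal sketch in Python. The sample text and the helper names `first_words` and `last_words` are made up for illustration; only the standard `re` module is used, with `re.MULTILINE` so that the anchors apply to every line rather than to the whole string.

```python
import re

# Hypothetical sample text: two lines, no trailing newline.
TEXT = "alpha beta\ngamma delta"

def first_words(text: str) -> list[str]:
    # "^" anchors the match at the beginning of each line.
    return re.findall(r"^\w+", text, flags=re.MULTILINE)

def last_words(text: str) -> list[str]:
    # "$" anchors the match just before each newline and at the end of the text.
    return re.findall(r"\w+$", text, flags=re.MULTILINE)

print(first_words(TEXT))
print(last_words(TEXT))
```

Without `re.MULTILINE`, Python treats the whole string as a single line, so each pattern would match at most once.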

QED was the precursor to ed, which was instrumental in popularizing regexes, so a lot of its design choices stuck.

Okay, but then why did Ken Thompson choose those characters?



 
This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 2) by VLM (445) on Thursday March 28 2024, @04:32PM (#1350712)

    Long, complicated regexes are difficult to debug and seldom worth the effort.

    At some point it turns into "let's scrap it and replace it with a simple parser".

    It would be interesting to see a compiler that uses regex instead of a parser design.

    You can regex a simple machine code assembler, but go much beyond the simplest and you can forget the regex design; it's parser time. It's pretty easy to make an assembler that uses regex to assemble a "nop", but "movb r0, (r1)+" (it's MACRO-11) would take a thundering lot of regex. Thought about in C terms, that instruction writes the contents of the first byte of variable/register r0 to memory using r1 as the pointer, then increments the pointer for later use: essentially it copies a single char variable into a char array and gets ready to copy the next char, as strcpy would. C of course is just slightly fancied up PDP-11 assembly; it's only on inferior processors that C looks like a separate language.
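To make the "nop is easy, movb is a lot of regex" point concrete, here is a rough sketch of a regex-only toy assembler in Python. Everything about it is an assumption for illustration: the function name `assemble_line` is hypothetical, the comma between operands is treated as optional, and it handles exactly two shapes of instruction. The opcode values follow the standard PDP-11 encoding (NOP is 000240 octal; MOVB is 11SSDD octal, with mode 0 for a plain register and mode 2 for autoincrement).

```python
import re

# Pattern for the trivial case: a bare "nop".
NOP = re.compile(r"^\s*nop\s*$", re.IGNORECASE)

# Pattern for exactly one operand shape: "movb rS, (rD)+"
# (register source, autoincrement destination). The comma is optional here.
MOVB_AUTOINC = re.compile(
    r"^\s*movb\s+r([0-7])\s*,?\s*\(r([0-7])\)\+\s*$", re.IGNORECASE
)

def assemble_line(line: str) -> int:
    """Return the 16-bit instruction word for one line, or raise ValueError."""
    if NOP.match(line):
        return 0o000240  # PDP-11 NOP
    m = MOVB_AUTOINC.match(line)
    if m:
        src, dst = int(m.group(1)), int(m.group(2))
        # MOVB base opcode 11xxxx; source = mode 0 (register),
        # destination = mode 2 (autoincrement).
        return 0o110000 | (0 << 9) | (src << 6) | (2 << 3) | dst
    raise ValueError(f"unsupported instruction: {line!r}")

print(oct(assemble_line("nop")))
print(oct(assemble_line("movb r0, (r1)+")))
```

Even this covers only one of the eight addressing modes on one side of one instruction. Every other combination of source and destination mode would need its own pattern or a much hairier one, which is roughly where a small parser starts to look like the simpler option.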
