https://buttondown.email/hillelwayne/archive/why-do-regexes-use-and-as-line-anchors/
Last week I fell into a bit of a rabbit hole: why do regular expressions use $ and ^ as line anchors?
This talk brings up that they first appeared in Ken Thompson's port of the QED text editor. In his manual he writes:
b) "^" is a regular expression which matches character at the beginning of a line.
c) "$" is a regular expression which matches character before the character (usually at the end of a line)
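As a rough illustration of the two anchors described above, here is a minimal sketch using Python's `re` module, which is an assumption of this example and is not what QED itself ran. The sample string and variable names are made up. In Python the per-line behavior needs the `re.MULTILINE` flag; without it, `^` only matches at the start of the whole string.

```python
import re

# Hypothetical three-line sample; lines are separated by newline characters.
text = "alpha\nbeta\ngamma"

# Without re.MULTILINE, ^ matches only at the very start of the string.
first_only = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ matches at the beginning of every line and
# $ matches just before each newline (and at the end of the string).
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# Both anchors are zero-width: they match a position, not a character.
start_positions = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]
end_positions = [m.start() for m in re.finditer(r"$", text, re.MULTILINE)]

print(first_only)        # ['alpha']
print(line_starts)       # ['alpha', 'beta', 'gamma']
print(line_ends)         # ['alpha', 'beta', 'gamma']
print(start_positions)   # [0, 6, 11]
print(end_positions)     # [5, 10, 16]
```

The position lists show that `$` lands on the index of each newline, that is, immediately before it, which lines up with the manual's wording.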
QED was the precursor to ed, which was instrumental in popularizing regexes, so a lot of its design choices stuck.
Okay, but then why did Ken Thompson choose those characters?
(Score: 2) by ChrisMaple on Thursday March 28 2024, @03:32PM (2 children)
I use regexes occasionally, and they're a powerful time saver. However, long, complicated regexes are difficult to debug and seldom worth the effort. Reading someone else's regexes is even more difficult; like APL, it's a write-only language.
(Score: 2) by VLM on Thursday March 28 2024, @04:32PM
At some point it turns into "let's scrap it and replace it with a simple parser."
It would be interesting to see a compiler that uses regex instead of a parser design.
You can regex a simple machine-code assembler, but go much beyond the simplest case and you can forget the regex design; it's parser time. It's pretty easy to make an assembler that uses regex to assemble a "nop", but "movb r0 (r1)+" (it's MACRO-11) would take a thundering lot of regex. Thought about in C terms, that instruction means: write the contents of the first byte of variable/register r0 to memory using r1 as the pointer, then increment the pointer for later use. Essentially, strcpy a single char variable into a char array and get ready to copy the next char. C, of course, is just slightly fancied-up PDP-11 assembly; it's only on inferior processors that C looks like a separate language.
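To make the "nop is easy, movb takes a lot more regex" point concrete, here is a minimal sketch of a regex-driven assembler for just those cases. Everything in it is an assumption for illustration: the helper names `encode_operand` and `assemble` are hypothetical, only four addressing modes are handled, and the octal encodings (NOP as 000240, MOV as 01SSDD, MOVB as 11SSDD) are the standard PDP-11 ones as best I recall them, so check a processor handbook before trusting them.

```python
import re

# Operand patterns for a small subset of PDP-11 addressing modes.
# Each entry pairs a regex with its addressing-mode number (assumed values).
OPERAND_PATTERNS = [
    (re.compile(r"r([0-7])"), 0),        # register:      r0
    (re.compile(r"\(r([0-7])\)"), 1),    # deferred:      (r1)
    (re.compile(r"\(r([0-7])\)\+"), 2),  # autoincrement: (r1)+
    (re.compile(r"-\(r([0-7])\)"), 4),   # autodecrement: -(r1)
]

# Mnemonic, source operand, then destination operand, separated by
# whitespace and/or a comma, so "movb r0 (r1)+" and "movb r0,(r1)+" both work.
TWO_OPERAND = re.compile(r"(\w+)\s+([^\s,]+)[\s,]+(\S+)")

# Base opcodes with empty source/destination fields (assumed encodings).
BASE_OPCODES = {"mov": 0o010000, "movb": 0o110000}
NOP = 0o000240


def encode_operand(text):
    """Return the 6-bit mode/register field for one operand."""
    for pattern, mode in OPERAND_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return (mode << 3) | int(match.group(1))
    raise ValueError(f"unsupported operand: {text!r}")


def assemble(line):
    """Assemble one instruction into a 16-bit word."""
    line = line.strip().lower()
    if re.fullmatch(r"nop", line):
        return NOP  # the easy case: one regex, one constant
    match = TWO_OPERAND.fullmatch(line)
    if not match or match.group(1) not in BASE_OPCODES:
        raise ValueError(f"unsupported instruction: {line!r}")
    mnemonic, src, dst = match.groups()
    return BASE_OPCODES[mnemonic] | (encode_operand(src) << 6) | encode_operand(dst)


print(oct(assemble("nop")))            # 0o240
print(oct(assemble("movb r0 (r1)+")))  # 0o110021
```

Even this toy needs a table of operand regexes plus a separate splitter, and it still ignores labels, immediates, index modes, and expressions, which is roughly where a real parser starts to look cheaper.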
(Score: 2) by Rosco P. Coltrane on Thursday March 28 2024, @05:19PM
That's because programmers somehow forget all the rules of readability when they write regexes.
My regexes are spread over several lines and indented. You can read them just fine even if they're really complicated. I always take the time to make my code readable for everybody else as a common courtesy, and regexes are an integral part of my code, so they get the same treatment for the same purpose.
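For anyone who hasn't seen a regex laid out this way, here is a minimal sketch of the idea, assuming Python and its `re.VERBOSE` flag (the commenter doesn't say which language they use). The pattern name `HEADER` and the group names are made up; it parses a comment header in the style used on this page.

```python
import re

# re.VERBOSE ignores unescaped whitespace in the pattern and treats
# "#" as the start of a comment, so the regex can be laid out like code.
HEADER = re.compile(
    r"""
    \(Score:\s*(?P<score>-?\d+)\)   # moderation score in parentheses
    \s+by\s+
    (?P<author>.+?)                 # author name, may contain spaces
    \s+on\s+
    (?P<weekday>\w+)\s+
    (?P<month>\w+)\s+
    (?P<day>\d{1,2})\s+
    (?P<year>\d{4}),
    \s+@
    (?P<hour>\d{2}):(?P<minute>\d{2})
    (?P<ampm>AM|PM)
    """,
    re.VERBOSE,
)

line = "(Score: 2) by Rosco P. Coltrane on Thursday March 28 2024, @05:19PM"
match = HEADER.match(line)
fields = match.groupdict() if match else {}

print(fields["author"])                  # Rosco P. Coltrane
print(fields["month"], fields["day"])    # March 28
print(fields["hour"], fields["ampm"])    # 05 PM
```

Squashed onto one line, the same pattern is much harder to review, even though the regex engine sees exactly the same thing.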