Stories
Slash Boxes
Comments

SoylentNews

SoylentNews is people

SoylentNews is powered by your submissions, so send in your scoop. Only 18 submissions in the queue.

Sections

SoylentNews

Why Do Regexes Use `$` and `^` as Line Anchors?

posted by janrinok on Wednesday March 27 2024, @08:12PM

from the I-didn't-know-that-... dept.

owl writes:

https://buttondown.email/hillelwayne/archive/why-do-regexes-use-and-as-line-anchors/

Last week I fell into a bit of a rabbit hole: why do regular expressions use $ and ^ as line anchors?1
This talk brings up that they first appeared in Ken Thompson's port of the QED text editor. In his manual he writes: b) "^" is a regular expression which matches character at the beginning of a line.
c) "$" is a regular expression which matches character before the character (usually at the end of a line)
QED was the precursor to ed, which was instrumental in popularizing regexes, so a lot of its design choices stuck.
Okay, but then why did Ken Thompson choose those characters?

Original Submission

This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.

Why Do Regexes Use `$` and `^` as Line Anchors? | Log In/Create an Account | Top | 63 comments | Search Discussion

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Re:Regex is a Rabbit Hole(Score: 3, Informative) by Anonymous Coward on Thursday March 28 2024, @03:22PM

by Anonymous Coward on Thursday March 28 2024, @03:22PM (#1350698)

you can parse only regular languages with them.
By most definitions, you cannot even do that. Regular expression can only tokenize streams which is their use in many grammars (regular expressions do not have memory and cannot match opening/closing parentheses,

In formal language theory, "parse" simply means "for a language (set of strings) L and a string w, determine whether or not w is a member the set L". A regular language is simply one that can be parsed by a deterministic finite automaton (DFA), which is exactly the set of languages that can be described by regular expressions with the alternation, concatenation and kleene closure operators.

Languages with balanced parentheses are trivially shown to be not regular if the parentheses can be nested to arbitrary depth. So the fact that regular expressions cannot match arbitrary nesting depths of open/close parentheses does not contradict GP's point, as such a language is not a regular language. On the other hand, balanced parentheses with a finite maximum nesting depth is regular (basically, the states of a DFA can be used to count things but only to a finite bound).

Nevertheless, if you're talking about real-world regex implementations on computers then the GP's statement is not true, because real-world regex implementations have different operators and are usually much more computationally powerful than a DFA. For example, perl-compatible regexes have recursive match operators and absolutely can match balanced parentheses:

pcregrep '^([(](?1)*[)])*$' <<'EOF' () (()) ((()())())() (((())())() EOF

prints only lines with balanced parentheses:
() (()) ((()())())()

.NET regexes have an explicit "balancing group" operator which can be used for this sort of thing. It is also apparently possible with Java regexes [drregex.com] which have neither recursion nor balancing groups, but do have lookahead and forward reference operators.

Parent

Starting Score:	0		points
Moderation		+3
Interesting=1, Informative=2, Total=3
Extra 'Informative' Modifier		0

Total Score:		3

Moderator Help

Your lucky number is 3552664958674928. Watch for it everywhere.

SoylentNews

SoylentNews is people

Navigation

Sections

SoylentNews

Why Do Regexes Use `$` and `^` as Line Anchors?

Re:Regex is a Rabbit Hole(Score: 3, Informative) by Anonymous Coward on Thursday March 28 2024, @03:22PM