I've been using ChatGPT to help me catch syntax and logic errors in my regexes before I go live with them. There's a specific case I'm grappling with. In my original Perl regex, I had:
s#(<span [^>]*>)?Follow us: (</span>)?##gi
I tried to modify it to:
s{
<br>
(?:<span(?:\s[^>]*)?>)?
Follow\sus:\s+
(?:</span>)?
}
{}xgi
ChatGPT is insisting that the \s between "Follow" and "us" is a mistake unless I add a + after it. I'm confused about this. Is ChatGPT correct in saying that my regex will fail without that +, or am I missing something?
3 Answers
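It's worthwhile to test both versions directly using something like regex101.com. That way, you can see firsthand if the regex is catching everything you expect. Just keep in mind that \s matches spaces, tabs, and newlines, whereas just a space only matches a literal space. Also note that because you're using the /x flag, unescaped whitespace in the pattern is ignored, so you do need \s (or an escaped space, or [ ]) there; a plain space would match nothing.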
In short, the difference lies in how many whitespace characters you want to allow. The only reason to prefer \s+ is if you expect multiple spaces or a mix of whitespace characters between 'Follow' and 'us'. ChatGPT may have a point in that case. But if you're sure there's always just one space, your version works too!
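If it helps, here is a minimal sketch you can run locally to see the point about /x and whitespace for yourself. The sample string is made up for illustration; it is not from your real data.

```perl
use strict;
use warnings;

# Hypothetical sample text: exactly one space between "Follow" and "us:".
my $text = 'Follow us: ';

# Under /x, unescaped whitespace in the pattern is ignored,
# so this pattern effectively means "Followus:" and does not match.
my $literal_space = ($text =~ / Follow us: /x) ? 1 : 0;

# \s stays significant under /x and matches exactly one whitespace character.
my $single_ws = ($text =~ / Follow \s us: /x) ? 1 : 0;

# \s+ matches one or more whitespace characters, so a single space is fine too.
my $multi_ws = ($text =~ / Follow \s+ us: /x) ? 1 : 0;

print "literal space under /x: $literal_space\n";
print "\\s:  $single_ws\n";
print "\\s+: $multi_ws\n";
```

So the \s is not a mistake on its own; the + only changes how much whitespace is tolerated.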
Honestly, it feels like ChatGPT might be on to something here. The \s by itself matches exactly one whitespace character between 'Follow' and 'us'. If you want to account for varying amounts of whitespace, adding a + makes it more flexible. Otherwise, you'll only match input with a single whitespace character there, which may not always be what you get.
Yeah, exactly! Using \s+ means it will match one or more whitespace characters, which is usually what you want in text processing like this.
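To make the difference concrete, here is a rough sketch that runs the asker's substitution with \s and with \s+ against two made-up inputs. The sample strings and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical inputs, not taken from the original post.
my @samples = (
    '<br><span class="x">Follow us: </span>',   # one space between the words
    "<br><span>Follow  us:\n</span>",           # two spaces, then a newline
);

my (@with_one, @with_many);
for my $html (@samples) {
    # Variant from the question: exactly one whitespace character after "Follow".
    (my $one = $html) =~ s{
        <br>
        (?:<span(?:\s[^>]*)?>)?
        Follow\sus:\s+
        (?:</span>)?
    }{}xgi;

    # Variant with the suggested +: one or more whitespace characters.
    (my $many = $html) =~ s{
        <br>
        (?:<span(?:\s[^>]*)?>)?
        Follow\s+us:\s+
        (?:</span>)?
    }{}xgi;

    push @with_one,  $one;
    push @with_many, $many;
}

for my $i (0 .. $#samples) {
    printf "sample %d: \\s %s, \\s+ %s\n", $i + 1,
        ($with_one[$i]  eq '' ? 'removed it' : 'left it alone'),
        ($with_many[$i] eq '' ? 'removed it' : 'left it alone');
}
```

Both variants remove the first sample; only the \s+ variant removes the second, because a single \s cannot span two spaces.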

Good advice! I’d definitely suggest trying regex101. It's super helpful for troubleshooting regex issues.