PHP Pattern Matching Guide
There are many functions for pattern matching. Let’s start with the simplest function.
<?php int preg_match(string $pattern, string $search); ?>
You can see that preg_match() returns 1 if the pattern is found in the string, and 0 if it is not found. (It returns false only when an error occurs.)
But that’s not true or false!
Yes, it is. In a weakly typed language like PHP, 1 counts as true and 0 counts as false. A value only counts as false if it is one of these: 0, 0.0, "", "0", false, null, or an empty array. Everything else counts as true, including the string "false".
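Here is a minimal sketch of using the return value straight in a condition. The pattern and the test strings are made up for illustration.

```php
<?php
// preg_match() gives back 1 or 0, which PHP treats as true or false.
$found = preg_match('/cat/', 'my cat was here');
$missing = preg_match('/cat/', 'my dog was here');

if ($found) {
    echo "found\n";
}
if (!$missing) {
    echo "not found\n";
}

// Casting shows how the integers behave as booleans.
$oneIsTrue = (bool) 1;
$zeroIsTrue = (bool) 0;
$zeroStringIsTrue = (bool) "0";
```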
Basic Match
If you use /cat/ as the pattern, it will match any cat found. Matches: my cat was here, cats are inside the house. Notice that it also matches ‘cats’ because ‘cat’ is there.
Try using /.at/ as the pattern. It matches hat, pat, sat, but also #at. To limit the first character to lowercase letters, use /[a-z]at/. The . is a wildcard. It matches any character except a newline (\n).
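The matches described above can be checked with a short script. This is only a sketch; the variable names are made up for illustration.

```php
<?php
// /cat/ matches anywhere in the string, including inside "cats".
$inCats = preg_match('/cat/', 'cats are inside the house');

// The dot accepts any character except a newline, so "#at" matches.
$dotHash = preg_match('/.at/', '#at');

// [a-z] only accepts a lowercase letter in that position.
$classHash = preg_match('/[a-z]at/', '#at');
$classHat = preg_match('/[a-z]at/', 'hat');

// The dot does not match "\n", so there is nothing before "at" to use.
$dotNewline = preg_match('/.at/', "\nat");
```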
Slashes or Delimiters?
You may have noticed the slashes. What are they? They’re delimiters! They mark where the pattern starts and where it ends.
Other delimiters include: !...! and {...}.
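As a quick sketch, here is the same pattern written with each of the delimiters mentioned. The subject strings are just examples.

```php
<?php
$subject = 'my cat was here';

// Three spellings of the same pattern; only the delimiters differ.
$slash = preg_match('/cat/', $subject);
$bang = preg_match('!cat!', $subject);
$brace = preg_match('{cat}', $subject);

// A different delimiter is handy when the pattern itself contains slashes.
$url = preg_match('!http://!', 'http://example.com');
```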
Character Classes
You’ve only seen one example of a character class. “Where?” you may ask. Well, it’s this: []
Special characters include: - used to specify ranges, and ^ which means “not” if it’s the first character. The caret symbol can be found at shift + 6.
Use /[a-z]/ to match one character from a to z. To match a letter in lowercase or uppercase, use /[a-zA-Z]/. To match a digit, use [0-9].
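A small sketch of these character classes in action, with made-up test strings:

```php
<?php
// A lowercase range does not accept an uppercase letter.
$lower = preg_match('/[a-z]/', 'x');
$upperOnly = preg_match('/[a-z]/', 'X');

// Listing both ranges accepts either case.
$either = preg_match('/[a-zA-Z]/', 'X');

// [0-9] finds the digit in the string.
$digit = preg_match('/[0-9]/', 'room 7');

// A leading ^ inside the brackets means "anything except these".
$allDigits = preg_match('/[^0-9]/', '123');
$hasLetter = preg_match('/[^0-9]/', '12a');
```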
First or Last Character
Sometimes, you want to match at the very start or the very end of the string.
^ - Matches at the start of the string.
$ - Matches at the end of the string.
Therefore, /^[a-zA-Z]$/ matches a string that is exactly one letter, from a to z or A to Z.
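To see what ^ and $ change, here is a rough sketch with a few example strings. The patterns are in single quotes so PHP does not try to read $ as a variable.

```php
<?php
// Exactly one letter passes, two letters do not.
$single = preg_match('/^[a-zA-Z]$/', 'a');
$two = preg_match('/^[a-zA-Z]$/', 'ab');

// ^ pins the match to the start of the string.
$starts = preg_match('/^cat/', 'cats are inside the house');
$startsNot = preg_match('/^cat/', 'my cat was here');

// $ pins the match to the end of the string.
$ends = preg_match('/cat$/', 'my cat');
```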
Subpatterns
Subpatterns (...) make it easier to treat several characters as one group.
/https?[1-2]/ matches http1, http2, https1, and https2.
The question mark means the character before it is optional.
/(https)?[1-2]/ matches https1, https2, 1, and 2.
The question mark after the subpattern means the whole subpattern is optional. This may come in useful.
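Here is a sketch comparing the two patterns. Note one assumption: I have added ^ and $ around them so that the whole string has to match, otherwise a string like http1 would still pass the second pattern through its trailing 1.

```php
<?php
// Only the "s" is optional here.
$http1 = preg_match('/^https?[1-2]$/', 'http1');
$https2 = preg_match('/^https?[1-2]$/', 'https2');
$htt1 = preg_match('/^https?[1-2]$/', 'htt1');

// The whole group "https" is optional here.
$groupHttps1 = preg_match('/^(https)?[1-2]$/', 'https1');
$groupBare2 = preg_match('/^(https)?[1-2]$/', '2');
$groupHttp1 = preg_match('/^(https)?[1-2]$/', 'http1');
```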
One or More, Zero or More
Use + for one or more of the preceding character. Use * for zero or more of the preceding character.
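The difference only shows up when there is nothing to match, as this small sketch shows. The ^ and $ anchors are my addition so that the whole string is tested.

```php
<?php
// + needs at least one "a".
$plusMany = preg_match('/^a+$/', 'aaa');
$plusEmpty = preg_match('/^a+$/', '');

// * is also happy with no "a" at all.
$starMany = preg_match('/^a*$/', 'aaa');
$starEmpty = preg_match('/^a*$/', '');
```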
Escaping Characters
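To escape characters, use the \ (backslash). For example, \. matches a real dot instead of acting as a wildcard. Use \\ to match a backslash. Inside a PHP string each of those backslashes has to be doubled again, so the pattern is written as '/\\\\/'.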
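A short sketch of escaping, using single-quoted PHP strings and made-up subjects:

```php
<?php
// The escaped dot only matches a real dot.
$realDot = preg_match('/^a\.b$/', 'a.b');
$otherChar = preg_match('/^a\.b$/', 'axb');

// Four backslashes in the PHP string become two in the pattern,
// and those two match a single backslash in the subject.
$backslash = preg_match('/\\\\/', 'C:\\temp');
$noBackslash = preg_match('/\\\\/', 'C:/temp');
```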
Counted Expressions
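Counted expressions look like this: {num1,num2}. The preceding character must appear at least num1 times and at most num2 times. Write it without a space after the comma, since not every PCRE version accepts one. Be warned, though. If you are using the {...} delimiters, the braces of a counted expression are easy to confuse with the delimiters, so a different delimiter such as /.../ is the safer choice. Putting a backslash in front of a brace (\{ or \}) does not help here, because an escaped brace matches a literal { or } instead of counting.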
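Here is a sketch of counted expressions with slash delimiters. The {4} and {2,} forms (exactly four, two or more) are not covered above; I include them as standard PCRE variants.

```php
<?php
// Between two and three "a" characters.
$twoAs = preg_match('/^a{2,3}$/', 'aa');
$fourAs = preg_match('/^a{2,3}$/', 'aaaa');

// Exactly four digits.
$year = preg_match('/^[0-9]{4}$/', '2024');

// Two or more, with no upper limit.
$manyAs = preg_match('/^a{2,}$/', 'aaaaa');
```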
POSIX-styled character classes
Use [[:alpha:]] to match an alphabetic character. And there’s more, such as [[:digit:]] for a digit and [[:space:]] for whitespace.
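A quick sketch of POSIX classes. Besides [[:alpha:]] it uses [[:digit:]] and [[:space:]], two other standard classes, and the test strings are made up.

```php
<?php
// Only letters pass the [[:alpha:]] test.
$letters = preg_match('/^[[:alpha:]]+$/', 'hello');
$mixed = preg_match('/^[[:alpha:]]+$/', 'h3llo');

// [[:digit:]] matches digits, [[:space:]] matches whitespace.
$digits = preg_match('/^[[:digit:]]+$/', '12345');
$hasSpace = preg_match('/[[:space:]]/', 'two words');
```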
Capturing
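Capturing is easy with subpatterns. Each subpattern remembers the text it matched. In the replacement string, the first subpattern is available as \1 or $1, the second as \2 or $2, and so on.
Notice: This replacement syntax is only for the function preg_replace()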
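As a rough sketch, here is preg_replace() swapping two words using captures. The pattern and the input string are made up for illustration.

```php
<?php
// Two subpatterns capture the two words.
$pattern = '/([a-z]+) ([a-z]+)/';

// $2 and $1 put the captures back in reverse order.
$dollarForm = preg_replace($pattern, '$2 $1', 'hello world');

// The backslash form does the same thing.
$backslashForm = preg_replace($pattern, '\2 \1', 'hello world');

echo $dollarForm . "\n";
```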
Examples
Email Match
<?php $result = preg_match("/[a-zA-Z0-9\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+/", $content); ?>
Notice that we escaped the - and the . characters. These two characters have special meanings: - specifies a range inside a character class, and . is the wildcard.
I will explain the meaning.
[a-zA-Z0-9\-]+@ matches the characters before the @. The \- part matches a dash. It is escaped so that it is not read as a range.
After the @, the part [a-zA-Z0-9\-]+\. matches the part of the domain before the first dot, and the dot itself.
[a-zA-Z0-9\-\.]+ matches the rest of the domain, with more dots if needed (subdomains). The + matters here, because without it only one character after the dot would be matched.
URL Match
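<?php
// Match the first URL in $content; the + lets the class cover the whole URL.
$result = preg_match("{https?://[a-zA-Z0-9\.%#@?=/\-]+}i", $content, $array);
// $array[0] holds the matched text (there are no subpatterns to add more entries).
foreach ($array as $val) {
    echo $val."<br>";
}
?>
This example is more complicated. We match a URL made of valid URL characters. The matched text is put in $array, and then we use a foreach loop to loop through the array. Keep in mind that preg_match() stops at the first match, so $array only holds that one URL.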
Functions
preg_match($pattern, $test, $array) - Returns 1 if $pattern matches $test. Use $array for the part that matches. Use count($array) or sizeof($array) to get the number of entries in $array.
preg_split($pattern, $string) - Splits a string into an array wherever the regular expression matches.
preg_match_all($pattern, $test, $array) - Returns how many times the pattern was matched and puts every match in $array.
preg_replace($pattern, $replacement, $search) - Searches $search for text that matches $pattern and replaces it with $replacement. Returns the new string.
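To compare the four functions side by side, here is a small sketch. The subject strings and variable names are made up for illustration.

```php
<?php
$subject = 'cat hat bat';

// preg_match() stops at the first match.
$first = preg_match('/[a-z]at/', $subject, $one);

// preg_match_all() counts every match and collects them in $all[0].
$count = preg_match_all('/[a-z]at/', $subject, $all);

// preg_split() cuts the string wherever the pattern matches.
$parts = preg_split('/[0-9]/', 'a1b2c');

// preg_replace() returns a new string with every match replaced.
$replaced = preg_replace('/[a-z]at/', 'dog', $subject);

echo $count . " matches, first one: " . $one[0] . "\n";
```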
There is more, but I think I’ll stop now. Thanks for reading about regular expressions!