PHP Recursive Regex Pattern

Posted on January 5th 2011 4:06pm Wednesday, by Blaine

Recently I had a client who needed their emails formatted with headers, which Bender (my contact form class) did not easily support. I needed to parse a string and split it into an array based on square brackets, nested to any depth. I also wanted the parser to be flexible and forgiving, since other people use Bender and may format their input slightly differently.

My regular expression only looks for a select few characters: letters, digits, underscores and asterisks make up the values, and square brackets mark nesting. Everything else is thrown out, so the user can separate values with any other character, including new lines. I’ve added indentation to my example to make it easier to read; it does not affect how the parser separates the values. If they wanted, the user could put everything on a single line and separate the values with commas or another character.
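To illustrate why the separators don't matter, here is a small sketch that runs only the name part of the pattern (the character class [\*a-zA-Z0-9_]+, without the bracket handling) over a made-up input mixing commas, a semicolon, a new line and a pipe. The input string and variable names are illustrative only.

```php
<?php
// Simplified sketch: only the name part of the parser's pattern, with no
// bracket handling. Runs of letters, digits, underscores and asterisks
// become values; every other character simply acts as a separator.
$namePattern = '/[\*a-zA-Z0-9_]+/';

// Hypothetical input using several different separators.
$input = "first_name, last_name;email\n*phone | how_did_you_find_us";
preg_match_all($namePattern, $input, $matches);

// $matches[0] holds every full match in order of appearance:
// first_name, last_name, email, *phone, how_did_you_find_us
print_r($matches[0]);
```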

This is the example input:

$string = '
personal_information[
	first_name
	last_name
	email
	private_information[
		birth_day
		mothers_name
		fathers_name
	]
	previous_employers[
		company_1[
			company_1_name
			company_1_start_date
			company_1_position
		]
		company_2[
			company_2_name
			company_2_start_date
			company_2_position
		]
	]
]
how_did_you_find_us
comments
';

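The output is a multidimensional array built by a recursive function and a recursive regular expression: the regular expression finds each top-level value along with the raw text of its children, and the function calls itself on those children.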

function parseInput($string) {
	$array = array();
	// The pattern matches value or value[children]. (?R) re-enters the whole
	// pattern, so nested brackets are matched recursively and each [ is
	// paired with its correct closing ].
	$pattern = "/([\*a-zA-Z0-9_]+)(?:\[((?:[^\[\]]+|(?R))*)\])?/";
	preg_match_all($pattern, $string, $matches);
	// $matches[0] holds each full match: value[children]
	// $matches[1] holds the value
	// $matches[2] holds the children, or '' if there are none
	$countMatches = count($matches[0]);
	// Loop through the matches
	for ($i = 0; $i < $countMatches; $i++) {
		// Compare against '' instead of relying on truthiness,
		// so that children such as "0" are not skipped
		if ($matches[2][$i] !== '') {
			$array[] = array(
				'header' => $matches[1][$i],
				'children' => parseInput($matches[2][$i])
			);
		} else {
			$array[] = array('name' => $matches[1][$i]);
		}
	}
	return $array;
}
// Parse and print the $string
print_r(parseInput($string));
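To see what a single pass of the recursive pattern captures, here is a small sketch that runs the same pattern once over a short made-up string. It shows that preg_match_all only returns top-level items; anything nested stays inside group 2 as raw text, which is what the recursive call to parseInput() then splits up.

```php
<?php
// Same pattern as parseInput(). (?R) re-enters the whole pattern, so a
// nested name[...] inside the brackets is consumed as one unit and the
// outer \] pairs with the correct opening bracket.
$pattern = "/([\*a-zA-Z0-9_]+)(?:\[((?:[^\[\]]+|(?R))*)\])?/";

// Hypothetical input: "a" has children "b" and "c[d]", "e" has none.
preg_match_all($pattern, "a[b c[d]] e", $matches);

// Only top-level items are returned; group 2 is '' when there are no brackets.
print_r($matches[1]); // names:    a, e
print_r($matches[2]); // children: "b c[d]", ""
```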

Here is a working example that shows the input text and the output array, along with the source code of the example.
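As a self-contained stand-in for such an example, using made-up input rather than the original demo, the sketch below condenses parseInput() (a foreach loop, no offset capture, and an explicit !== '' check so that children like "0" are kept). It checks that an indented input and a single-line, comma-separated input produce the same array.

```php
<?php
// Condensed sketch of parseInput(): each match becomes either a header
// with recursively parsed children, or a plain name.
function parseInput($string) {
	$array = array();
	$pattern = "/([\*a-zA-Z0-9_]+)(?:\[((?:[^\[\]]+|(?R))*)\])?/";
	preg_match_all($pattern, $string, $matches);
	foreach ($matches[1] as $i => $name) {
		if ($matches[2][$i] !== '') {
			$array[] = array('header' => $name, 'children' => parseInput($matches[2][$i]));
		} else {
			$array[] = array('name' => $name);
		}
	}
	return $array;
}

// Hypothetical inputs: the same structure, indented and on a single line.
$indented = "\na[\n\tb\n\tc[\n\t\td\n\t]\n]\ne\n";
$singleLine = 'a[b, c[d]], e';

$result = parseInput($singleLine);
var_dump($result === parseInput($indented)); // bool(true)
print_r($result);
```

The resulting array has a header "a" whose children are the name "b" and a header "c" containing the name "d", followed by the name "e".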
