2008/11/02

PHP "Cleaning Up Whitespace"

Data that comes from a user, or perhaps from a database, often contains extra spaces (or tabs, carriage returns, and so on). This causes problems later when you want to compare strings. It also wastes storage space if you save those extra characters.

If you just want to remove whitespace from the beginning or end of a string, the built-in function trim() does that for you. Often, however, you want to clean up the data completely: remove leading/trailing whitespace, collapse multiple spaces into a single one, and replace all other types of whitespace with a regular space.
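To make the limitation of trim() concrete, here is a minimal sketch. The sample string and variable names are made up for illustration only.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$raw = "  Hello   world \t";

// trim() strips whitespace from both ends of the string,
// but leaves the run of spaces in the middle untouched.
$trimmed = trim($raw);

var_dump($trimmed === "Hello   world"); // bool(true)
```

The three spaces between the words survive, which is why the regular expression steps are still needed.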

To accomplish this, you can use PHP's regular expression functions, as shown below:

<?php
$str = " This line  contains\tliberal \r\n use of   whitespace.\n\n";

// First remove the leading/trailing whitespace
$str = trim($str);

// Now remove any doubled-up whitespace
$str = preg_replace('/\s(?=\s)/', '', $str);

// Finally, replace any non-space whitespace with a space
$str = preg_replace('/[\n\r\t]/', ' ', $str);

// Echo out: 'This line contains liberal use of whitespace.'
echo "{$str}\n";
?>

The code removes the extra whitespace step by step. First, we use trim() to clean the beginning and end of the string. Second, we use preg_replace() to remove duplicates. In a regex, \s stands for "any whitespace character." The (?=...) notation is a lookahead: the preceding \s matches only if the pattern inside the parentheses comes right after it, and the lookahead itself does not consume that following character. Therefore this regex means: "Any whitespace character that is followed by another whitespace character." We replace each match with an empty string, which deletes it. This removes all duplicates, leaving only the last whitespace character of each run.
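The lookahead behavior is easier to see on a tiny input. The sketch below uses a made-up sample string in which three different whitespace characters sit between two letters.

```php
<?php
// Hypothetical sample: a space, a tab, and a newline between "a" and "b".
$sample = "a \t\nb";

// Every whitespace character that is followed by more whitespace is deleted.
// The lookahead does not consume the following character, so the last
// character of the run is never matched and survives.
$collapsed = preg_replace('/\s(?=\s)/', '', $sample);

var_dump($collapsed === "a\nb"); // bool(true): only the final "\n" remains
```

Note that the survivor here is a newline, not a space, which is why one more replacement step is still required.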

Finally, we use another regex, [\n\r\t], to find any remaining newlines (\n), carriage returns (\r), or tabs (\t), and replace each of them with a single space.
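As an alternative to the two regex steps, the same result can usually be reached in one pass by matching whole runs with \s+ and replacing each run with a single space. This is a different technique from the one described above, and the function name normalize_whitespace() is hypothetical, introduced here only for illustration. The type declarations assume PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper (not from the original post): trims the string and
 * collapses every run of whitespace into one regular space.
 */
function normalize_whitespace(string $str): string
{
    // \s+ matches one or more whitespace characters of any kind,
    // so a run such as " \r\n " becomes a single space.
    return preg_replace('/\s+/', ' ', trim($str));
}

$str = " This line  contains\tliberal \r\n use of   whitespace.\n\n";
echo normalize_whitespace($str), "\n";
```

The step-by-step version is easier to follow when learning lookaheads; the single-pass version is shorter and handles every whitespace character that \s matches, not only \n, \r, and \t.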


