I wanted to use the tokenizer functions to count source lines of code, including counting comments. Attempting to do this with regular expressions does not work well because of situations where /* appears in a string, or other situations. The token_get_all() function makes this task easy by detecting all the comments properly. However, it does not tokenize newline characters. I wrote the below set of functions to also tokenize newline characters as T_NEW_LINE.
<?php
define('T_NEW_LINE', -1);
function token_get_all_nl($source)
{
$new_tokens = array();
// Get the tokens
$tokens = token_get_all($source);
// Split newlines into their own tokens
foreach ($tokens as $token)
{
$token_name = is_array($token) ? $token[0] : null;
$token_data = is_array($token) ? $token[1] : $token;
// Do not split encapsed strings or multiline comments
if ($token_name == T_CONSTANT_ENCAPSED_STRING || substr($token_data, 0, 2) == '/*')
{
$new_tokens[] = array($token_name, $token_data);
continue;
}
// Split the data up by newlines
$split_data = preg_split('#(\r\n|\n)#', $token_data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
foreach ($split_data as $data)
{
if ($data == "\r\n" || $data == "\n")
{
// This is a new line token
$new_tokens[] = array(T_NEW_LINE, $data);
}
else
{
// Add the token under the original token name
$new_tokens[] = is_array($token) ? array($token_name, $data) : $data;
}
}
}
return $new_tokens;
}
function token_name_nl($token)
{
if ($token === T_NEW_LINE)
{
return 'T_NEW_LINE';
}
return token_name($token);
}
?>
Example usage:
<?php
$tokens = token_get_all_nl(file_get_contents('somecode.php'));
foreach ($tokens as $token)
{
if (is_array($token))
{
echo (token_name_nl($token[0]) . ': "' . $token[1] . '"<br />');
}
else
{
echo ('"' . $token . '"<br />');
}
}
?>
I'm sure you can figure out how to count the lines of code, and lines of comments with these functions. This was a huge improvement on my previous attempt at counting lines of code with regular expressions. I hope this helps someone, as many of the user contributed examples on this website have helped me in the past.