Here is a word-count function that supports UTF-8 and Hebrew. I tried other functions, but they don't work. Note that in Hebrew, '"' and '\'' can appear inside words, so they are not separators. This function is not perfect: I would prefer the approach we use in JavaScript, which treats every character except [a-zA-Zא-ת0-9_\'\"] as a separator, but I don't know how to do it in PHP.
I removed some of the separators that don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underscore.
This is a fix to my previous post on this page: I found that my function returned an incorrect result for an empty string. I corrected it, and I'm also attaching another function, my_strlen.
<?php
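function count_words($string) {
    // Restore apostrophes that arrive as the HTML entity &#039; (assumed: the
    // original search string appears to have been decoded when the page was rendered).
    $string = str_replace("&#039;", "'", $string);
    // Turn every separator character into a plain space.
    $t = array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~');
    $string = str_replace($t, " ", $string);
    // Collapse runs of whitespace into single spaces and trim both ends.
    $string = trim(preg_replace("/\s+/", " ", $string));
    $num = 0;
    // explode() on an empty string still returns one element, so only count non-empty input.
    if (my_strlen($string) > 0) {
        $word_array = explode(" ", $string);
        $num = count($word_array);
    }
    return $num;
}

// Length in characters rather than bytes (requires the mbstring extension).
function my_strlen($s) {
    return mb_strlen($s, "UTF-8");
}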
?>
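The JavaScript-style rule mentioned above (every character except [a-zA-Zא-ת0-9_\'\"] is a separator) can probably be expressed in PHP with preg_split() and the "u" pattern modifier. The following is only a minimal sketch, not a tested replacement for count_words(): the name count_words_regex is hypothetical, and it assumes the input is valid UTF-8 and that PCRE was built with UTF-8 support. The Hebrew range א-ת is written as \x{05D0}-\x{05EA}.
<?php
// Hypothetical helper (not part of the original post): every run of characters
// outside [a-zA-Zא-ת0-9_'"] is treated as a separator.
function count_words_regex($string) {
    // The "u" modifier makes PCRE read both pattern and subject as UTF-8.
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so an empty string yields 0 words.
    $words = preg_split('/[^a-zA-Z\x{05D0}-\x{05EA}0-9_\'"]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on failure, for example on invalid UTF-8 input.
    return $words === false ? 0 : count($words);
}

echo count_words_regex('שלום עולם, hello world!'), "\n"; // 4
echo count_words_regex('צה"ל'), "\n";                     // 1
echo count_words_regex(''), "\n";                         // 0
?>
Unlike count_words(), this version keeps the underscore as a word character, because the JavaScript character class includes it.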