To get Unicode numbers out of a UTF-8 string, this can be used, for example:
<?php
print mb_encode_numericentity ('sāш日', array (0x0, 0xffff, 0, 0xffff), 'UTF-8');
?>
(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)
mb_encode_numericentity — Encode character to HTML numeric string reference
$string
,$map
,$encoding
= null
,$hex
= false
Converts
specified character codes in string string
from character code to HTML numeric character reference.
string
The string being encoded.
map
map
is array specifies code area to
convert.
encoding
Параметром
encoding
вказується кодування символів. Якщо він
пропущений або null
, використається внутрішнє кодування.
hex
Whether the returned entity reference should be in hexadecimal notation (otherwise it is in decimal notation).
The converted string.
Throws a ValueError if
map
is not a list of ints.
Версія | Опис |
---|---|
8.4.0 |
mb_encode_numericentity() now throws a
ValueError if map
is not a list of ints.
|
8.0.0 |
encoding тепер може бути null .
|
Приклад #1 map
example
<?php $convmap = array ( int start_code1, int end_code1, int offset1, int mask1, int start_code2, int end_code2, int offset2, int mask2, ........ int start_codeN, int end_codeN, int offsetN, int maskN ); // Specify Unicode value for start_codeN and end_codeN // Add offsetN to value and take bit-wise 'AND' with maskN, then // it converts value to numeric string reference. ?>
Приклад #2 mb_encode_numericentity() example
<?php
$str = "aAæÆあア𩸽";
/* Convert all UTF8 characters up to 4 bytes to HTML numeric character reference */
$convmap = [0, 0x1FFFFF, 0, 0x10FFFF];
var_dump(mb_encode_numericentity($str, $convmap, "utf8"));
/* Converts only 2-byte and 4-byte UTF8 characters to HTML numeric character reference */
$convmap = [
0x80, 0x7FF, 0, 0x10FFFF,
0x10000, 0x1FFFFF, 0, 0x10FFFF,
];
var_dump(mb_encode_numericentity($str, $convmap, "utf8"));
?>
Поданий вище приклад виведе:
string(46) "aAæÆあア鸽" string(28) "aAæÆあア鸽"
To get Unicode numbers out of a UTF-8 string, this can be used, for example:
<?php
print mb_encode_numericentity ('sāш日', array (0x0, 0xffff, 0, 0xffff), 'UTF-8');
?>
Here is a better explanation of convmap:
https://stackoverflow.com/questions/35854535/better-explanation-of-convmap-in-mb-encode-numericentity
We were experiencing difficulties with PHP/Sablotron on Solaris; placing HTML character references into the XSL transformation, when set to output UTF-8, converts them back into UTF8 encoded chars. This was then a problem for non unicode storage. Using a bit of code from http://homepage.mac.com/marko/ the following function converts the string back to character references:
function utf2html ($utf2html_string)
{
$f = 0xffff;
$convmap = array(
/* <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1//EN//HTML">
%HTMLlat1; */
160, 255, 0, $f,
/* <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML">
%HTMLsymbol; */
402, 402, 0, $f, 913, 929, 0, $f, 931, 937, 0, $f,
945, 969, 0, $f, 977, 978, 0, $f, 982, 982, 0, $f,
8226, 8226, 0, $f, 8230, 8230, 0, $f, 8242, 8243, 0, $f,
8254, 8254, 0, $f, 8260, 8260, 0, $f, 8465, 8465, 0, $f,
8472, 8472, 0, $f, 8476, 8476, 0, $f, 8482, 8482, 0, $f,
8501, 8501, 0, $f, 8592, 8596, 0, $f, 8629, 8629, 0, $f,
8656, 8660, 0, $f, 8704, 8704, 0, $f, 8706, 8707, 0, $f,
8709, 8709, 0, $f, 8711, 8713, 0, $f, 8715, 8715, 0, $f,
8719, 8719, 0, $f, 8721, 8722, 0, $f, 8727, 8727, 0, $f,
8730, 8730, 0, $f, 8733, 8734, 0, $f, 8736, 8736, 0, $f,
8743, 8747, 0, $f, 8756, 8756, 0, $f, 8764, 8764, 0, $f,
8773, 8773, 0, $f, 8776, 8776, 0, $f, 8800, 8801, 0, $f,
8804, 8805, 0, $f, 8834, 8836, 0, $f, 8838, 8839, 0, $f,
8853, 8853, 0, $f, 8855, 8855, 0, $f, 8869, 8869, 0, $f,
8901, 8901, 0, $f, 8968, 8971, 0, $f, 9001, 9002, 0, $f,
9674, 9674, 0, $f, 9824, 9824, 0, $f, 9827, 9827, 0, $f,
9829, 9830, 0, $f,
/* <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special//EN//HTML">
%HTMLspecial; */
/* These ones are excluded to enable HTML: 34, 38, 60, 62 */
338, 339, 0, $f, 352, 353, 0, $f, 376, 376, 0, $f,
710, 710, 0, $f, 732, 732, 0, $f, 8194, 8195, 0, $f,
8201, 8201, 0, $f, 8204, 8207, 0, $f, 8211, 8212, 0, $f,
8216, 8218, 0, $f, 8218, 8218, 0, $f, 8220, 8222, 0, $f,
8224, 8225, 0, $f, 8240, 8240, 0, $f, 8249, 8250, 0, $f,
8364, 8364, 0, $f);
return mb_encode_numericentity($utf2html_string, $convmap, "UTF-8");
}
To improve handling of EURO-Symbols in dan at boxuk dot com's function add the following line to $convmap:
128,128,0, $f,