Unicode to UTF-8 hexadecimal?

by **Dharius** » 06 Jul 2007, 09:32

Hi, yes, it must be something like that, but it doesn't return what I want:

- 'echo utf8_chr(4e00);' returns ''!
- 'echo utf8_decode_ncr(4e00);' returns '4e00'
- 'echo utf8_decode_ncr_callback(4e00);' returns nothing...
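
One possible reason for these results, assuming the calls were typed exactly as shown: without quotes or a 0x prefix, PHP does not read 4e00 as a hexadecimal number. A small sketch of how the literal is parsed:

```php
<?php
// Unquoted, 4e00 is scientific notation: 4 x 10^0, a float equal to 4.0.
var_dump(4e00);           // float(4)

// Two ways to write the intended code point U+4E00 as an integer.
var_dump(0x4E00);         // int(19968)
var_dump(hexdec('4e00')); // int(19968)
```

With the functions quoted in this thread, that would suggest calling utf8_chr(0x4E00) with an integer, or utf8_decode_ncr('&#x4E00;') with a string containing the reference. utf8_decode_ncr_callback() expects the match array supplied by preg_replace_callback(), so it is not meant to be called directly.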

Maybe I got it backwards... I don't know much about this.
Actually, I think what I need is the inverse of these functions?!

What I actually want is an algorithm or a function that produces what is on the right from what is on the left (the two look strongly related):

4FFA -> E4 BF BA (ou %E4%BF%BA )
4FFE -> E4 BF BE
500C -> E5 80 8C
500D -> E5 80 8D
500F -> E5 80 8F
5012 -> E5 80 92
...

I think that E4, for example, going by what you suggested, would rather be 0xE4?!
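
For reference, E4 and 0xE4 are the same byte value; 0x is only PHP's notation for a hexadecimal literal. The mapping listed above can be reproduced with the same bit operations as utf8_chr(), collecting the bytes as hex instead of passing them to chr(). A minimal sketch, where codepoint_to_utf8_hex() is a hypothetical helper name and not part of phpBB:

```php
<?php
// Hypothetical helper (not from phpBB): encode one Unicode code point as
// UTF-8 and return its bytes as space-separated uppercase hex.
function codepoint_to_utf8_hex(int $cp): string
{
    if ($cp > 0xFFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        $bytes = [0xF0 | ($cp >> 18), 0x80 | (($cp >> 12) & 0x3F), 0x80 | (($cp >> 6) & 0x3F), 0x80 | ($cp & 0x3F)];
    } elseif ($cp > 0x7FF) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $bytes = [0xE0 | ($cp >> 12), 0x80 | (($cp >> 6) & 0x3F), 0x80 | ($cp & 0x3F)];
    } elseif ($cp > 0x7F) {
        // 2 bytes: 110xxxxx 10xxxxxx
        $bytes = [0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)];
    } else {
        // 1 byte: plain ASCII
        $bytes = [$cp];
    }

    // Format every byte as two uppercase hex digits.
    $hex = array_map(function ($b) {
        return sprintf('%02X', $b);
    }, $bytes);

    return implode(' ', $hex);
}

// Print the code points from the list above with their UTF-8 bytes.
foreach (['4FFA', '4FFE', '500C', '500D', '500F', '5012'] as $code) {
    echo $code, ' -> ', codepoint_to_utf8_hex(hexdec($code)), "\n";
}
```

Like utf8_chr(), this sketch does not check that the code point is valid (surrogates and values above 0x10FFFF are not rejected).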

Thanks

by **Hubert Roksor** » 06 Jul 2007, 01:05

You can reuse these functions taken from phpBB 3.0:

/**
* Converts an NCR to a UTF-8 char
*
* @param	int		$cp	UNICODE code point
* @return	string		UTF-8 char
*/
function utf8_chr($cp)
{
	if ($cp > 0xFFFF)
	{
		return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
	}
	else if ($cp > 0x7FF)
	{
		return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
	}
	else if ($cp > 0x7F)
	{
		return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
	}
	else
	{
		return chr($cp);
	}
}

/**
* Convert Numeric Character References to UTF-8 chars
*
* Notes:
*	- we do not convert NCRs recursively, if you pass &#38;#38; it will return &#38;
*	- we DO NOT check for the existence of the Unicode characters, therefore an entity may be converted to an inexistent codepoint
*
* @param	string	$text		String to convert, encoded in UTF-8 (no normal form required)
* @return	string				UTF-8 string where NCRs have been replaced with the actual chars
*/
function utf8_decode_ncr($text)
{
	return preg_replace_callback('/&#([0-9]{1,6}|x[0-9A-F]{1,5});/i', 'utf8_decode_ncr_callback', $text);
}

/**
* Callback used in utf8_decode_ncr()
*
* Takes a NCR (in decimal or hexadecimal) and returns a UTF-8 char. Attention, $m is an array.
* It will ignore most of invalid NCRs, but not all!
*
* @param	array	$m			0-based numerically indexed array passed by preg_replace_callback()
* @return	string				UTF-8 char
*/
function utf8_decode_ncr_callback($m)
{
	$cp = (strncasecmp($m[1], 'x', 1)) ? $m[1] : hexdec(substr($m[1], 1));

	return utf8_chr($cp);
}

utf8_decode_ncr() will replace every numeric character reference such as &#123; with the corresponding UTF-8 character.
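
To go from the decoded character to the hexadecimal bytes asked about in this thread, bin2hex() can be applied to the result. A minimal sketch, using PHP's built-in html_entity_decode() as a stand-in for utf8_decode_ncr() so that the snippet runs on its own:

```php
<?php
// Stand-in for utf8_decode_ncr(): the built-in html_entity_decode() also
// turns numeric character references into UTF-8 characters.
$char = html_entity_decode('&#19968;', ENT_QUOTES, 'UTF-8');

// bin2hex() exposes the raw bytes; str_split() groups the digits per byte.
$hex = strtoupper(implode(' ', str_split(bin2hex($char), 2)));

echo $hex, "\n"; // E4 B8 80
```

The same bin2hex() step should work on the string returned by utf8_decode_ncr() or utf8_chr().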

by **Dharius** » 05 Jul 2007, 23:18

Hello,

I have a small character problem while programming my database; if anyone knows about this, help would be welcome!

1/ I have characters in this format: '&#19968;'
2/ in PHP, passing it through 'dechex("19968")' gives me the Unicode number 4e00
3/ how can I get, in PHP, the corresponding UTF-8 code (in hex), which for this example is 'E4 B8 80'?

THANK YOU!!!!
