Oct 21
2005

Here is the UTF-8 version:

Note:

If the string is UTF-8 encoded, PHP's strlen() typically counts 1 Chinese character as length 3 (3 bytes).
If the string is GB encoded, PHP's strlen() counts 1 Chinese character as length 2 (2 bytes).
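
A quick way to check these byte counts is to write the character "你" as raw bytes in both encodings. This is only a small sketch: the variable names are made up for illustration, and the byte values assume the usual UTF-8 and GB2312 encodings of that character.

```php
<?php
// "你" written as raw bytes so the result does not depend on the file's encoding
$utf8 = "\xE4\xBD\xA0"; // UTF-8: 3 bytes
$gb   = "\xC4\xE3";     // GB2312: 2 bytes

print strlen($utf8) . "\n";           // 3
print strlen($gb) . "\n";             // 2
print strlen("Hello" . $utf8) . "\n"; // 8 (5 ASCII bytes + 3)
```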

For example:

$str = "Hello你好么";
print utf8_strlen2($str); // prints 8
print utf8_substr2($str, 0, 6); // prints "Hello你"


/***********Start of Code ***************/

function utf8_substr2($str, $start) {

    /*
    UTF-8 version of substr(), for people who can't use mb_substr() like me.
    $start and the optional third argument (length) are counted in UTF-8 characters, not bytes.

    Author: Windix Feng
    Bug report to: windix(AT)263.net, http://www.douzi.org/blog

    - History -
    1.0 2004-02-01 Initial Version
    2.0 2004-02-01 Use PREG instead of STRCMP and cycles, SPEED UP!
    */

    // Split the string into an array of UTF-8 characters (1 to 4 bytes each)
    preg_match_all("/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/", $str, $ar);

    if (func_num_args() >= 3) {
        // Optional third argument: number of characters to return
        $length = func_get_arg(2);
        return join("", array_slice($ar[0], $start, $length));
    } else {
        return join("", array_slice($ar[0], $start));
    }
}

function utf8_strlen2($str) {
    // 1 Chinese character has length 1, and so does 1 English character
    preg_match_all("/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/", $str, $ar);
    return count($ar[0]);
}

/************ End of Code ****************/
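
If your PHP build has PCRE compiled with UTF-8 support, the "u" pattern modifier may give the same character splitting with a much shorter pattern. The following is only a sketch under that assumption, not a replacement for the code above. The helper name utf8_chars_u() is hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters
function utf8_chars_u($str) {
    // '//u' matches the empty position between characters;
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on failure (e.g. invalid UTF-8 input)
    return $chars === false ? array() : $chars;
}

$chars = utf8_chars_u("Hello你好么");
print count($chars) . "\n";                       // character count
print join("", array_slice($chars, 0, 6)) . "\n"; // first 6 characters
```

Unlike the hand-written pattern, this approach returns nothing useful for strings that are not valid UTF-8, so it is a trade-off rather than a strict improvement.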

Following is the GB version:

c_substr() counts the string length in bytes (i.e. 1 Chinese character is 2 bytes), while m_substr() counts it in characters (i.e. 1 Chinese character has length 1).


/***********Start of Code ***************/

//by 徐祖宁( http://blog.i5un.com/item/39)

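function c_substr($str, $start = 0) {
    $ch = chr(127); // bytes above this value belong to a double-byte character
    // Patterns that strip complete double-byte characters and ASCII characters,
    // leaving only a dangling half character (if any)
    $p = array("/[\x81-\xfe]([\x81-\xfe]|[\x40-\xfe])/", "/[\x01-\x7f]/");
    $r = array("", "");
    if (func_num_args() > 2)
        $end = func_get_arg(2); // optional third argument: length in bytes
    else
        $end = strlen($str);
    if ($start < 0)
        $start += $end;

    if ($start > 0) {
        // If the cut at $start splits a character, move $start past the split
        $s = substr($str, 0, $start);
        if (strlen($s) > 0 && $s[strlen($s) - 1] > $ch) {
            $s = preg_replace($p, $r, $s);
            $start += strlen($s);
        }
    }
    $s = substr($str, $start, $end);
    $end = strlen($s);
    // If the slice ends on half a character, extend it by the dangling byte
    if ($end > 0 && $s[$end - 1] > $ch) {
        $s = preg_replace($p, $r, $s);
        $end += strlen($s);
    }
    return substr($str, $start, $end);
}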

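function m_substr($str, $start) {
    // Split into characters: a byte >= 0x80 is paired with the byte after it.
    // The "s" modifier lets "." match newlines so they are not dropped.
    preg_match_all("/[\x80-\xff]?./s", $str, $ar);
    if (func_num_args() >= 3) {
        // Optional third argument: number of characters to return
        $length = func_get_arg(2);
        return join("", array_slice($ar[0], $start, $length));
    } else {
        return join("", array_slice($ar[0], $start));
    }
}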

/************ End of Code ****************/
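
To see why the byte-based and character-based views differ, here is a small self-contained sketch. The helper gb_chars() is hypothetical; it only reuses the splitting idea from m_substr(). The test string is "ab你好" written as raw GB2312 bytes, so the result does not depend on how this file is saved.

```php
<?php
// Hypothetical helper: a byte >= 0x80 is treated as a lead byte and
// paired with the byte that follows it; other bytes stand alone
function gb_chars($str) {
    preg_match_all("/[\x80-\xff]?./s", $str, $ar);
    return $ar[0];
}

$gb = "ab\xC4\xE3\xBA\xC3"; // "ab你好" as raw GB2312 bytes

print strlen($gb) . "\n";          // 6 bytes
print count(gb_chars($gb)) . "\n"; // 4 characters

// Cutting by bytes can split a character: the result ends in a lone lead byte
print bin2hex(substr($gb, 0, 3)) . "\n";                          // 6162c4
// Cutting by characters keeps both bytes of "你" together
print bin2hex(join("", array_slice(gb_chars($gb), 0, 3))) . "\n"; // 6162c4e3
```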
