|
Hi all,
I put together a test implementation that makes it simpler to work with different character sets using iconv, mbstring, and native code.

The goal is to let users decide which PHP extension to use for handling different character sets. The test implementation is a very simple, fast, and extensible wrapper for iconv/mbstring/native:
(https://github.com/marc-mabe/zf2/blob/string/library/Zend/Stdlib/StringUtils.php)

I also wrote a simple benchmark that shows the mbstring adapter is faster than iconv, even when wrapped with an adapter:
(https://gist.github.com/2938899)

$ php stringutils-bench.php
native (▒): 0.0067460536956787
NativeAdapter (): 0.035496950149536
IconvNative (ß): 0.03082799911499
IconvAdapter (ß): 0.038977146148682
MbStringNative (ß): 0.0065720081329346
MbStringAdapter (ß): 0.010815858840942

Example 1:
$stringAdapter = StringUtils::getAdapterByCharset("UTF-8");
$stringAdapter->strlen("ß");
// ...

Example 2: (Fallback to ASCII if single byte charset)
try {
    StringUtils::getAdapterByCharset($charset)->strlen($str);
} catch (Exception $e) {
    if (StringUtils::isSingleByteCharset($charset)) {
        StringUtils::getAdapterByCharset("ASCII")->strlen($str);
    }
}

What do you think about this?

Greetings
Marc
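For readers who do not want to open the linked file, here is a rough sketch of how a charset-keyed adapter lookup like the one in Example 1 could be wired up. It is not the code from StringUtils.php: the interface, the three adapter classes, and the free function `getAdapterByCharset()` below are hypothetical names chosen for illustration, and the selection order (native for ASCII, then mbstring, then iconv) is an assumption.

```php
<?php
// Hypothetical sketch: these names are illustrative and are NOT taken from
// the linked StringUtils.php.

interface StringAdapterInterface
{
    public function strlen(string $str): int;
}

final class NativeAdapter implements StringAdapterInterface
{
    public function strlen(string $str): int
    {
        // Counts bytes, so this is only correct for single-byte charsets.
        return \strlen($str);
    }
}

final class MbStringAdapter implements StringAdapterInterface
{
    private $charset;

    public function __construct(string $charset)
    {
        $this->charset = $charset;
    }

    public function strlen(string $str): int
    {
        return mb_strlen($str, $this->charset);
    }
}

final class IconvAdapter implements StringAdapterInterface
{
    private $charset;

    public function __construct(string $charset)
    {
        $this->charset = $charset;
    }

    public function strlen(string $str): int
    {
        // iconv_strlen() returns false on failure; turn that into an exception.
        $length = @iconv_strlen($str, $this->charset);
        if ($length === false) {
            throw new RuntimeException("iconv cannot measure a string in {$this->charset}");
        }
        return $length;
    }
}

// Assumed selection order: native for ASCII, then mbstring, then iconv.
function getAdapterByCharset(string $charset): StringAdapterInterface
{
    if (strtoupper($charset) === 'ASCII') {
        return new NativeAdapter();
    }
    if (function_exists('mb_list_encodings')) {
        foreach (mb_list_encodings() as $known) {
            if (strcasecmp($known, $charset) === 0) {
                return new MbStringAdapter($known);
            }
        }
    }
    if (function_exists('iconv_strlen')) {
        return new IconvAdapter($charset);
    }
    throw new RuntimeException("No string adapter available for charset {$charset}");
}
```

With mbstring loaded, `getAdapterByCharset("UTF-8")->strlen("ß")` would go through `mb_strlen()` and count characters, while the native adapter counts bytes for the same input.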
|
Hi Marc,
I think you did a great job, but it now looks like a full-featured component, so I'm not sure it belongs in Stdlib anymore.
Whether or not this component is accepted, I think there should be a unified way to work with multibyte strings throughout ZF2.

Denis

On 16.06.2012 2:02, Marc Bennewitz wrote:
> Hi all,
>
> I did some test implementation working more simple with different
> character sets using iconv / mbstring and native code.
>
> The goal was too let users decide which of the php extension use to
> handle different character sets.
> The test implementation did a very simple, fast and expansible wrapper
> for iconv/mbstring/native.
> (https://github.com/marc-mabe/zf2/blob/string/library/Zend/Stdlib/StringUtils.php)
>
> I also did a simple benchmark that shows the mbstring adapter is faster
> as iconv even if wrapped with an adapter:
> (https://gist.github.com/2938899)
>
> $ php stringutils-bench.php
> native (▒): 0.0067460536956787
> NativeAdapter (): 0.035496950149536
> IconvNative (ß): 0.03082799911499
> IconvAdapter (ß): 0.038977146148682
> MbStringNative (ß): 0.0065720081329346
> MbStringAdapter (ß): 0.010815858840942
>
>
> Example 1:
> $stringAdapter = StringUtils::getAdapterByCharset("UTF-8");
> $stringAdapter->strlen("ß");
> // ...
>
> Example 2: (Fallback to ASCII if single byte charset)
> try {
>     StringUtils::getAdapterByCharset($charset)->strlen($str);
> } catch (Exception $e) {
>     if (StringUtils::isSingleByteCharset($charset)) {
>         StringUtils::getAdapterByCharset("ASCII")->strlen($str);
>     }
> }
>
> What do you think about this ?
>
> Greetings
> Marc
>
|
This post has NOT been accepted by the mailing list yet.
In reply to this post by Marc Bennewitz (private)
On Jun 15, 2012, at 3:03 PM, Marc Bennewitz (private) [via Zend Framework Community] wrote:
> Hi all,
>
> I did some test implementation working more simple with different
> character sets using iconv / mbstring and native code.

I am just relaying this comment for a colleague who is not on the list. This is outside my expertise. Here is his comment:

Performance is much less important for handling UTF-8 than knowing the limitations, especially those of mbstring. mbstring is faster because it only handles a small set of European languages plus some common Japanese characters. And why are you not comparing against the gold standard, which is the intl extension?
|
In reply to this post by DeNix
So far I have only done some tests to get a very fast and extensible API; what it should be named and where it should belong can be debated later ;)

@jeremiah Your post hasn't been accepted. I only noticed your comment via Nabble.

>I'm am just relaying this comment for a colleague who is not on the list. This is outside my expertise. Here is his comment:
>
>Performance is much less important for handling UTF-8 than knowing the limitations of mbstring especially. mbstring is faster because it only handles a small set of European languages plus some common
>Japanese characters. And why are you not comparing against the gold standard which is the intl extension.

Performance is very important: if you have to handle different character sets, you need to wrap each string function unless you want to hard-code against one extension.

The intl extension is for internationalization; that's not the same as working with different character sets.

The mbstring extension doesn't handle languages, it handles character sets, and it supports some different character sets than iconv.

Greetings
Marc

On 17.06.2012 22:45, Denis Portnov wrote:
> Hi Marc,
> I think you did a great job, but now it looks like full featured
> component, so not sure it belongs to Stdlib anymore.
> Whether or not this component will be accepted, I think there should
> be unified way to work with miltibyte strings throughout ZF2
>
> Denis
>
> On 16.06.2012 2:02, Marc Bennewitz wrote:
>> Hi all,
>>
>> I did some test implementation working more simple with different
>> character sets using iconv / mbstring and native code.
>>
>> The goal was too let users decide which of the php extension use to
>> handle different character sets.
>> The test implementation did a very simple, fast and expansible wrapper
>> for iconv/mbstring/native.
>> (https://github.com/marc-mabe/zf2/blob/string/library/Zend/Stdlib/StringUtils.php)
>>
>>
>> I also did a simple benchmark that shows the mbstring adapter is faster
>> as iconv even if wrapped with an adapter:
>> (https://gist.github.com/2938899)
>>
>> $ php stringutils-bench.php
>> native (▒): 0.0067460536956787
>> NativeAdapter (): 0.035496950149536
>> IconvNative (ß): 0.03082799911499
>> IconvAdapter (ß): 0.038977146148682
>> MbStringNative (ß): 0.0065720081329346
>> MbStringAdapter (ß): 0.010815858840942
>>
>>
>> Example 1:
>> $stringAdapter = StringUtils::getAdapterByCharset("UTF-8");
>> $stringAdapter->strlen("ß");
>> // ...
>>
>> Example 2: (Fallback to ASCII if single byte charset)
>> try {
>>     StringUtils::getAdapterByCharset($charset)->strlen($str);
>> } catch (Exception $e) {
>>     if (StringUtils::isSingleByteCharset($charset)) {
>>         StringUtils::getAdapterByCharset("ASCII")->strlen($str);
>>     }
>> }
>>
>> What do you think about this ?
>>
>> Greetings
>> Marc
>>
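To make the point about wrapper overhead concrete, here is a minimal timing sketch that compares a direct `strlen()` call with the same call routed through a wrapper object. It is not the benchmark from the linked gist: the class `WrappedStrlen`, the helper `timeCalls()`, and the iteration count are all hypothetical. Both measurements also include the cost of the closure call, so only the difference between them is meaningful, and the numbers will vary by machine and PHP version.

```php
<?php
// Hypothetical micro-benchmark; not the code from the linked gist.

final class WrappedStrlen
{
    public function strlen(string $str): int
    {
        // Delegates to the native function, adding one method call of overhead.
        return \strlen($str);
    }
}

// Runs $fn $iterations times and returns the elapsed wall-clock seconds.
function timeCalls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = str_repeat('abc', 100);
$adapter = new WrappedStrlen();
$iterations = 100000; // arbitrary; raise it for more stable numbers

$direct = timeCalls(function () use ($str) {
    return strlen($str);
}, $iterations);

$wrapped = timeCalls(function () use ($str, $adapter) {
    return $adapter->strlen($str);
}, $iterations);

printf("direct:  %.6f s\nwrapped: %.6f s\n", $direct, $wrapped);
```

Swapping `strlen()` for `mb_strlen()` or `iconv_strlen()` inside both closures would give a comparison closer to the adapter timings quoted in the thread, assuming the corresponding extension is loaded.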
