|
I believe Thomas is closely following the team's past decisions:

http://www.nabble.com/forum/ViewPost.jtp?post=6867072&framed=y&skin=16154

Especially see the links in that post above (copied below) that define scope:

http://www.nabble.com/Zend_Seach_Lucene-tf2315524s16154.html#a6490854

Today, the community discussed the possible use of iconv to handle simple UTF8 string manipulation tasks:

http://framework.zend.com/wiki/display/ZFDEV/Wildfire+Jabber+Server

I will start a new discussion thread for this, and the conclusion may affect the scope defined above.

Cheers,
Gavin

Willie Alberty wrote:
> I've been following this thread with great interest, and even
> attempted to join in the conversation a couple of times. However, it
> seems I am completely missing the point of the discussion...
>
> André, Ahmed, and myself would like to see Zend_Locale_UTF8 do more
> Unicode-aware things than it does now. This would make it useful
> outside of a Locale-only context. However, Gavin reminded us of the
> prior direction from Zend that said Zend_Locale_UTF8 is to be
> essentially a private helper class, to be used only for the explicit
> needs of Zend_Locale. Fine.
>
> But in reading your last response, it seems as though you either don't
> see the need for Zend_Locale_UTF8 or don't want it:
>
> On Oct 18, 2006, at 11:36 AM, Thomas Weidner wrote:
>
>> 1.) Zend_Locale_Format handles the input string, stripping
>> separators, changing fraction and negative sign.
>> So our input string is normalized. This is already implemented.
>
> So you already have a comprehensive table of Unicode characters that
> represent the decimal and thousands separators, as well as the
> fraction and negative signs for every language supported by
> Zend_Locale_Format?
>
>> 2.) Zend_Locale_Format calls Zend_Locale_UTF8 for converting the
>> normalized value to local signs.
>> So we have a normalized string with local signs.
>
> So you already have a comprehensive table of Unicode characters that
> are numeric digits? How are you able to identify which characters are
> digits, which are delimiters, and which are white space? If you
> already know what characters are digits, why would you need
> Zend_Locale_UTF8 at all? Just use the same tables for conversion that
> you're using for parsing.
>
>> 3.) Zend_Locale_Format localizes the returned string adding
>> separators, negative and fraction signs.
>> This is also already implemented.
>
> Again, this implies in-depth knowledge of the character sets involved
> for every language, including knowledge of which characters are
> encoded in one-, two-, and three-bytes. Otherwise, you would not be
> able to reliably insert a decimal separator at the correct location in
> the byte stream.
>
>> 4.) In Zend_Measure_Numbers there will be added some functions as
>> toArabic, fromArabic, toChinese, fromChinese and so on...
>> So we could convert numbers locale aware to other number formats.
>> A conversion for the roman, binary, octal, hexadecimal, decimal and
>> some other number formats are already implemented there.
>
> Again, it sounds like all of the functionality you need is already
> implemented elsewhere.
>
> Can you be more specific with the functions you *do* need
> Zend_Locale_UTF8 to perform? After reading through this thread again,
> and factoring in the Zend direction from Gavin, I think having this
> class around is unnecessary.
>
> (André - If this turns out to be true, don't despair... I think there
> is a great need for Unicode manipulation classes in PHP 5. In fact, I
> have an explicit need in some of the work I'm planning for Zend_Pdf.
> They might just need to live outside of Zend_Locale to survive. If the
> adoption rate of PHP 5 by hosting providers is any indication, PHP 6
> is still several years away from being practical, which means Unicode
> classes in the framework are unquestionably valuable.)
>
> --
>
> Willie Alberty, Owner
> Spenlen Media
> [hidden email]
>
> http://www.spenlen.com/

--
Cheers,
Gavin

Which ZF List?
=================
Everything, except the topics below: [hidden email]
Authorization, Authentication, ACL, Access Control, Session Management: [hidden email]
Tests, Caching, Configuration, Environment, Logging: [hidden email]
All things related to databases: [hidden email]
Documentation, Translations, Wiki Manual / Tutorials: [hidden email]
Internationalization & Localization, Dates, Calendar, Currency, Measure: [hidden email]
Mail, MIME, PDF, Search, data formats (JSON, ...): [hidden email]
MVC, Controller, Router, Views, Zend_Request*: [hidden email]
Community Servers/Services (shell account, PEAR channel, Jabber): [hidden email]
Web Services & Servers (HTTP, SOAP, Feeds, XMLRPC, REST): [hidden email]

How to un/subscribe: http://framework.zend.com/wiki/x/GgE
|
In reply to this post by Thomas Weidner
Zend_Locale_UTF8 is purposely misnamed. We chose that name to indicate that it exists only to support the Zend_Locale classes, not because Zend_Locale_UTF8 should know anything at all about localization (or numbering systems).

Reference:
http://www.nabble.com/Zend_Seach_Lucene-tf2315524s16154.html#a6490854

Regarding converting numbers to/from:

http://en.wikipedia.org/wiki/Roman_numerals
http://en.wikipedia.org/wiki/Indian_numbering_system
http://en.wikipedia.org/wiki/Japanese_numerals
etc.

I do not see any objections to including this functionality with Zend_Locale*, only ideas on where to put the functions. If the logic is broken into functions that use the CLDR and "understand" locale/language/culture-specific numbering systems, and functions that do not use the CLDR and do not understand specific numbering systems, then we only need to worry about where to put each function.

I'm not sure if this helps, but it makes sense to me to group related functions that use the portions of the CLDR relating to numbers and number systems into the same class. Per past discussions (and links to those discussions in recent emails), Zend_Locale_Utf8 should not contain logic to perform localization or internationalization. However, the exact same functions to perform normalization and formatting of numbers are still needed, and useful, but might instead be written in a locale-specific class. Thomas has provided a proposed / partially implemented hierarchy, including classes for containing these normalization and formatting functions.

Cheers,
Gavin
|
In reply to this post by GavinZend
On Oct 18, 2006, at 1:35 PM, Gavin Vess wrote:

> I believe Thomas is closely following the team's past decisions:
>
> http://www.nabble.com/forum/ViewPost.jtp?post=6867072&framed=y&skin=16154
>
> Especially see the links in that post above (copied below) that
> define scope:
>
> http://www.nabble.com/Zend_Seach_Lucene-tf2315524s16154.html#a6490854

I did review that post... It was in response to one of my messages. ;-) But the defined scope is incredibly vague:

    The expected value and usefulness of Zend_Locale_Utf8 is not doubted, but we must be careful to avoid requirements creep. Previously, we agreed to allow UTF8 emulation functions (PHP functions written in pure PHP that support UTF8 strings) *only* for the functions absolutely required for Zend_Locale* classes to work.

There is no mention of which functions will be required, which has led to a great deal of speculation. André speculated that equivalents of intval() and floatval() might be needed. Ahmed and I thought that was a good idea.

In the discussion that followed, we've only been arguing about what Zend_Locale_UTF8 should *not* be. I still haven't seen a concise description of what it *should* be.

> Today, the community discussed the possible use of iconv to handle
> simple UTF8 string manipulation tasks:
>
> http://framework.zend.com/wiki/display/ZFDEV/Wildfire+Jabber+Server

If the needs of Zend_Locale are limited to simple string manipulations, iconv would be a much more efficient solution.

I agree that full support for Unicode string handling within the framework should be provided by mbstring or PHP 6. But I believe that the framework could still benefit from some Unicode utility classes, as there are a whole host of character attributes in the UCD that can be useful.

--
Willie Alberty, Owner
Spenlen Media
[hidden email]

http://www.spenlen.com/
|
In reply to this post by GavinZend
The following ZF components currently use iconv functions:

* Zend/Pdf/FileParser.php
* Zend/Pdf/Resource/Font/Standard/*.php
* Zend/Pdf/Resource/Font.php
* Zend/Search/Lucene/Field.php
* Zend/Service/Flickr.php
* Zend/XmlRpc/Client.php

http://www.php.net/manual/en/ref.iconv.php

Questions
==============
(1) Do the iconv functions actually work consistently in practice for PHP 5.1.4+ on all major platforms with the UTF8 charset?
I have not yet found any reports indicating the iconv functions are unstable, inconsistent, or unusable with UTF8 strings. However, apparently Gentoo's default PHP 5.1.6 ebuild tries to build PHP without libxml and without iconv, unless the "xml" and "iconv" USE flags are enabled.

(2) Would adding "iconv" to the official list of requirements for the ZF impose any practical burden on anyone?
The libxml extension requires iconv. Many things require libxml. I have not found any distro shipping PHP 5.1.4+ that does not include support for the iconv functions. The Windows binary downloaded via php.net was compiled with support for these functions. The configure script that ships with PHP 5.1.4+ includes "--with-iconv" by default.

(3) When needed for working with UTF8 strings, are there any reasons to avoid using these iconv functions inside Zend_Locale and Zend_Search_Lucene classes?
* iconv_strlen()
* iconv_strpos()
* iconv_strrpos()
* iconv_substr()

Cheers,
Gavin

P.S.
$cleanedUTF8 = iconv("UTF-8", "UTF-8//IGNORE", $badUTF8);
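As a rough illustration of the four functions listed in question (3), the following is a minimal sketch, not code from any ZF component. The sample string and variable names are made up for the example, and it assumes the iconv extension is loaded. It contrasts byte-oriented strlen() with the character-aware iconv calls on a UTF-8 string.

```php
<?php
// "Grüße, Zürich" in UTF-8. Byte escapes are used so the example does not
// depend on the encoding of the source file (illustrative sample data only).
$text = "Gr\xC3\xBC\xC3\x9Fe, Z\xC3\xBCrich";

// strlen() counts bytes; iconv_strlen() counts characters in the given charset.
$byteLength = strlen($text);
$charLength = iconv_strlen($text, 'UTF-8');

// Character offset of the first comma, then the text before it.
$commaAt   = iconv_strpos($text, ',', 0, 'UTF-8');
$firstWord = iconv_substr($text, 0, $commaAt, 'UTF-8');

// Character offset of the last "ü" (UTF-8 bytes C3 BC).
$lastUmlaut = iconv_strrpos($text, "\xC3\xBC", 'UTF-8');

printf("%d bytes, %d characters\n", $byteLength, $charLength);
printf("comma at %d, last u-umlaut at %d\n", $commaAt, $lastUmlaut);
```

All offsets and lengths here are measured in characters, which is the property Zend_Locale and Zend_Search_Lucene would rely on when slicing UTF-8 strings.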
|
In reply to this post by Willie Alberty
Willie Alberty wrote:

> There is no mention of which functions will be required, which has
> led to a great deal of speculation. André speculated that equivalents
> of intval() and floatval() might be needed. Ahmed and I thought that
> was a good idea.
>
> In the discussion that followed, we've only been arguing about what
> Zend_Locale_UTF8 should *not* be. I still haven't seen a concise
> description of what it *should* be.

If Thomas cannot implement a function important to the "Locale" project without a supporting, low-level function to perform some manipulation of UTF8 strings, then that low-level function becomes a candidate for Zend_Locale_Utf8. The same situation exists for Zend_Search_Lucene and Alexander. We look to the Locale and Search teams to declare which UTF8 string manipulation/analysis functions they truly need.

Currently, I encourage the entire community to review the iconv functions and situation, and to write their thoughts about using these functions in reply to our recent discussions on this list.

The original intent was to purposely define Zend_Locale_Utf8 as the empty set of functions, and only add to it when absolutely required and justified, per the criteria mentioned previously. It seems that iconv already provides three of the functions most probably needed (strlen, strpos, substr).

However, there remains the possibility of a different component, a more general-purpose Zend_Utf8 that might live in the Laboratory, if someone wishes to revive the original proposal.

Cheers,
Gavin
|
In reply to this post by GavinZend
On Oct 18, 2006, at 3:37 PM, Gavin Vess wrote:

> The following ZF components currently use iconv functions:
>
> * Zend/Pdf/FileParser.php
> * Zend/Pdf/Resource/Font/Standard/*.php
> * Zend/Pdf/Resource/Font.php
> ...

Zend_Pdf primarily uses iconv() to translate from a string using an arbitrary character encoding (typically ISO-8859-1) to the Windows ANSI character set (CP-1252) when preparing to draw text on a page. It also uses iconv() when parsing TrueType font programs to extract strings such as font name, copyright, etc.

> (1) Do the iconv functions actually work consistently in practice
> for PHP 5.1.4+ on all major platforms with the UTF8 charset?

There have been no reported problems with iconv() in conjunction with Zend_Pdf. In addition to the ISO-8859-1 and CP-1252 character sets, the font parsing classes use UTF-16BE (2-byte big-endian encoding) extensively. Future text layout classes will also require UTF-16BE support.

> (2) Would adding "iconv" to the official list of requirements for
> the ZF impose any practical burden on anyone?

Not to myself or any of my clients.

It should be noted that Zend_Pdf would be unusable without iconv. If it cannot be made a requirement for the framework as a whole, it must be listed as a requirement for Zend_Pdf.

--
Willie Alberty, Owner
Spenlen Media
[hidden email]

http://www.spenlen.com/
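The two kinds of conversion described above can be sketched roughly as follows. This is not the actual Zend_Pdf code. The sample strings and variable names are invented for illustration, and the charset name "CP1252" is assumed to be known to the underlying iconv library.

```php
<?php
// Guard mirroring the point that such code cannot run without iconv.
if (!extension_loaded('iconv')) {
    throw new RuntimeException('The iconv extension is required for this sketch.');
}

// "café" in ISO-8859-1: the e-acute is the single byte 0xE9.
$latin1 = "caf\xE9";

// Translate to the Windows ANSI character set before drawing text.
$cp1252 = iconv('ISO-8859-1', 'CP1252', $latin1);

// The same word as it might appear in a font's name table: UTF-16BE,
// two bytes per character, most significant byte first.
$utf16be = "\x00c\x00a\x00f\x00\xE9";

// Convert the extracted string to UTF-8 and count its characters.
$utf8      = iconv('UTF-16BE', 'UTF-8', $utf16be);
$charCount = iconv_strlen($utf16be, 'UTF-16BE');

printf("%d characters, %d bytes as UTF-8\n", $charCount, strlen($utf8));
```

The e-acute occupies the same code position (0xE9) in ISO-8859-1 and CP-1252, so the first conversion leaves these particular bytes unchanged; the two character sets differ only in the 0x80 to 0x9F range.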
|
In reply to this post by GavinZend
On Oct 18, 2006, at 6:27 PM, Gavin Vess wrote:

> The original intent was to purposely define Zend_Locale_Utf8 as the
> empty set of functions, and only add when absolutely required and
> justified, per the criteria mentioned previously. It seems that
> iconv already provides the three of the functions most probably
> needed (strlen, strpos, substr).

str_replace can also be easily implemented by using iconv_strpos and iconv_substr. I don't think that such a function should live in Zend_Locale_Utf8 though, as it would be character set-agnostic. It would probably be best to have it as a static utility method in Zend_Locale.

> However, there remains the possibility of a different component, a
> more general purpose Zend_Utf8 that might live in the Laboratory,
> if someone wishes to revive the original proposal.

I am working on some layout classes for Zend_Pdf that will handle things like wrapping long lines of text, text alignment, font size and style changes on a single line, etc. The implementation of these classes will require a Unicode-based backing store for the strings.

In the current (half-done) implementation, I've created a Zend_Pdf_Text class along with several Unicode services helper classes. The Zend_Pdf_Text class stores an arbitrary amount of Unicode text, accepting source strings in any encoding. There is a subclass which allows attributes such as font, size, color, alignment, etc. to be placed on the string.

The helper classes provide important character attributes from the UCD such as line break classes, text direction (left-to-right or right-to-left) classes, and bidi (bi-directional text) mirrored characters, which are required to properly lay out strings on the PDF page.

After watching the discussion here and looking more closely at André's implementation of Zend_Locale_UTF8, I think these classes would be more useful at a higher level:

Zend_String
--------------------
General-purpose Unicode string storage class. Would contain most of the string manipulation functions André has already implemented in Zend_Locale_Utf8 and whatever else from my Zend_Pdf_Text that would be useful. String objects are constructed from ordinary PHP strings using any character encoding supported by iconv.

Zend_String_Attributed
--------------------
Extends Zend_String, allowing attributes to be set on ranges of characters such as font size, color, alignment, etc. as well as any other user-defined attributes. Zend_String_Attributed objects would be used for advanced layout in Zend_Pdf. An attributed string class would also pave the way for RTF and Microsoft Word document generation.

Zend_Range
--------------------
Primitive range class, used by Zend_String_Attributed for setting character ranges. Has convenience functions to calculate unions, intersections, etc.

Zend_Unicode
--------------------
Static helper class which vends interesting information from the Unicode Character Database (UCD) such as character classes (i.e. - is the character numeric?), line break classes (for PDF layout), etc. This data comes from specialized Zend_Unicode_* objects which are loaded on-demand.

While PHP 6 will provide native support for Unicode strings, that release is still pretty far off (there is a lot of work remaining: http://www.php.net/~scoates/unicode/render_func_data.php). In addition, I don't think there are any plans for an attributed string class or utility functions that return data from the UCD.

More importantly, Unicode string support in PHP 6 will be enabled via an INI switch. It will be hard enough to get web hosting providers to offer PHP 6 at all. Fear of breaking their favorite control panel software or some esoteric extension they're using might mean getting native Unicode support will be next to impossible.

For these reasons, and for those applications that might need to interact with Unicode strings even in PHP 6, but with the native Unicode support disabled, I feel strongly that such classes are useful. I'd be happy to help lead this effort as I have an immediate need for this capability (in Zend_Pdf).

--
Willie Alberty, Owner
Spenlen Media
[hidden email]

http://www.spenlen.com/
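The remark earlier in this post that str_replace can be built from iconv_strpos and iconv_substr could look roughly like the sketch below. The function name sketch_str_replace() is hypothetical and is not part of Zend_Locale or any proposal in this thread. It assumes the iconv extension is loaded and that the subject is valid in the given charset.

```php
<?php
/**
 * Hypothetical character-aware replacement built only on iconv functions.
 * All offsets and lengths are measured in characters of $encoding.
 */
function sketch_str_replace($search, $replace, $subject, $encoding = 'UTF-8')
{
    $searchLength = iconv_strlen($search, $encoding);
    $total        = iconv_strlen($subject, $encoding);
    if (!$searchLength || !$total) {
        return $subject; // nothing to search for, or nothing to search in
    }

    $result = '';
    $offset = 0;
    while ($offset < $total
        && ($found = iconv_strpos($subject, $search, $offset, $encoding)) !== false
    ) {
        // Copy the text between the previous match and this one.
        if ($found > $offset) {
            $result .= iconv_substr($subject, $offset, $found - $offset, $encoding);
        }
        $result .= $replace;
        $offset  = $found + $searchLength;
    }

    // Copy whatever follows the last match.
    if ($offset < $total) {
        $result .= iconv_substr($subject, $offset, $total - $offset, $encoding);
    }
    return $result;
}

// "Grüße, Zürich" with every "ü" replaced by "ue".
echo sketch_str_replace("\xC3\xBC", 'ue', "Gr\xC3\xBC\xC3\x9Fe, Z\xC3\xBCrich"), "\n";
```

For valid UTF-8 the built-in, byte-oriented str_replace() already produces the same result, because one character's byte sequence never occurs inside another's. A character-aware version matters for multibyte encodings without that property, which fits the point that such a function would be character set-agnostic.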
|
In reply to this post by GavinZend
Based on feedback from many, and on current usage within the ZF, the ZF effectively already requires the iconv extension. No objections, significant problems, or issues have been found that might prevent our use of PHP's iconv functions. Therefore, use of these iconv functions is encouraged when needed. A small note has been appended to the draft coding standards:

http://framework.zend.com/wiki/x/PQ

Cheers,
Gavin
|
> Based on feedback from many, and current usage within the ZF, ZF
> effectively already requires the iconv extension. No objections to use,
> significant problems or issues have been found that might prevent our use
> of PHP's iconv functions. Therefore, use of these iconv functions is
> encouraged, when needed. A small note has been appended to the draft
> coding standards:

I have already used iconv in Zend_Locale_Format, in the functions where I found problems. From Zend_Locale's point of view, there is no need for the UTF8 class anymore.

As discussed in another post, it would be nice to have a function to convert between different number writing systems. In my opinion this is, for now, the only functionality that is valuable enough to be included. The question is whether we want to include this in the framework or not.

Do we want to include the complete UTF8 classes for this functionality, or do we wait until PHP6 for this...

Greetings
Thomas
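To make the idea of converting between number writing systems concrete, here is a minimal sketch for a single pair of digit sets: Arabic-Indic digits (U+0660 to U+0669) and the Latin digits 0 to 9. The function names are hypothetical and are not the Zend_Locale or Zend_Measure API. The digit table is hard-coded for the example, whereas a real implementation would presumably take its data from the CLDR.

```php
<?php
/**
 * Hypothetical helper: builds lookup tables between Latin digits and
 * Arabic-Indic digits. In UTF-8, U+0660..U+0669 are the byte pairs
 * D9 A0 .. D9 A9.
 */
function sketch_digit_maps()
{
    $toLatin   = array();
    $fromLatin = array();
    for ($i = 0; $i < 10; $i++) {
        $arabicIndic = "\xD9" . chr(0xA0 + $i);
        $toLatin[$arabicIndic]    = (string) $i;
        $fromLatin[(string) $i]   = $arabicIndic;
    }
    return array($toLatin, $fromLatin);
}

/** Replaces Arabic-Indic digits with Latin digits; other text is untouched. */
function sketch_to_latin_digits($value)
{
    list($toLatin) = sketch_digit_maps();
    return strtr($value, $toLatin);
}

/** Replaces Latin digits with Arabic-Indic digits; other text is untouched. */
function sketch_from_latin_digits($value)
{
    list(, $fromLatin) = sketch_digit_maps();
    return strtr($value, $fromLatin);
}

// U+0661 U+0662 U+0663 converted to Latin digits.
echo sketch_to_latin_digits("\xD9\xA1\xD9\xA2\xD9\xA3"), "\n";
```

This only swaps digit glyphs in a positional decimal system. Separators, signs, and non-positional systems such as Roman numerals need their own logic.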
|
I have no objections to a number conversion system, provided the source code is placed into appropriate Zend_Locale* classes.

Cheers,
Gavin

Thomas Weidner wrote:
> As discussed in another post it would be nice to have a function to
> convert between different number writing systems.
> In my opinion this is for now the only functionality which is
> valuable to be included.
> The question is if we want to include this into the framework or not.
>
> Do we want to include the complete UTF8 classes for this functionality
> or do we wait until PHP6 for this...
>
> Greetings
> Thomas
