8EB-F5F Zend_Lucene + UTF8 search problem... Help!

Maxim Savenko
8EB-F5F

Hi everybody,

I have a problem searching for UTF-8 encoded Russian strings with
Zend_Search_Lucene. Here is my short sample code:

-----------------code---------------
<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create index
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Open index and search:
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index:
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
   /* @var $hit Zend_Search_Lucene_Search_QueryHit */
   $doc = $hit->getDocument();
   echo $doc->getField('samplefield')->value, PHP_EOL;
}
-----------------code---------------

The 'samplefield' field of the document contains a string in two
languages, Russian and English (see the code). If we search for 'english',
everything works: the document is found. But if we search for the Russian
part of the field (set $queryStr to 'русский'), no document is found.

What is wrong with my code? Please help me find a solution...

Thank you guys

Maxim Savenko

RE: 8EB-F5F Zend_Lucene + UTF8 search problem... Help!

wllm

 > 8EB-F5F

I think this code is to get through one user's spam filter. I recommend
mailing the listee directly and not using the codes in mail to the list.
I'm sure these codes could be pretty confusing for those reading the
messages in the archive. :)

,Wil