Zend_Lucene + UTF8 search problem... Help!


Maxim Savenko
Hi everybody,

I have a problem searching Russian strings (UTF-8 encoded) with
Zend_Search_Lucene. Here is a short sample of my code:

<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create the index and add one document with a mixed Russian/English field
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Reopen the index and configure the query parser
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);

foreach ($hits as $hit) {
    /* @var $hit Zend_Search_Lucene_Search_QueryHit */
    $doc = $hit->getDocument();
    echo $doc->getField('samplefield')->value, PHP_EOL;
}

The 'samplefield' field of the document contains text in two languages:
Russian and English (see the code). If I search for 'english', everything
works and the document is found. But if I search for the Russian part of
the field (setting $queryStr to 'русский'), no documents are found.

What is wrong with my code? Can anyone help me find a solution?

Thank you, guys

Maxim Savenko
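A likely cause, offered as a hedged guess rather than a confirmed diagnosis: by default Zend_Search_Lucene tokenizes field values with a text analyzer that only recognizes ASCII letters, so the Cyrillic words never become index terms and a query for 'русский' has nothing to match. The self-contained sketch below does not use Zend at all; `tokenizeAsciiOnly()` and `tokenizeUtf8()` are hypothetical stand-ins that contrast an ASCII-only tokenizer with a UTF-8-aware one on the same field value.

```php
<?php
// Hypothetical helper functions for illustration only: they mimic the idea
// of an ASCII-only analyzer versus a UTF-8-aware one, but they are NOT the
// actual Zend_Search_Lucene analyzer classes.

// ASCII-only tokenizer: only runs of a-z / A-Z form terms, so the UTF-8
// bytes of a Cyrillic word act as separators and are never indexed.
function tokenizeAsciiOnly($text)
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return $matches[0];
}

// UTF-8-aware tokenizer: with the /u modifier PCRE reads the subject as
// UTF-8, and \p{L} matches a letter from any script, including Cyrillic.
function tokenizeUtf8($text)
{
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

// The field value from the sample code above.
$text = 'русский текст; english text';

$asciiTerms = tokenizeAsciiOnly($text);
$utf8Terms  = tokenizeUtf8($text);

// A term query can only match terms that were actually indexed.
foreach (array('english', 'русский') as $queryStr) {
    printf(
        "%s: ascii-only=%s, utf-8=%s\n",
        $queryStr,
        in_array($queryStr, $asciiTerms, true) ? 'found' : 'not found',
        in_array($queryStr, $utf8Terms, true) ? 'found' : 'not found'
    );
}
```

If this is the cause, the corresponding change in the real code would be to switch the default analyzer before both indexing and querying, for example with `Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());`, and then rebuild the index, since documents already indexed with the ASCII-only analyzer contain no Cyrillic terms. Whether the UTF-8 analyzers are available (and whether the case-insensitive variants need the mbstring extension) depends on the Zend Framework release, so check the manual for the installed version.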

Re: Zend_Lucene + UTF8 search problem... Help!

Christopher Östlund
What's up with the spam?

On Thu, Jul 24, 2008 at 3:21 PM, Maxim Savenko wrote:
[...]

Re: Zend_Lucene + UTF8 search problem... Help!

Tobias Gies
In reply to Maxim Savenko
Maxim,

Disregard the "Your message could not be delivered" spam. Your message has been sent to this list seven times now. The mails saying "Your message could not be delivered" are not sent by Zend; they come from some British bloke who seems unable to configure his/her mail server properly.

Best regards
Tobias

2008/7/24 Maxim Savenko:
Hi everybody,
[...]