Lucene : Numeric value ignored in search query

Jean-Marc Fontaine
Hello,

I use Lucene to search documents. The actual content of the documents is stored in a database. I index the document ids and content in Lucene for search purposes.

When a document is removed from my database, I need to remove it from the Lucene index. To do so, I need to find the Lucene document id, which is different from my document id.

When I search for "id:2", for example, my query is considered insignificant by the query parser. I tried adding a prefix to get around a possible minimum length, but then only the prefix is searched for.

Any ideas, anyone? :)

Re: Lucene : Numeric value ignored in search query

Jean-Marc Fontaine
Found the solution to this: you must use an analyzer with "Num" in its name (e.g. Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num).

Set it as the default analyzer like this:

            Zend_Search_Lucene_Analysis_Analyzer::setDefault(
                new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num()
            );
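To illustrate why the analyzer matters, here is a minimal sketch in plain PHP. It is not the Zend implementation: the two functions (hypothetical names) only imitate a letters-only tokenizing rule and a letters-and-digits rule, under the assumption that this is the relevant difference between the default analyzer and the "Num" variants.

```php
<?php
// Simplified stand-ins for two tokenizing rules; NOT the Zend_Search_Lucene code.

/** Letters-only rule: digits act as separators and are dropped. */
function tokenizeLettersOnly(string $text): array
{
    preg_match_all('/[a-z]+/', strtolower($text), $matches);
    return $matches[0];
}

/** Letters-and-digits rule: digits are kept as part of the token. */
function tokenizeLettersAndDigits(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $matches);
    return $matches[0];
}

var_dump(tokenizeLettersOnly('2'));       // empty array: nothing left to search for
var_dump(tokenizeLettersAndDigits('2'));  // one token: "2"
var_dump(tokenizeLettersOnly('doc2'));    // only "doc": the prefix survives, the number does not
```

Under the letters-only rule the value "2" yields no token at all, which matches the "insignificant query" symptom, and a prefixed value such as "doc2" is reduced to its prefix.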


RE: Lucene : Numeric value ignored in search query

Alexander Veremyev
In reply to this post by Jean-Marc Fontaine
Hi!

Indexed documents may have two types of ids:

1. The internal document id returned by $hit->id and used by
$index->getDocument(), $index->delete() and some other methods.

This id _may_ and _will_ change during index optimization (or
auto-optimization), so it can't be used as a stable reference to an indexed document.

This id also can't be used in search queries.


2. A unique (or non-unique) value added to the document at indexing time:
...
$doc->addField(Zend_Search_Lucene_Field::Keyword('DB_id', $dbId));
...

This field can be used to search for the document:
$hits = $index->find('DB_id:2');

or (better) to retrieve the matching documents directly:
...
// termDocs() returns the internal ids of all documents containing the term
$docIDs = $index->termDocs(new Zend_Search_Lucene_Index_Term('2', 'DB_id'));
foreach ($docIDs as $docId) {
    $index->delete($docId);
}


PS: All of this is described in the documentation ;)

With best regards,
   Alexander Veremyev.

> -----Original Message-----
> From: Jean-Marc Fontaine [mailto:[hidden email]]
> Sent: Tuesday, September 23, 2008 5:25 PM
> To: [hidden email]
> Subject: [fw-formats] Lucene : Numeric value ignored in search query


RE: Lucene : Numeric value ignored in search query

Jean-Marc Fontaine
Hi Alexander,

Thank you for your answer, but I think you missed the point of my question. :)

I read the documentation and I know the difference between my DB ids and Lucene ids. BTW, you can name your search field "id" if you like. The only pitfall if you do so is that $document->id then returns the Lucene id and not your DB id. To get the DB id you must use the $document->getField('id') method.
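As a minimal sketch of that pitfall, assuming the DB id was indexed in a field literally named "id": the helper name dbIdFromHit is hypothetical, and the Zend Framework 1 classes are assumed to be loaded elsewhere. It relies on getField() returning a field object, whereas getFieldValue() returns the stored value directly.

```php
<?php
// Assumes the DB id was indexed in a field literally named 'id'.

/**
 * Returns the stored DB id for a search hit.
 * $hit is expected to be a Zend_Search_Lucene_Search_QueryHit.
 */
function dbIdFromHit($hit)
{
    // $hit->id is Lucene's internal document number, so read the
    // stored field through the document object instead.
    return $hit->getDocument()->getFieldValue('id');
}
```

It could be called on each element of the array returned by $index->find('id:2').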

As I said in my second message, my problem came from the default analyzer, which does not index numeric values. Using another analyzer solved the problem.

Anyway, thank you for trying to help. ;)

Regards,

Jean-Marc