Indexing proposal

Benji York benji.york at canonical.com
Thu Sep 26 18:25:25 UTC 2013


We have a couple of outstanding bugs about indexing charm names better
(1205477 and 1220909).  After looking into Elasticsearch's various
tokenizing options, the approach we should try is to index the
charm/bundle "name" into two fields.  The first will be "non-analyzed"
(i.e., indexed in its entirety).  The second will be indexed with an
ngram tokenizer (min=2, max=20), but the search string itself will not
be ngram-tokenized, because the maximum ngram size is large enough to
account for all search strings, so a search string can be matched whole
against the indexed ngrams.  We will also use the "dis_max" query type
in order to score the two fields correctly.

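To make the idea concrete, here is a rough sketch in Python; it is an
illustration under assumptions, not a finished implementation.  The
ngrams() helper only approximates what an ngram tokenizer with min=2 and
max=20 would emit for a name, and build_name_query() builds a "dis_max"
query body as a plain dict.  The field names "name.exact" and
"name.ngrams" are hypothetical, and the index mapping is left out because
its exact syntax depends on the Elasticsearch version.  Both sub-queries
are "term" queries, which are not analyzed, so the search string is
looked up whole rather than being split into ngrams.

```python
def ngrams(text, min_gram=2, max_gram=20):
    """Approximate the terms an ngram tokenizer would index for text.

    Returns every substring whose length is between min_gram and
    max_gram (inclusive).  This is a plain-Python illustration, not a
    call into Elasticsearch.
    """
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


def build_name_query(search_text):
    """Build a dis_max query body over the two (hypothetical) name fields."""
    return {
        "query": {
            "dis_max": {
                "queries": [
                    # Whole-name match against the non-analyzed field.
                    {"term": {"name.exact": search_text}},
                    # Substring match: the search text is compared,
                    # unsplit, against the indexed ngrams.
                    {"term": {"name.ngrams": search_text}},
                ]
            }
        }
    }


if __name__ == "__main__":
    indexed = ngrams("mysql")
    for search in ("mysql", "sql", "q"):
        print(search, search in indexed)
    print(build_name_query("sql"))
```

With this sketch, "sql" and "mysql" are both found among the ngrams
indexed for "mysql", while a one-character search such as "q" is not,
since it is shorter than the minimum ngram size.
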
References:
    http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenizer/
    http://elasticsearch-users.115913.n3.nabble.com/Which-is-the-best-right-use-of-NGrams-td4030176.html
    http://www.elasticsearch.org/guide/reference/query-dsl/dis-max-query/

-- 
Benji York



More information about the Juju-GUI mailing list