One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:
k1: This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is 1.2. Lower values result in quicker saturation, and higher values in slower saturation.
b: This parameter controls how much effect field-length normalization should have. A value of 0.0 disables normalization completely, and a value of 1.0 normalizes fully. The default is 0.75.
The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.
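The effect of the two parameters is easiest to see in the scoring formula itself. Below is a minimal Python sketch of the per-term BM25 score in its textbook form, using the IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) that Lucene uses. It is an illustration under assumptions, not Elasticsearch's exact scoring: query boosts are ignored, real implementations may differ in constant factors, and the helper name `bm25_term_score` is hypothetical.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Hypothetical helper: textbook per-term BM25 score (boosts omitted)."""
    # IDF: rarer terms (small doc_freq) get a larger weight.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores doc_len, b=1 scales fully by doc_len/avg_doc_len.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # TF saturation: as tf grows this fraction approaches k1 + 1; k1 sets how fast.
    tf_part = tf * (k1 + 1) / (tf + k1 * length_norm)
    return idf * tf_part


if __name__ == "__main__":
    # Saturation: with a lower k1 the score flattens out sooner as tf increases.
    for k1 in (0.5, 1.2, 2.0):
        scores = [round(bm25_term_score(tf, 100, 100, 1000, 10, k1=k1), 3)
                  for tf in (1, 2, 5, 10, 20)]
        print(f"k1={k1}: {scores}")

    # Normalization: with b=0 a short and a long document score the same for equal tf.
    for b in (0.0, 0.75, 1.0):
        short = bm25_term_score(3, 50, 100, 1000, 10, b=b)
        long_ = bm25_term_score(3, 400, 100, 1000, 10, b=b)
        print(f"b={b}: short={short:.3f} long={long_:.3f}")
```

With b=0.0 the short and long documents get identical scores, which is the behavior a custom similarity with "b": 0 asks for.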
The similarity algorithm can be set on a per-field basis. It’s just a matter of specifying the chosen algorithm in the field’s mapping:
```
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "BM25"
        },
        "body": {
          "type": "string",
          "similarity": "default"
        }
      }
    }
  }
}
```
The title field uses BM25 similarity.
The body field uses the default similarity.
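If mappings are generated from code rather than typed into a console, the per-field choice can be expressed as a plain dictionary. The following Python sketch, using only the standard library, builds the same request body as the mapping above. The helper name `build_similarity_mapping` is hypothetical, and the `doc` type and `string` field type simply mirror the example, so they may need adjusting for other Elasticsearch versions.

```python
import json


def build_similarity_mapping(field_similarities, doc_type="doc", field_type="string"):
    """Hypothetical helper: build a mappings body assigning a similarity per field.

    field_similarities maps field name -> similarity name, e.g. {"title": "BM25"}.
    """
    properties = {
        field: {"type": field_type, "similarity": similarity}
        for field, similarity in field_similarities.items()
    }
    return {"mappings": {doc_type: {"properties": properties}}}


if __name__ == "__main__":
    body = build_similarity_mapping({"title": "BM25", "body": "default"})
    # This JSON would be sent as the body of PUT /my_index.
    print(json.dumps(body, indent=2))
```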
Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.
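Since the similarity cannot be changed in place, the usual route is to create a new index with the desired mapping and copy the documents into it. The Python sketch below only builds the sequence of requests and sends nothing. It assumes an Elasticsearch version that provides the `_reindex` API; the target index name `my_index_v2` and the helper `plan_similarity_change` are hypothetical.

```python
import json


def plan_similarity_change(old_index, new_index, field, similarity,
                           field_type="string", doc_type="doc"):
    """Hypothetical helper: return (method, path, body) tuples, in execution order.

    Nothing is sent; the tuples only describe the requests.
    """
    # Step 1: create the target index with the new similarity in its mapping.
    create_new = {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": field_type, "similarity": similarity}
                }
            }
        }
    }
    # Step 2: copy every document from the old index (assumes the _reindex API exists).
    copy_docs = {"source": {"index": old_index}, "dest": {"index": new_index}}
    return [
        ("PUT", f"/{new_index}", create_new),
        ("POST", "/_reindex", copy_docs),
    ]


if __name__ == "__main__":
    for method, path, body in plan_similarity_change("my_index", "my_index_v2", "title", "BM25"):
        print(method, path)
        print(json.dumps(body, indent=2))
```

In practice the new index should carry the full mapping of the old one, not just the field being changed; the sketch maps a single field for brevity.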
Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:
```
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b": 0
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "my_bm25"
        },
        "body": {
          "type": "string",
          "similarity": "BM25"
        }
      }
    }
  }
}
```
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html
This article was reposted from 张昺华-sky's blog on cnblogs (博客园). Original link: http://www.cnblogs.com/bonelee/p/6472828.html. Please contact the original author before reposting.