ES Similarity Algorithm Settings (Continued)
Published: 2019-06-29


Tuning BM25

One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:

k1
This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is 1.2. Lower values result in quicker saturation, and higher values in slower saturation.
b
This parameter controls how much effect field-length normalization should have. A value of 0.0 disables normalization completely, and a value of 1.0 normalizes fully. The default is 0.75.
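
To get a feel for what the two parameters do, the term-frequency part of BM25 can be sketched in a few lines of Python. This is the textbook form of the formula, not code taken from Lucene, whose implementation may differ in details; the function name `bm25_tf` and the field lengths used here are illustrative assumptions.

```python
def bm25_tf(tf, k1=1.2, b=0.75, field_len=100, avg_field_len=100):
    """Term-frequency component of the textbook BM25 formula.

    tf            -- how many times the term occurs in the field
    k1            -- saturation parameter (default 1.2)
    b             -- field-length normalization parameter (default 0.75)
    field_len     -- length of this field, in terms (illustrative value)
    avg_field_len -- average field length in the index (illustrative value)
    """
    # Length normalization: b=0.0 ignores field length, b=1.0 applies it fully.
    norm = 1.0 - b + b * (field_len / avg_field_len)
    # The result grows with tf but never exceeds k1 + 1.
    return tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    # Print how the score grows with term frequency for a few k1 values.
    for k1 in (0.5, 1.2, 2.0):
        row = [round(bm25_tf(tf, k1=k1), 3) for tf in (1, 2, 5, 10, 50)]
        print(f"k1={k1}: {row}")
```

With a low k1 the value approaches its ceiling of k1 + 1 after only a few occurrences, while a high k1 keeps rewarding additional occurrences for longer. Setting b to 0.0 makes the result independent of `field_len`.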

The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values really depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.

The similarity algorithm can be set on a per-field basis. It’s just a matter of specifying the chosen algorithm in the field’s mapping:

PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "BM25"
        },
        "body": {
          "type":       "string",
          "similarity": "default"
        }
      }
    }
  }
}

The title field uses BM25 similarity.

The body field uses the default similarity.

Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.

Configuring BM25

Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:

PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b":    0
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "my_bm25"
        },
        "body": {
          "type":       "string",
          "similarity": "BM25"
        }
      }
    }
  }
}

Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html
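
Since tuning comes down to adjusting, checking, and adjusting again, it can help to generate the settings body for each candidate pair of values rather than editing JSON by hand. The following is a minimal sketch under a few assumptions: the helper names `bm25_settings` and `candidate_settings` are hypothetical, the candidate values are arbitrary, and the `k1` key is taken from Elasticsearch's BM25 similarity options (the example above sets only `b`).

```python
import itertools
import json


def bm25_settings(k1, b, name="my_bm25"):
    """Build an index-settings body declaring a custom BM25 similarity."""
    return {
        "settings": {
            "similarity": {
                name: {"type": "BM25", "k1": k1, "b": b}
            }
        }
    }


def candidate_settings(k1_values, b_values):
    """Yield (k1, b, body) for every combination of the given values."""
    for k1, b in itertools.product(k1_values, b_values):
        yield k1, b, bm25_settings(k1, b)


if __name__ == "__main__":
    # Arbitrary example grid; pick values that make sense for your collection.
    for k1, b, body in candidate_settings([1.2, 2.0], [0.0, 0.75]):
        print(f"k1={k1}, b={b}")
        print(json.dumps(body, indent=2))
```

Each generated body still needs a mappings section and has to be sent when creating a fresh index, because the similarity of an existing field cannot be changed without reindexing.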
 
 
 
 
 
 
 
 
 
 
 
 
 
 
This article is reposted from 张昺华-sky's cnblogs blog. Original link: http://www.cnblogs.com/bonelee/p/6472828.html. To repost it, please contact the original author.
 
 
 