Apache Solr vs Elasticsearch-feature

Time: 2023-09-27 14:26:36

Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x Complete list
3rd-party product integration (open-source)

Solr: Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna)
Elasticsearch: Drupal, Django, Symfony2, Wordpress, CouchBase

Only in non-SolrCloud. In SolrCloud, behaves identically to ES.

Not an issue because shards are replicated across nodes.
Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories
[DEPRECATED in 2.x] Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia

Schemaless mode or via dynamic fields.

Only backward-compatible changes.

Need to programmatically create queries if going beyond Lucene query syntax.

Percolation. Distributed percolation supported in 1.0

query_string, dis_max, match, multi_match etc

but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
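
As a rough illustration of the workaround described above, here is a sketch that assumes Solr's edismax parser and its `bq` (boost query) parameter. The field name `category`, the value `clearance`, and the boost factor are made up for the example. The documents to be demoted are excluded from a match-all query, and every other document gets a positive boost:

```python
from urllib.parse import urlencode


def negative_boost_params(user_query, demote_clause, boost=10):
    """Build Solr request parameters that demote documents matching demote_clause.

    Instead of a negative boost, positively boost the inverse set:
    every document (*:*) except those matching demote_clause.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "bq": f"(*:* -{demote_clause})^{boost}",
    }


# Hypothetical field and value, for illustration only.
params = negative_boost_params("laptop", "category:clearance")
query_string = urlencode(params)
print(params["bq"])
print(query_string)
```

The resulting query string can be appended to a collection's `/select` handler URL.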

Installable from GitHub, maven, sonatype or elasticsearch.org

The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function.

Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes is set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work.
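
The N/2+1 rule uses integer division. A minimal sketch of the arithmetic follows; the function names are illustrative and are not part of Elasticsearch. It also shows why at most one side of a split can keep operating:

```python
def minimum_master_nodes(cluster_size):
    """Smallest majority of the cluster: N/2 + 1 with integer division."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_quorum(partition_size, cluster_size):
    """True if a partition is large enough to keep operating."""
    return partition_size >= minimum_master_nodes(cluster_size)


# A 5-node cluster split into partitions of 3 and 2 nodes.
quorum = minimum_master_nodes(5)
print(quorum)
print(has_quorum(3, 5), has_quorum(2, 5))
```

With an even cluster size, for example 4 nodes split 2/2, neither side reaches the quorum of 3, so the whole cluster stops rather than diverging.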

If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards.
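
A small sketch of how a client might opt in to partial results and then detect them. The host, collection name, and sample response below are assumptions made for illustration. The check relies on the `partialResults` flag that Solr places in the response header when some shards did not contribute:

```python
from urllib.parse import urlencode


def tolerant_select_url(base_url, collection, query):
    """Build a Solr select URL that tolerates unavailable shards."""
    params = {"q": query, "wt": "json", "shards.tolerant": "true"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def is_partial(response):
    """True if the parsed JSON response is flagged as partial."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


# Hypothetical host and collection.
url = tolerant_select_url("http://localhost:8983/solr", "products", "title:solr")
print(url)

# Hand-written sample response, not captured from a real server.
sample = {"responseHeader": {"status": 0, "partialResults": True},
          "response": {"numFound": 2, "docs": []}}
print(is_partial(sample))
```

Checking the flag matters because a tolerant request succeeds even when some shards are missing, so the hit count alone does not reveal that results are incomplete.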

It can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes, and it can be configured not to assign a shard and its replicas to nodes with the same tags.

Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime.

Each index has 5 shards by default. The number of primary shards cannot be changed once the index is created. Replicas can be increased anytime.

Can be done by creating a shard replica on the desired node and then removing the shard from the source node.

Can move shards and replicas to any node in the cluster on demand.

Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they are finished replicating the index.

Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per-document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available.

I'm embedding my answer to this "Solr-vs-Elasticsearch" Quora question verbatim here:

1. Elasticsearch was born in the age of REST APIs. If you love REST APIs, you'll probably feel more at home with ES from the get-go. I don't actually think it's cleaner or easier to use, but just that it is more aligned with web 2.0 developers' mindsets.

2. Elasticsearch's Query DSL syntax is really flexible and it's pretty easy to write complex queries with it, though it does border on being verbose. Solr doesn't have an equivalent, last I checked. Having said that, I've never found Solr's query syntax wanting, and I've always been able to easily write a custom SearchComponent if needed (more on this later).

3. I find Elasticsearch's documentation to be pretty awful. It doesn't help that some examples in the documentation are written in YAML and others in JSON. I wrote an ES code parser once to auto-generate documentation from Elasticsearch's source and found a number of discrepancies between the code and what's documented on the website, not to mention a number of undocumented/alternative ways to specify the same config key.

By contrast, I've found Solr to be consistent and really well-documented. I've found pretty much everything I've wanted to know about querying and updating indices without having to dig into code much. Solr's schema.xml and solrconfig.xml are *extensively* documented with most if not all commonly used configurations.

4. Whilst what Rick says about ES being mostly ready to go out-of-box is true, I think that is also a possible problem with ES. Many users don't take the time to do the most simple config (e.g. type mapping) of ES because it just works in dev, and end up running into issues in production.

And once you do have to do config, then I personally prefer Solr's config system over ES's. Long JSON config files can get overwhelming because of JSON's lack of support for comments. Yes, you can use YAML, but it's annoying and confusing to go back and forth between YAML and JSON.

5. If your own app works/thinks in JSON, then without a doubt go for ES because ES thinks in JSON too. Solr merely supports it as an afterthought. ES has a number of nice JSON-related features such as parent-child and nested docs that make it a very natural fit. Parent-child joins are awkward in Solr, and I don't think there's a Solr equivalent for ES inner hits.

6. ES doesn't require ZooKeeper for its elastic features, which is nice coz I personally find ZK unpleasant, but as a result, ES does have issues with split-brain scenarios (google elasticsearch split-brain or see this: Elasticsearch Resiliency Status).

7. Overall from working with clients as a Solr/Elasticsearch consultant, I've found that developer preferences tend to end up along language party lines: if you're a Java/C# developer, you'll be pretty happy with Solr. If you live in JavaScript or Ruby, you'll probably love Elasticsearch. If you're on Python or PHP, you'll probably be fine with either.

Something to add about this: ES doesn't have a very elegant Java API IMHO (you'll basically end up using REST because it's less painful), whereas SolrJ is very satisfactory and more efficient than Solr's REST API. If you're primarily a Java dev team, do take this into consideration for your sanity. There's no scenario in which constructing JSON in Java is fun/simple, whereas in Python it's absolutely pain-free, and believe me, if you have a non-trivial app, your ES JSON query strings will be works of art.

8. ES doesn't have in-built support for pluggable SearchComponents, to use Solr's terminology. SearchComponents are (for me) a pretty indispensable part of Solr for anyone who needs to do anything customized and in-depth with search queries.

Yes of course, in ES you can just implement your own RestHandler, but that's just not the same as being able to plug into and rewire the way search queries are handled and parsed.

9. Whichever way you go, I highly suggest you choose a client library which is as close to the metal as you can get. Both ES and Solr have *really* simple search and update APIs. If a client library introduces an additional DSL layer in an attempt to simplify, I suggest you think long and hard about using it, as it's likely to complicate matters in the long run, and make debugging and asking for help on SO more problematic.

In particular, if you're using Rails + Solr, consider using rsolr/rsolr instead of sunspot/sunspot if you can help it. ActiveRecord is complex code and sufficiently magical. The last thing you want is more magic on top of that.
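
To make points 2 and 7 above a little more concrete, here is a small sketch comparing a Lucene-syntax query string, as Solr's standard query parser accepts it, with a roughly equivalent Elasticsearch Query DSL body built from plain Python dicts. The field names (`title`, `status`, `year`) are hypothetical. The two are not guaranteed to score identically, since `match` analyzes its input while `term` does not:

```python
import json

# Lucene query syntax: compact, but everything lives in one string.
lucene_query = "title:search AND NOT status:draft AND year:[2010 TO *]"

# Query DSL: more verbose, but it is just nested dicts and lists,
# so it can be assembled and modified programmatically.
es_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "search"}},
                {"range": {"year": {"gte": 2010}}},
            ],
            "must_not": [
                {"term": {"status": "draft"}},
            ],
        }
    }
}

# Adding a clause is a list append rather than string surgery.
es_query["query"]["bool"]["must_not"].append({"term": {"status": "archived"}})

body = json.dumps(es_query, indent=2)
print(lucene_query)
print(body)
```

The serialized `body` is what would be sent to the search endpoint, which is one reason a thin client is often enough in Python.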

---

To conclude, ES and Solr have more or less feature-parity and from a feature standpoint, there's rarely one reason to go one way or the other (unless your app lives/breathes JSON). Performance-wise, they are also likely to be quite similar (I'm sure there are exceptions to the rule. ES's relatively new autocomplete implementation, for example, is a pretty dramatic departure from previous Lucene/Solr implementations, and I suspect it produces faster responses at scale).

ES does offer less friction from the get-go and you feel like you have something working much quicker, but I find this to be illusory. Any time gained in this stage is lost when figuring out how to properly configure ES because of poor documentation - an inevitability when you have a non-trivial application.

Solr encourages you to understand a little more about what you're doing, and the chance of you shooting yourself in the foot is somewhat lower, mainly because you're forced to read and modify the 2 well-documented XML config files in order to have a working search app.

---

EDIT on Nov 2015:

ES has been gradually distinguishing itself from Solr when it comes to data analytics. I think it's fair to attribute this to the immense traction of the ELK stack in the logging, monitoring and analytics space. My guess is that this is where Elastic (the company) gets the majority of its revenue, so it makes perfect sense that ES (the product) reflects this.

We see this manifesting primarily in the form of aggregations, which are a more flexible and nuanced replacement for facets. Read more about aggregations here: Migrating to aggregations

Aggregations have been out for a while now (since 1.4), but with the recently released ES 2.0 come pipeline aggregations, which let you compute aggregations such as derivatives, moving averages, and series arithmetic on the results of other aggregations. Very cool stuff, and Solr simply doesn't have an equivalent. More on pipeline aggregations here: Out of this world aggregations

If you're currently using or contemplating using Solr in an analytics app, it is worth your while to look into ES aggregation features to see if you need any of them.
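
As a rough sketch of what a pipeline aggregation looks like, the request body below follows the ES 2.0-era syntax: a monthly `date_histogram`, a `sum` metric per bucket, and a `derivative` that reads that metric through `buckets_path`. The field names `date` and `price` and the sample numbers are made up. The helper function only mimics the arithmetic locally and is not how Elasticsearch computes it internally:

```python
import json

# Request body: the derivative is a sibling of the metric it reads from.
pipeline_request = {
    "size": 0,
    "aggs": {
        "sales_per_month": {
            "date_histogram": {"field": "date", "interval": "month"},
            "aggs": {
                "sales": {"sum": {"field": "price"}},
                "sales_deriv": {"derivative": {"buckets_path": "sales"}},
            },
        }
    },
}


def first_derivative(bucket_values):
    """Difference between each bucket value and the one before it.

    The first bucket has no predecessor, so its entry is None.
    """
    if not bucket_values:
        return []
    result = [None]
    for previous, current in zip(bucket_values, bucket_values[1:]):
        result.append(current - previous)
    return result


monthly_sales = [550.0, 60.0, 375.0]  # made-up per-month sums
print(json.dumps(pipeline_request, indent=2))
print(first_derivative(monthly_sales))
```

The point of the pipeline form is that the second computation runs on the output of the first aggregation inside the same request, instead of in client code like the helper above.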
