Apache Lucene is a free, open-source text search engine library written in Java. It is suitable for applications that require full-text search, and it is available cross-platform.
Azure AI Search
Score 7.9 out of 10
Azure AI Search (formerly Azure Cognitive Search) is enterprise search as a service, from Microsoft.
Apache Lucene offers a great full-text search library that makes it easy to add search functionality to a website or other application. Lucene is ideal if you want low-level access to the indexes and its APIs. For general purposes, Apache Solr, the web application built atop Lucene, can be used instead. Apache Solr comes with caching, HTTP/JSON APIs, and a simple web administration console.
If you have a medium amount of data (2GB - 2.4TB), high-security concerns, and search is a key requirement in your single-tenant application, then Azure Search likely has you covered. If you have a small amount of data per tenant (e.g., about 2GB), low-security concerns, and a multi-tenant application where search is a key requirement, then Azure Search would likely be a good choice - though you would need to implement your own concept of sharding and manage tenants across potentially multiple Azure Search instances. If you can reflect your would-be indexes by depositing the data in columns in a SQL table and indexing those columns for full-text search - and that still fits your requirements - it's probably better to start with SQL Database, then scale up to Azure Search when you need advanced features like ranking or cognitive abilities.
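The "implement your own concept of sharding" point can be made concrete with a minimal sketch. It assumes tenants are spread over a fixed list of search instances by a stable hash of the tenant ID. The instance names, the `instance_for_tenant` helper, and the index naming scheme are all hypothetical and are not part of any Azure SDK.

```python
import hashlib

# Hypothetical search service names; a real deployment would use its own.
SEARCH_INSTANCES = ["search-shard-0", "search-shard-1", "search-shard-2"]


def instance_for_tenant(tenant_id: str, instances=SEARCH_INSTANCES) -> str:
    """Map a tenant to one search instance using a stable hash.

    SHA-256 is used instead of Python's built-in hash() because hash()
    is randomized per process, which would break routing across restarts.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[bucket]


def index_name_for_tenant(tenant_id: str) -> str:
    """Hypothetical naming scheme: one index per tenant on its instance."""
    return f"tenant-{tenant_id.lower()}-docs"


if __name__ == "__main__":
    for tenant in ["acme", "globex", "initech"]:
        print(tenant, instance_for_tenant(tenant), index_name_for_tenant(tenant))
```

Note that plain modulo routing remaps most tenants whenever the instance list changes; an explicit tenant-to-instance lookup table or consistent hashing would avoid that, at the cost of more bookkeeping.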
We had difficulty porting the project to a cluster-based environment in the cloud.
For our particular use case of retrieving documents based on text pattern matching, the program worked efficiently; however, we did not find many resources for image pattern recognition based on image metadata.
Like virtually all Azure services, it has first-class treatment for .NET as the developer platform of choice, but largely ignores other options. While there is a first-party Python SDK, there are only community packages for other languages like Ruby and Node. Whether those are kept up to date might be a game of roulette. This might make it a non-starter for some teams that don't want to do the work to integrate with the REST API directly.
In my opinion, partitions inside of Azure Search don't count as data segregation for customers in a multi-tenant app, so for any application where you have many customers with high-security concerns, Azure Search is probably a non-starter.
To elaborate on the multi-tenant issue: Azure Search's approach to pricing is pretty steep. While there is a free tier for small applications (50MB of content or less), the first paid tier is about 14x more expensive than the first SQL Database tier that supports full-text search. For many applications, it makes a lot more economic sense to just run some LIKE or CONTAINS queries on columns in a table rather than going with Azure Search.
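As a rough illustration of the "just run some LIKE queries" alternative, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Database. The `documents` table, its columns, and the sample rows are made up for the example. `CONTAINS` is not shown because it needs a SQL Server full-text index, which SQLite does not provide.

```python
import sqlite3


def search_documents(conn: sqlite3.Connection, term: str):
    """Return (id, title) rows whose body contains `term` as a substring."""
    # Escape LIKE wildcards so user input such as "100%" is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    return conn.execute(
        "SELECT id, title FROM documents "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    ).fetchall()


# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO documents (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "Pump manual", "Replace the gasket every 500 hours."),
        (2, "Valve spec", "Rated to 100% of nominal pressure."),
        (3, "Safety sheet", "Wear gloves when handling the Gasket."),
    ],
)

if __name__ == "__main__":
    print(search_documents(conn, "gasket"))
    print(search_documents(conn, "100%"))
```

A substring scan like this has no relevance ranking and typically cannot use an ordinary index, which is the trade-off against a dedicated search service that the pricing comparison above is weighing.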
The search and indexing performance of Apache Lucene is excellent, and the quality of results is at least as good as the alternatives. For small-scale applications it is a no-brainer: Lucene is the best and most cost-effective solution. The learning curve is not too steep either.
Azure Search is a competitor to Google's own AI autosuggest feature. We went with Azure because our network security folks found it to be more robust from a security standpoint, which is incredibly important when you have proprietary manufacturing information. Additionally, we're a Microsoft shop, so it plugged into our cloud hosting package and client-facing OS.