Index
The Esse::Index class is an abstraction of an Elasticsearch index. It is responsible for defining the index name, the index settings, the index mappings, the datasources, and their documents.
Here is a minimal example of an index:
class ArticlesIndex < Esse::Index
  repository :article do
    collection do |**context, &block|
      batch = [
        { id: 1, title: 'Article 1', body: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.' },
        { id: 2, title: 'Article 2', body: 'Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium.' },
      ]
      batch.delete_if { |item| item[:id] != context[:id] } if context[:id] # Just to simulate a filter
      block.call(batch, **context)
    end
    document do |item, **_context|
      {
        _id: item[:id], # The _id is a convention to define the document id. More on this later.
        title: item[:title],
        body: item[:body],
      }
    end
  end
end
Now, let's see what's happening here:
- The ArticlesIndex class inherits from Esse::Index and defines a repository block.
- The repository block defines a new repo identified by :article, with a collection and a document.
- The collection block is responsible for fetching data from a datasource. It may receive a context that can be used to filter the data, and a block that must be called with the fetched data.
- The document block is responsible for transforming each item of the collection into an Esse::Document. Note that we are using a Hash as a document to keep things simple; under the hood, it is converted to a generic Esse::HashDocument object. Always prefer to implement your own Esse::Document class.
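To make the interplay between the two blocks concrete, here is a minimal sketch in plain Ruby that does not use Esse at all. The ArticleRepoSketch module and its method names are hypothetical; it only mimics the flow described above (a collection yields a batch, and each item of the batch goes through the document step) and is not how the gem is implemented.

```ruby
# Plain-Ruby stand-in for a repository; nothing here comes from the Esse gem.
module ArticleRepoSketch
  ARTICLES = [
    { id: 1, title: 'Article 1' },
    { id: 2, title: 'Article 2' },
  ].freeze

  # Collection step: select the data and hand it over as one batch.
  def self.collection(**context, &block)
    batch = ARTICLES.select { |item| context[:id].nil? || item[:id] == context[:id] }
    block.call(batch, **context)
  end

  # Document step: turn one item into the hash that would be indexed.
  def self.document(item, **_context)
    { _id: item[:id], title: item[:title] }
  end

  # Glue: lazily produce one document per item of every batch.
  def self.documents(**context)
    Enumerator.new do |yielder|
      # A lambda keeps the batch array from being auto-splatted into the parameters.
      handler = lambda do |batch, **batch_context|
        batch.each { |item| yielder << document(item, **batch_context) }
      end
      collection(**context, &handler)
    end
  end
end
```

Calling ArticleRepoSketch.documents(id: 1).to_a in this sketch only returns the document built from the first article. With Esse itself, the same index can be inspected from a console: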
> ArticlesIndex.documents
=> #<Enumerator: ...>
> ArticlesIndex.documents.to_a
=> [
#<Esse::HashDocument @object={:_id=>1, :title=>"Article 1", :body=>"Lorem ipsum dolor sit amet, consectetur adipiscing elit."}, @options={}>,
#<Esse::HashDocument @object={:_id=>2, :title=>"Article 2", :body=>"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium."}, @options={}>
]
> ArticlesIndex.documents(id: 1).to_a
=> [
#<Esse::HashDocument @object={:_id=>1, :title=>"Article 1", :body=>"Lorem ipsum dolor sit amet, consectetur adipiscing elit."}, @options={}>
]
Now let's go deeper into each part of the index definition.
The repository is used to define a data source for the index. It can be a database table, a file, a web service, etc. The repository is responsible for fetching the data, enriching it and transforming it into documents. One index can have multiple repositories.
Defining a repository with a block:
class GeosIndex < Esse::Index
  repository :county do
    # ...
  end
  repository :city do
    # ...
  end
end
The identifier of the repository must be unique within the index. By default, a constantized version of the identifier is used as the repository class name. In the example above, the :county repository is represented by the GeosIndex::County class and the :city repository by the GeosIndex::City class. You can also access the repositories using the GeosIndex.repo method:
> GeosIndex.repo(:county) == GeosIndex::County
=> true
> GeosIndex.repo_hash
=> {"county"=>GeosIndex::County, "city"=>GeosIndex::City}
If you don't want to generate the repo constant, you can pass const: false to the repository method:
class GeosIndex < Esse::Index
  repository :county, const: false do
    # ...
  end
end
> GeosIndex.constants.include?(:County)
=> false
The collection block is responsible for fetching the data from the datasource. It receives the context as keyword arguments and a block that must be called with the fetched data. The context can be anything you want, but it's important to implement the :id filter so that a single document can be fetched.
A collection can be defined through a block or a class that implements the Enumerable interface.
# app/indices/geos_index.rb
class GeosIndex < Esse::Index
  repository :county do
    collection do |**context, &block|
      # ...
    end
  end
  repository :city do
    collection Collections::CityCollection
  end
end
# app/indices/geos_index/collections/city_collection.rb
class GeosIndex::Collections::CityCollection
  include Enumerable
  # @param [Hash] context
  def initialize(**context)
    @context = context
  end
  # @yield [Array<Object>] batch of objects
  def each(&block)
    # ...
  end
end
The document block is responsible for coercing each item of the collection into an Esse::Document. It always receives an item of the collection and the context as keyword arguments. The context can be anything you want and can be used to apply filters, policies, etc.
A document can be defined through a block or a class that implements the Esse::Document interface.
# app/indices/geos_index.rb
class GeosIndex < Esse::Index
  repository :county do
    document do |item, **context|
      { _id: item.id, name: item.name }
    end
  end
  repository :city do
    document Documents::CityDocument
  end
end
# app/indices/geos_index/documents/city_document.rb
class GeosIndex::Documents::CityDocument < Esse::Document
  # @return [String]
  def id
    object.id
  end
  # @return [Hash]
  def source
    # You can access the context using the `options` method
    { name: object.name }
  end
end
Elasticsearch 5.x or lower requires a type to be defined for each document. You can define the type using the #type method or by rendering _type: 'doc_type' in hash documents.
At the document level, you can also define the #routing method to set the routing of the document.
Please look at the Esse::Document source code to see all the methods you can override.
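As an illustration of the document interface described above, here is a sketch that runs without the gem. SketchDocument is a hypothetical stand-in for Esse::Document: its (object, **options) constructor is an assumption inferred from the @object and @options shown in the console output earlier, and the City struct, the state_abbr attribute, and the include_state option are invented for the example.

```ruby
# Stand-in base class so the sketch runs without Esse (assumed constructor).
class SketchDocument
  attr_reader :object, :options

  def initialize(object, **options)
    @object = object
    @options = options
  end
end

# Hypothetical record type used only for this example.
City = Struct.new(:id, :name, :state_abbr, keyword_init: true)

class CityDocumentSketch < SketchDocument
  # Document id sent to Elasticsearch.
  def id
    object.id
  end

  # Routing value; here all cities of a state share the same routing key.
  def routing
    object.state_abbr
  end

  # Document body; the context is read through `options`.
  def source
    data = { name: object.name }
    data[:state_abbr] = object.state_abbr if options[:include_state]
    data
  end
end
```

With a real Esse::Document subclass, the same three methods would be the ones to override; check the gem's source for the exact contract.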
As most Ruby applications are built on Rails, I'm going to show how to create an Esse index that loads data from an ActiveRecord model.
# app/indices/geographies_index.rb
class GeographiesIndex < Esse::Index
  repository :city do
    collection do |**context, &block|
      query = ::City.includes(:state)
      query = query.where(id: context[:id]) if context[:id]
      query = query.where(state_abbr: context[:state_abbr]) if context[:state_abbr]
      query.find_in_batches(&block)
    end
    document do |city, **_context|
      {
        _id: city.id,
        name: city.name,
        state: {
          id: city.state.id,
          name: city.state.name,
        }
      }
    end
  end
end
But thanks to the plugin system, we can use the esse-active_record plugin and reduce the implementation above to a few lines of code:
# app/indices/geographies_index.rb
class GeographiesIndex < Esse::Index
  plugin :active_record
  repository :city do
    collection ::City.includes(:state) do
      scope :state_abbr, ->(abbr) { where(state_abbr: abbr) }
    end
    document Documents::CityDocument
  end
end
Much better, huh? The esse-active_record plugin automatically creates a collection block. You can define multiple scopes to handle the context filters. There is also a pretty nice feature named batch_context that can be useful to preload associations. Please refer to the esse-active_record documentation for more details and more examples.
The index settings can be defined using the settings method, which accepts a block or a Hash as argument.
class ArticlesIndex < Esse::Index
  settings number_of_shards: 2, number_of_replicas: 1
end
# or
class ArticlesIndex < Esse::Index
  settings do
    # Useful when you need to define dynamic settings
    {
      number_of_shards: 2,
      number_of_replicas: 1,
    }
  end
end
If you want something more complex, you can pass any object as the argument. The object must respond to #to_h and return a Hash with the settings definition.
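As a rough sketch of such an object, the class below builds its Hash from an environment name. The class name, the env argument, and the shard and replica numbers are all made up for the example; the only part taken from the text above is that the object responds to #to_h and returns a Hash.

```ruby
# Hypothetical settings object; only the #to_h contract comes from the docs.
class ArticleSettingsSketch
  def initialize(env:)
    @env = env
  end

  # Returns the settings Hash, with larger values in production.
  def to_h
    production = @env == 'production'
    {
      number_of_shards: production ? 4 : 1,
      number_of_replicas: production ? 1 : 0,
    }
  end
end

# Inside an index class it would presumably be passed like this:
#   settings ArticleSettingsSketch.new(env: ENV.fetch('APP_ENV', 'development'))
```

Keeping the logic in a small object like this makes the settings easy to unit test without touching Elasticsearch.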
Note that the settings can also be defined in Esse.config.cluster. The global settings are deep merged with the settings defined in the index.
# config/initializers/esse.rb
Esse.configure do |config|
  config.cluster do |cluster|
    cluster.settings = {
      number_of_shards: 2,
      number_of_replicas: 0,
      refresh_interval: '30s',
    }
  end
end
# app/indices/articles_index.rb
class ArticlesIndex < Esse::Index
  settings number_of_replicas: 1
end
ArticlesIndex.settings_hash
# => {:settings=>{:number_of_shards=>2, :number_of_replicas=>1, :refresh_interval=>"30s"}}
The index mappings can be defined using the mappings method:
class ArticlesIndex < Esse::Index
  mappings do
    {
      properties: {
        title: { type: 'text' },
        body: { type: 'text' },
      }
    }
  end
end
If you want something more complex, you can pass any object as the argument. The object must respond to #to_h and return a Hash with the mappings definition.
Note that the mappings can also be defined in Esse.config.cluster. The global mappings are deep merged with the mappings defined in the index.
# config/initializers/esse.rb
Esse.configure do |config|
  config.cluster do |cluster|
    cluster.mappings = {
      dynamic_templates: [
        {
          strings_as_keywords: {
            mapping: {
              ignore_above: 1024,
              type: 'keyword',
            },
            match_mapping_type: 'string',
          },
        },
      ],
      properties: {
        created_at: { type: 'date' },
      },
    }
  end
end
# app/indices/articles_index.rb
class ArticlesIndex < Esse::Index
  mappings do
    {
      properties: {
        title: { type: 'text' },
        body: { type: 'text' },
      }
    }
  end
end
ArticlesIndex.mappings_hash
# => {:mappings=>
#   {:dynamic_templates=>[{:strings_as_keywords=>{:mapping=>{:ignore_above=>1024, :type=>"keyword"}, :match_mapping_type=>"string"}}],
#    :properties=>{:created_at=>{:type=>"date"}, :title=>{:type=>"text"}, :body=>{:type=>"text"}}}}
If you are working with Elasticsearch 5.x or lower, you must define a type in the mappings' properties. Please adjust it according to your needs.
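The deep merge mentioned for both settings and mappings can be pictured with a few lines of plain Ruby. This is only a sketch of the general technique, reusing the values from the examples above (with the dynamic template shortened); deep_merge_sketch is a hypothetical helper, and the gem's own merge may behave differently, for instance in how it treats arrays.

```ruby
# Recursively merge two hashes. On conflicts the override wins, and nested
# hashes are merged key by key. Arrays are replaced as a whole, which is an
# assumption of this sketch.
def deep_merge_sketch(base, override)
  base.merge(override) do |_key, base_value, override_value|
    if base_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge_sketch(base_value, override_value)
    else
      override_value
    end
  end
end

# Cluster-level (global) values and index-level values from the examples.
CLUSTER_SETTINGS = { number_of_shards: 2, number_of_replicas: 0, refresh_interval: '30s' }.freeze
INDEX_SETTINGS = { number_of_replicas: 1 }.freeze

CLUSTER_MAPPINGS = {
  dynamic_templates: [{ strings_as_keywords: { match_mapping_type: 'string' } }],
  properties: { created_at: { type: 'date' } },
}.freeze
INDEX_MAPPINGS = { properties: { title: { type: 'text' }, body: { type: 'text' } } }.freeze

# The index values are applied on top of the cluster values.
MERGED_SETTINGS = deep_merge_sketch(CLUSTER_SETTINGS, INDEX_SETTINGS)
MERGED_MAPPINGS = deep_merge_sketch(CLUSTER_MAPPINGS, INDEX_MAPPINGS)
```

In this sketch the index-level number_of_replicas replaces the cluster-level one, while the properties of both mappings end up side by side, which matches the settings_hash and mappings_hash outputs shown above.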
The index name is defined by the class name. The ArticlesIndex class will be represented by the articles index. If you want to change the index name, you can use the index_name= method:
class ArticlesIndex < Esse::Index
  self.index_name = 'my_articles'
end
This gem uses a combination of index_prefix + index_name + index_suffix to define the real index name, and the alias is defined by the index name without the suffix. This is useful to implement a zero-downtime deployment strategy. Please refer to the article at https://www.elastic.co/blog/changing-mapping-with-zero-downtime for more details.
- index_prefix is a prefix that can be defined with Esse.config.cluster.index_prefix=. It's useful to separate the indexes by environment.
- index_name is the index name defined by the class name. The ArticlesIndex class will be represented by the articles index.
- index_suffix is automatically generated from the current timestamp in the %Y%m%d%H%M%S format, unless you define a hardcoded index_suffix= in the index class.
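The naming scheme in the list above can be sketched in plain Ruby as follows. The helper names are hypothetical, and both the way the base name is derived from the class name and the underscore separator are assumptions based on the example names in this page, not the gem's actual code.

```ruby
# Derive a base name such as "articles" from "ArticlesIndex" (assumed rule).
def base_index_name(class_name)
  class_name.sub(/Index\z/, '').gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Join the optional prefix, the base name, and the optional suffix.
def real_index_name(class_name, prefix: nil, suffix: nil)
  [prefix, base_index_name(class_name), suffix].compact.join('_')
end

# Timestamp suffix in the %Y%m%d%H%M%S format mentioned above.
def timestamp_suffix(time)
  time.strftime('%Y%m%d%H%M%S')
end
```

Without a suffix, the result is the name that serves as the alias; with a timestamp suffix, it is the name of one concrete index behind that alias.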
Here is an example of how the index name is generated:
> Esse.config.cluster.index_prefix = 'esse'
=> "esse"
> ArticlesIndex.index_name
=> "esse_articles"
> ArticlesIndex.index_name(suffix: 'v2')
=> "esse_articles_v2"
The suffix parameter is available in most of the methods that interact with the index, including the CLI commands. This is useful for long-running tasks that can be performed in the background without affecting the current index.
Let's say you have a breaking change in the index mappings and you need to reindex all the documents. You can create a new index with the new mappings and reindex all the documents from the old index to the new one. When the reindex is done, you can switch the alias to the new index and delete the old one. This way, you can perform the reindex without affecting the current index.
# Create initial index
> ArticlesIndex.create_index(alias: true)
=> {"acknowledged"=>true, "shards_acknowledged"=>true, "index"=>"esse_articles_20230926105739"}
> ArticlesIndex.indices_pointing_to_alias
=> ["esse_articles_20230926105739"]
# Let's say you have a breaking change in the index mappings and you need to reindex all the documents.
> suffix = Esse.timestamp
=> "20230926105811"
> ArticlesIndex.create_index(suffix: suffix, alias: false)
=> {"acknowledged"=>true, "shards_acknowledged"=>true, "index"=>"esse_articles_20230926105811"}
> ArticlesIndex.indices_pointing_to_alias
=> ["esse_articles_20230926105739"]
> ArticlesIndex.import(suffix: suffix)
=> 231
# Now you can switch the alias to the new index and delete the old one.
> ArticlesIndex.update_aliases(suffix: suffix)
=> {"acknowledged"=>true}
> ArticlesIndex.indices_pointing_to_alias
=> ["esse_articles_20230926105811"]
# And finally delete the old index
> ArticlesIndex.delete_index(suffix: "20230926105739")
=> {"acknowledged"=>true}
You can do it in the CLI too, but that's covered in the CLI section.
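The order of the steps in that session is what makes the switch safe, and it can be rehearsed with an in-memory stand-in. FakeClusterSketch and reindex_with_alias_swap are hypothetical names invented for this sketch; nothing here talks to Elasticsearch or uses Esse's API.

```ruby
# In-memory stand-in: index name => documents, alias name => index names.
class FakeClusterSketch
  attr_reader :indices, :aliases

  def initialize
    @indices = {}
    @aliases = {}
  end

  def create_index(name)
    raise ArgumentError, "index #{name} already exists" if @indices.key?(name)

    @indices[name] = []
  end

  # Store the documents and return how many were imported.
  def import(name, documents)
    @indices.fetch(name).concat(documents)
    documents.size
  end

  # Point the alias at exactly one index, replacing any previous target.
  def update_alias(alias_name, index_name)
    @indices.fetch(index_name) # raises KeyError when the index is missing
    @aliases[alias_name] = [index_name]
  end

  def delete_index(name)
    @indices.delete(name)
  end
end

# Create and fill the new index first, switch the alias, then drop the old one.
def reindex_with_alias_swap(cluster, alias_name, old_index, new_index, documents)
  cluster.create_index(new_index)
  cluster.import(new_index, documents)
  cluster.update_alias(alias_name, new_index)
  cluster.delete_index(old_index)
  cluster.aliases.fetch(alias_name)
end
```

Readers of the alias keep hitting the old index until update_alias runs, so the import can take as long as it needs.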
Make sure to check the Operations wiki page for more details about the operations you can perform in the index.