How I fixed Elasticsearch - davidvgalbraith

After my uproarious success fixing node.js, I was ready to fix something else. I checked node.js’s open issues, but I didn’t see any that I’d be able to jump on. So I turned my attention elsewhere. Another technology I have a fair amount of experience with is Elasticsearch – I wrote the Juttle Elastic Adapter, after all.

I moseyed on over to the Elasticsearch issues list, where I found issue 15381: Config index.mapper.dynamic:false is not honored. I didn’t know what that meant, but the issue had the “adoptme” and “low-hanging fruit” labels, so it seemed like a good place to start contributing to Elasticsearch.

Elasticsearch

Elasticsearch is a search engine (actually, it’s an HTTP API for the Lucene search engine). The user gives Elasticsearch a set of JSON objects to store, and Elasticsearch creates a data structure called the inverted index that enables searching the fields of those objects. For each key in the data set, the inverted index maps each value for that key to the list of objects where that key has that value.

For example, let’s say we have the two JSON objects:

[{"group": "fellowship", "name": "Frodo", "task": "ringbearer", "_id": 1},
 {"group": "fellowship", "name": "Gandalf", "task": "wizard", "_id": 2}]

The _id field is a special field for Elasticsearch. Every object has to have a unique _id. That’s how Elasticsearch identifies an object. If you try to store two objects with the same _id, the second will overwrite the first. Elasticsearch can generate unique _ids if you don’t want to compute them yourself. The inverted index for these two looks like this:

{
 "group": {
   "fellowship": [1, 2]
 },
 "name": {
   "Frodo": [1],
   "Gandalf": [2]
 },
 "task": {
   "ringbearer": [1],
   "wizard": [2]
 }
}

With a data structure like that, Elasticsearch can quickly answer queries like “give me all the objects where group = ‘fellowship'”. Such a query just turns into a lookup of the group key to get the available groups, then a lookup of the fellowship nested key to get the requested objects.

The `index.mapper.dynamic` Setting

Each top-level key (“group”, “name”, and “task” in our example) is called a mapping. By default, Elasticsearch will add mappings to the inverted index as necessary when new objects are stored. For instance, if I add {"name": "Boromir", "hometown": "Minas Tirith", "_id": 3} to my sample data set, then the mapping for hometown will be added to the inverted index. This process is known as dynamic mapping creation.

Sometimes, applications might want to disable dynamic mapping creation, forcing all mappings to be created manually. Then, to store Boromir, I’d need to first manually add the hometown mapping via the put mapping API. To disable dynamic mapping creation, place index.mapper.dynamic: false in Elasticsearch’s configuration file.

Once upon a time, Jut was going to add a bunch of extra type checking before storing objects in Elasticsearch, so we were going to use this setting in production. We didn’t end up implementing that idea, but I gained enough experience with the index.mapper.dynamic setting to know what it is for and why it is useful.

The Bug

However, as issue 15381 pointed out, Elasticsearch was ignoring index.mapper.dynamic in one scenario: index creation. An index is the highest level of data organization in Elasticsearch, similar to an SQL table. Each index has its own mappings and inverted index like the one I described above.

When storing an object, you tell Elasticsearch what index to store that object in. If the specified index doesn’t already exist, Elasticsearch will create it. A new index has no mapping, so a mapping has to be created before the index can store any objects. Ordinarily, Elasticsearch dynamically creates the mapping. If dynamic mapping creation is disabled, though, it shouldn’t be doing that. But it did anyway! Even with dynamic mapping creation disabled, Elasticsearch was dynamically creating mappings for new indices. Basically, Elasticsearch was broken.

The Search

I had done some cursory investigations into the internals of Elasticsearch for Jut, but most of the codebase was still a mystery. I started my quest by searching the code for the string "index.mapper.dynamic". I figured I’d see the sites where the setting was being used, so I could plug it into the places where it needed to be. I found just one reference to it outside of tests, in the file MapperService.java.

The MapperService is the class that handles creation and maintenance of mappings, so it makes sense for it to have a reference to index.mapper.dynamic. MapperService’s constructor sets its dynamic property to the value of index.mapper.dynamic. The method documentMapperWithAutoCreate uses the dynamic value:

public DocumentMapperForType documentMapperWithAutoCreate(String type) {
    DocumentMapper mapper = mappers.get(type);
    if (mapper != null) {
        return new DocumentMapperForType(mapper, null);
    }
    if (!dynamic) {
        throw new TypeMissingException(index(), type, "trying to auto create mapping, but dynamic mapping is disabled");
    }
    mapper = parse(type, null, true);
    return new DocumentMapperForType(mapper, mapper.mapping());
}

With index.mapper.dynamic false, that TypeMissingException shows up in all the cases where a mapping might be dynamically created, except index creation. It seemed like the index creation process was creating a mapping without first calling documentMapperWithAutoCreate.

Continuing to look around MapperService.java, I found that the real work of creating the mapping is done in its merge method. So if index creation called MapperService.merge without calling documentMapperWithAutoCreate, that would create a mapping without ever checking the value of index.mapper.dynamic.

I then searched the code for all the calls to MapperService.merge. The one that stuck out to me was in the execute method of MetadataCreateIndexService.java. That method is over 200 lines, but here are the highlights:

Map<String, Map<String, Object>> mappings = new HashMap<>();

for (Map.Entry<String, String> entry : request.mappings().entrySet()) {
    mappings.put(entry.getKey(), parseMapping(entry.getValue()));
}

indicesService.createIndex(nodeServicesProvider, tmpImd, Collections.emptyList());

IndexService indexService = indicesService.indexServiceSafe(request.index());

MapperService mapperService = indexService.mapperService();

for (Map.Entry<String, Map<String, Object>> entry : mappings.entrySet()) {

    mapperService.merge(entry.getKey(), new CompressedXContent(XContentFactory.jsonBuilder().map(entry.getValue()).string()), true, request.updateAllTypes());

}

The method analyzes the request to figure out the mappings it needs to add. In our case, it needs to add a mapping for the fields in the object it’s creating the index for. Once it has this info in the mappings object, it just creates the index and adds the mappings, willy-nilly! It seemed like we needed to check the index.mapper.dynamic setting here before proceeding with the mapping creation.

The Fix

I added a method to MapperService that I called checkIfAddMappingAllowed. It just checked the value of index.mapper.dynamic and threw an error if it was false. Prior to MetadataCreateIndexService.execute‘s call to MapperService.merge, I added a call to MapperService.checkIfAddMappingAllowed. When I tried to create an index with dynamic mappings after this change, checkIfAddMappingAllowed threw an exception, preventing the mapping creation. Enthused, I added an automatic test, assembled a pull request with the changes, and awaited the community’s feedback.

The Fix for the Fix

Two days later, the reviews were in — my change was a disaster! It turns out that MetadataCreateIndexService.execute is called both during dynamic index creation and during manual index creation. So with my change, if index.mapper.dynamic were set to false, it would be impossible to create any indices. I had gone a step too far: to keep Elasticsearch from creating indices in the case where I didn’t want it to, I’d prevented it from creating indices at all. Oops.

The reviewers helpfully pointed me towards a better place for the change, a class called TransportIndexAction.java. TransportIndexAction is the class that manages requests to store objects. The main action happens in its doExecute method:

protected void doExecute(final IndexRequest request, final ActionListener<IndexResponse> listener) {
    ClusterState state = clusterService.state();
    if (autoCreateIndex.shouldAutoCreate(request.index(), state)) {
        CreateIndexRequest createIndexRequest = new CreateIndexRequest(request);
        createIndexRequest.index(request.index());
        createIndexRequest.mapping(request.type());
        createIndexRequest.cause("auto(index api)");
        createIndexAction.execute(createIndexRequest, new ActionListener<CreateIndexResponse>() {
            @Override
                public void onResponse(CreateIndexResponse result) {
                innerExecute(request, listener);
            }
        });
    } else {
        innerExecute(request, listener);
    }
}

doExecute determines whether an index needs to be created and creates the necessary index if so. Then the innerExecute method actually stores the given object in that index. The details of determining whether an index needs to be created are handled by the class AutoCreateIndex, in the method shouldAutoCreate.

shouldAutoCreate takes an index name as an argument in addition to an object that describes the current state of Elasticsearch. If, based on this information, the index should be automatically created, shouldAutoCreate returns true. So in the doExecute method of TransportIndexAction, in the case where shouldAutoCreate returns true, I added a call to my checkIfAddMappingAllowed function. With the check there, it could only block dynamic index/mapping creation, leaving manual index/mapping creation alone. I pushed up this change and awaited further feedback from the community.

A few days later, the reviews were in again — my new change wasn’t a disaster, but it could still be improved. The project owner jpountz suggested that I move the check into AutoCreateIndex itself, so that any class that used AutoCreateIndex.shouldAutoCreate would not automatically create indices if dynamic mapping creation were disabled. This included not just TransportIndexAction but also TransportBulkAction, TransportDeleteAction, TransportSupportAction, and TransportUpdateAction, the facilitators of various other Elasticsearch APIs. You get a lot of Action when hacking on Elasticsearch.

With the check in AutoCreateIndex instead of just TransportIndexAction, none of these Action-backed APIs would inappropriately create indices. So I added a boolean dynamicMappingDisabled to AutoCreateIndex and initialized it to the opposite of the index.mapper.dynamic setting. Then I made shouldAutoCreate check the value of dynamicMappingDisabled and return false if dynamic mapping is disabled. This makes TransportIndexAction not create any indices or mappings: it just calls innerExecute, and if the index it’s writing to doesn’t exist already, it fails. With AutoCreateIndex aware of the index.mapper.dynamic setting, all the relevant APIs have the same behavior.

Soon enough, the reviews were in one more time — my change looked great! After a little cleanup, my commit landed in the master branch. I had fixed Elasticsearch!

Profuse thanks to my code reviewers jpountz, jasontedor, and jimferenczi for their patient guidance as I stumbled through the wilds of Elasticsearch. Fixing this bug was a great experience, and I’ll be watching the issue tracker for any I can jump on in the future!