Skip to content

Conversation

@davidclement90
Copy link
Contributor

Issue : #222

Signed-off-by: David Clement david.clement90@laposte.net

@janusgraph-bot
Copy link

Committer of one or more commits is not listed as a CLA signer, either individual or as a member of an organization.

@janusgraph-bot janusgraph-bot added cla: no This PR is not compliant with the CLA cla: yes This PR is compliant with the CLA and removed cla: no This PR is not compliant with the CLA labels Apr 19, 2017
@sjudeng
Copy link
Contributor

sjudeng commented Apr 25, 2017

It doesn't look like support has been added for setting the custom analyzer, just leveraging it in queries. I think you could support setting the custom analyzer by adding a new "analyzer" parameter to ParameterType.java and then updating ElasticSearchIndex.java#register to support same (see example).

If nothing else this would allow a cleaner test implementation. Though it would also probably require some documentation updates to show an example of setting the custom analyzer. But I think adding this would make a more complete implementation. What do you think?

@davidclement90
Copy link
Contributor Author

You are right, this PR only allow to leveraging custom analyser in queries for Equals, NotEquals and Contains predicates. But it a first step.
To use custom analyser, I use #233 (that I just backport from my Titan) and I manage my mapping in Kibana.
I tried to adapt ParameterType but to declare new analyser in ES you usually need to update index settings.
But an update of analyser in setting has to be done on closed index. So it can not be register easily, thus I use directly Kibana.
I think that to adapt ParmeterType, we need to first separate index (instead to have one index with multiple types that is not really the best use of ES elastic/elasticsearch#15613, JanusGraph will have multiple index) which allow close index more easily. Then adapt ParameterType.

What do you think ?

@sjudeng
Copy link
Contributor

sjudeng commented Apr 26, 2017

The idea of supporting an external mapping is interesting, I'll have to look over #233.

Regarding this PR can the custom analyzer be set during initial mapping creation? If so I think there'd still be value in supporting this using the existing machinery within JanusGraph (e.g. ParameterType through ElasticSearchIndex.java#register). For one I think this would make the feature more accessible for user's who don't want to go deeper into ES mapping details. Also we could eventually look at adding corresponding support to the other indexing backends.

I don't think updating the analyzer needs to be supported (same is already true of field type, etc. defined in ElasticSearchIndex.java#register). But can you explain more on why this would require a separate index instead of type for each property? I'm just seeing something like the following added to the TEXT and TEXTSTRING cases in the above register method. Or am I missing something?

String analyzer = (String) ParameterType.ANALYZER.findParameter(information.getParameters(), null);
if (analyzer != null) mapping.field("analyzer", analyzer);

@davidclement90
Copy link
Contributor Author

To use custom analyzer like https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html, you need to declare it in index settings.
So this settings can be add when the index is created. If you want to add another one after the creation, you need to close so make it unavailable, update the setting then open it. As JanusGraph have one index with multiple types, this action make all mixed indexes unavailable. If you will have one ES index by mixed index, this update will not make all mixed indexes unavailable.
But it's another issue.
For this one, I just discover the ES_CREATE_EXTRAS_NS properties, i can use it to register all my custom analyser when JanusGraph will push my index.
So you are right, I currently work to add a new ParameterType and I also try to implement this feature on other indexing backends.

@sjudeng
Copy link
Contributor

sjudeng commented Apr 27, 2017

How about just adding support for setting the built-in analyzer to use (e.g. english, stop, etc.) and save the case of defining a truly custom analyzer for a separate feature (or #233)? This is what you're testing anyway (e.g. setting analyzer to english on text field) so I think it would be enough just to support setting that through JanusGraph, which would avoid need for custom HTTPClient and the index create-delete-recreate steps in the test.

@davidclement90 davidclement90 changed the title Use match instead of term for Equals, NotEquals and Contains predicates in ES. Allows to use custom analysers in ES or Solr Apr 28, 2017
@davidclement90
Copy link
Contributor Author

I do the change. I can implement it in ES and Solr (on Lucene is not possible).
So we can set analyzer for Mapping.STRING and Mapping.Text field with a separated ParemeterType.
To use truly custom analyzer, you are right , we should use #233.

Copy link
Contributor

@sjudeng sjudeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the questions. I think this is much better. Some more minor feedback is below.


* Please refer to the https://www.elastic.co[Elasticsearch homepage] and available documentation for more information on Elasticsearch and how to setup an Elasticsearch cluster.

=== Custom Analyzer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't include this section here since you already provide this in the textsearch page below and this doesn't seem to fit here. Instead how about adding a bullet "Analyzer" or "Text Analyzer" or something, with a sentence or two description, to the list at the top of this page ("Full Text", "Geo", etc.)?

docs/solr.txt Outdated
///////////


==== Custom Analyzer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above ... consider adding a bullet to list at the top instead of this section here.


==== Custom Analyser

By default, JanusGraph will use the default analyzer from the indexing backend for properties with mapping.TEXT, and no analyzer for properties with mapping.STRING. If one wants to use another analyzer, it can be explicitly specified through a parameter : ParameterType.TEXT_ANALYZER for mapping.TEXT and parameterType.STRING_ANALYZER for mapping.STRING.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalization fixes: Mapping.STRING, Mapping.TEXT, ParameterType.STRING_ANALYZER, etc.


===== For Elasticsearch

The name of the analyzer must be set as parameter value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include a "more information" link to ES documentation on analyzers?

"which typically only happens the first time JanusGraph is started on top of ES. If the index JanusGraph is " +
"configured to use already exists, then this setting has no effect.", ConfigOption.Type.MASKABLE, 200L);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove whitespace

try {
((Constructor<Tokenizer>) ClassLoader.getSystemClassLoader().loadClass(analyzer)
.getConstructor()).newInstance();
} catch (InstantiationException | IllegalAccessException | IllegalArgumentException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify by using base ReflectiveOperationException (e.g. catch (ReflectiveOperationException | IllegalArgumentException | SecurityException e))?

try {
((Constructor<Tokenizer>) ClassLoader.getSystemClassLoader().loadClass(analyzer)
.getConstructor()).newInstance();
} catch (InstantiationException | IllegalAccessException | IllegalArgumentException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catch (ReflectiveOperationException | IllegalArgumentException | SecurityException e)

terms.add(termAtt.getBytesRef().utf8ToString());
}
return terms;
} catch (InstantiationException | IllegalAccessException | IllegalArgumentException | InvocationTargetException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catch (ReflectiveOperationException | IllegalArgumentException | SecurityException e)

}
}
//Since all data types must be defined in the schema.xml, pre-registering a type does not work
//But we check Analyse feature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check here necessary since it looks like errors would be handled in your customTokenize method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fail during the property registration and not during the search. So the index will not have property with bad configuration.
I think if the configuration is wrong, JanusGraph should fail as soon as possible.

put(TEXT,new StandardKeyInformation(String.class, Cardinality.SINGLE, new Parameter("mapping",
indexFeatures.supportsStringMapping(Mapping.TEXT)?Mapping.TEXT:Mapping.TEXTSTRING)));
put(TEXT,new StandardKeyInformation(String.class, Cardinality.SINGLE,
indexFeatures.supportsStringMapping(Mapping.TEXT)?Mapping.TEXT.asParameter():Mapping.TEXTSTRING.asParameter()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you cleaned these statements up. Going a little further how about (indexFeatures.supportsStringMapping(Mapping.TEXT)?Mapping.TEXT:Mapping.TEXTSTRING).asParameter())?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, is it OK to glue text to ?: ?

@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label May 6, 2017
@davidclement90 davidclement90 force-pushed the elasticsearch-match branch 3 times, most recently from 3f01044 to aece87e Compare May 10, 2017 12:00
Copy link

@amcp amcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

this.defaultStringMapping = defaultMap;
this.supportedStringMappings = supportedMap;
this.wildcardField = wildcardField;
this.supportedCardinaities = supportedCardinaities;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix spelling: supportedCardinalities

private final ImmutableSet<Mapping> supportedStringMappings;
private final String wildcardField;
private final boolean supportsNanoseconds;
private final boolean supportCustomAnalyser;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two minor comments here

  1. please use the same naming convention as supportsNanoseconds
  2. Elasticsearch spelling here and elsewhere: supportsCustomAnalyzer

return this;
}

public Builder supportCustomAnalyser() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supportsCustomAnalyzer

private Set<Cardinality> supportedCardinalities = Sets.newHashSet();
private String wildcardField = "*";
private boolean supportsNanoseconds;
private boolean supportCustomAnalyser;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supportsCustomAnalyzer

this.wildcardField = wildcardField;
this.supportedCardinaities = supportedCardinaities;
this.supportsNanoseconds = supportsNanoseconds;
this.supportCustomAnalyser = supportCustomAnalyser;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supportsCustomAnalyzer

case STRING:
mapping.field("index","not_analyzed");
if (stringAnalyzer != null) {
mapping.field("analyzer", stringAnalyzer);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant

break;
case TEXTSTRING:
if (textAnalyzer != null) {
mapping.field("analyzer",textAnalyzer);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant

mapping.field("type", "string");
mapping.field("index","not_analyzed");
if (stringAnalyzer != null) {
mapping.field("analyzer", stringAnalyzer);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant

if (stringAnalyzer != null) {
mapping.field("analyzer", stringAnalyzer);
}else{
mapping.field("index","not_analyzed");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

externalize index to a string constant

if (stringAnalyzer != null) {
mapping.field("analyzer", stringAnalyzer);
} else {
mapping.field("index","not_analyzed");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constant

Custom analyzers can be set throw new ParameterType.

Signed-off-by: David Clement <david.clement90@laposte.net>
@sjudeng
Copy link
Contributor

sjudeng commented May 23, 2017

@amcp Do the changes look okay here? Once this can get merged we can request #233 be rebased and then I'd like to add my review to it as well.

@amcp
Copy link

amcp commented May 26, 2017

@sjudeng checking now

Copy link

@amcp amcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more minor changes please

}

public Builder supportsCustomAnalyser() {
supportsCustomAnalyser = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ES spelling: supportsCustomAnalyzer

return this;
}

public Builder supportsCustomAnalyser() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ES spelling: supportsCustomAnalyzer

private Set<Cardinality> supportedCardinalities = Sets.newHashSet();
private String wildcardField = "*";
private boolean supportsNanoseconds;
private boolean supportsCustomAnalyser;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ES spelling: supportsCustomAnalyzer


private static final IndexFeatures ES_FEATURES = new IndexFeatures.Builder()
.setDefaultStringMapping(Mapping.TEXT).supportedStringMappings(Mapping.TEXT, Mapping.TEXTSTRING, Mapping.STRING).setWildcardField("_all").supportsCardinality(Cardinality.SINGLE).supportsCardinality(Cardinality.LIST).supportsCardinality(Cardinality.SET).supportsNanoseconds().build();
.setDefaultStringMapping(Mapping.TEXT).supportedStringMappings(Mapping.TEXT, Mapping.TEXTSTRING, Mapping.STRING).setWildcardField("_all").supportsCardinality(Cardinality.SINGLE).supportsCardinality(Cardinality.LIST).supportsCardinality(Cardinality.SET).supportsNanoseconds().supportsCustomAnalyser().build();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ES spelling: supportsCustomAnalyzer

mapping.field("index","not_analyzed");
if (stringAnalyzer != null) {
mapping.field(ANALYZER, stringAnalyzer);
}else{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing should be } else {

//Since all data types must be defined in the schema.xml, pre-registering a type does not work
//But we check Analyse feature
String analyzer = (String) ParameterType.STRING_ANALYZER.findParameter(information.getParameters(), null);
if (analyzer !=null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space after !=

}
if (janusgraphPredicate == Text.PREFIX || janusgraphPredicate == Text.CONTAINS_PREFIX) {
return tokenize(informations, value, key, janusgraphPredicate, (String) ParameterType.TEXT_ANALYZER.findParameter(informations.get(key).getParameters(), null));
}else if (janusgraphPredicate == Text.PREFIX || janusgraphPredicate == Text.CONTAINS_PREFIX) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space after the {

String tokenizer = (String) ParameterType.STRING_ANALYZER.findParameter(informations.get(key).getParameters(), null);
if(tokenizer != null){
return tokenize(informations, value, key, janusgraphPredicate,tokenizer);
}else{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix spacing please } else {

List<String> terms;
if(tokenizer != null){
terms = customTokenize(tokenizer, (String) value);
}else{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix spacing please `} else {

return terms;
} catch ( ReflectiveOperationException | IOException e) {
throw new IllegalArgumentException(e.getMessage(),e);
} finally{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space after finally

@amcp
Copy link

amcp commented May 26, 2017

@sjudeng I found a few more things to fix, so one more round. Thanks.

b.must(QueryBuilders.termQuery(fieldName, term));
}
return b;
if (janusgraphPredicate == Text.CONTAINS ||janusgraphPredicate == Cmp.EQUAL ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space after || and remove space before )

return QueryBuilders.termQuery(fieldName, (String) value);
} else if (janusgraphPredicate == Cmp.NOT_EQUAL) {
return QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery(fieldName, (String) value));
return QueryBuilders.boolQuery().mustNot(QueryBuilders.matchQuery(fieldName, value).operator(Operator.AND));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra spaces

Signed-off-by: sjudeng <sjudeng@users.noreply.github.com>
@sjudeng sjudeng force-pushed the elasticsearch-match branch from 8f715d1 to 0f822c7 Compare May 27, 2017 21:15
@sjudeng
Copy link
Contributor

sjudeng commented May 27, 2017

@davidclement90 I pushed a commit to your branch with the requested code style updates.

@amcp Can you check this over again when you have time? I'm working on a separate update to ElasicSearchIndex which this PR is blocking because of conflicts, so I'd like to get it merged if possible.

@amcp amcp merged commit 2ec94a3 into JanusGraph:master May 28, 2017
@amcp
Copy link

amcp commented May 28, 2017

@sjudeng merged this in, you are good to work on the other PR now.

@davidclement90 davidclement90 deleted the elasticsearch-match branch June 7, 2017 11:50
bwatson-rti-org pushed a commit to bwatson-rti-org/janusgraph that referenced this pull request Mar 9, 2019
…atch

Allows to use custom analysers in ES or Solr
micpod pushed a commit to micpod/janusgraph that referenced this pull request Nov 5, 2019
…atch

Allows to use custom analysers in ES or Solr
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes This PR is compliant with the CLA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants