HowToDebugMappings
S-Match is not perfect. Sometimes it returns mappings that are incorrect, and sometimes it does not return mappings one would expect. If you have a mapping you deem incorrect, you can always debug the source code, but in some cases there is a shorter path: enabling and studying the debug output. The following might help you understand where the problem comes from.
Most mistakes originate in preprocessing or element-level matching. To debug a mapping, we suggest the following steps:
- isolate the mapping
- enable debug output
- run default matching in synchronous mode
- study debug output
Let's follow these steps to debug one mapping from the default cw example.
If you match contexts with several nodes, the debug output can be overwhelming. We suggest debugging one mapping at a time: leave one node on the left side and one node on the right side, and run the matching. For example, suppose you isolate a node in the source context cd.txt:
Courses
College of Arts and Sciences
Earth and Atmospheric Sciences
and a node in the target context wd.txt:
Course
College of Arts and Sciences
Earth Sciences
Geophysics
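With larger contexts, trimming the files by hand can be tedious. The following Java sketch is a hypothetical helper, not part of S-Match: it keeps only a chosen node and its ancestors. It assumes the tab-indented format implied by the Tab2XML conversion used in the script below, with one leading tab per level of depth; the sample labels other than those shown above are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of S-Match): isolate one node of a
// tab-indented context by keeping only the node and its ancestors.
// Assumption: each leading tab is one level of depth.
class IsolateNode {

    // Number of leading tab characters, i.e. the depth of the node.
    static int depth(String line) {
        int d = 0;
        while (d < line.length() && line.charAt(d) == '\t') {
            d++;
        }
        return d;
    }

    // Returns the lines on the path from the root to the first node whose
    // label (without indentation) equals targetLabel, or an empty list.
    static List<String> isolate(List<String> lines, String targetLabel) {
        List<String> path = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            int d = depth(line);
            // Drop entries of the current path that are not ancestors of this node.
            while (path.size() > d) {
                path.remove(path.size() - 1);
            }
            path.add(line);
            if (line.substring(d).equals(targetLabel)) {
                return path;
            }
        }
        return new ArrayList<>(); // label not found
    }

    public static void main(String[] args) {
        // Made-up sample; "Other Department" and "Other College" are placeholders.
        List<String> cd = List.of(
            "Courses",
            "\tCollege of Arts and Sciences",
            "\t\tOther Department",
            "\t\tEarth and Atmospheric Sciences",
            "\tOther College");
        isolate(cd, "Earth and Atmospheric Sciences").forEach(System.out::println);
    }
}
```

The printed lines keep their tabs, so they can be written back to a file and converted as usual.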
To enable debug output for S-Match, you need to configure Log4J. Among the many ways to do so, one can create a conf/log4j-debug.properties file that enables the TRACE logging level for all S-Match components:
log4j.rootLogger=INFO, C
# Configure logging for component packages
log4j.logger.net.sf.extjwnl=INFO, C
log4j.additivity.net.sf.extjwnl=false
log4j.logger.it.unitn.disi.smatch=TRACE, C
log4j.additivity.it.unitn.disi.smatch=false
# C – root console appender
log4j.appender.C=org.apache.log4j.ConsoleAppender
log4j.appender.C.layout=org.apache.log4j.PatternLayout
log4j.appender.C.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %t %p %c{1} - %m%n
Then specify this configuration in the command-line script bin/cw-debug.bat:
call match-manager.bat convert ..\test-data\cw\cd.txt ..\test-data\cw\cd.xml -config=..\conf\s-match-Tab2XML.xml
call match-manager.bat convert ..\test-data\cw\wd.txt ..\test-data\cw\wd.xml -config=..\conf\s-match-Tab2XML.xml
SET JAVA_OPTS=-Dlog4j.configuration=..\conf\log4j-debug.properties
call match-manager.bat offline ..\test-data\cw\cd.xml ..\test-data\cw\cd.xml -config=..\conf\s-match-synchronous.xml
call match-manager.bat offline ..\test-data\cw\wd.xml ..\test-data\cw\wd.xml -config=..\conf\s-match-synchronous.xml
call match-manager.bat online ..\test-data\cw\cd.xml ..\test-data\cw\wd.xml ..\test-data\cw\result-cdwd.txt -config=..\conf\s-match-synchronous.xml
The line SET JAVA_OPTS=-Dlog4j.configuration=..\conf\log4j-debug.properties sets the location of the Log4J configuration file for the command-line utility.
By default S-Match runs in asynchronous mode, using multithreading. This causes output lines from different threads to be interleaved in the log. While this can be handled by separating the output of different threads, it is easier to use a single-threaded configuration. In the cw-debug.bat script shown above, the match manager is called with the -config=..\conf\s-match-synchronous.xml configuration, in which S-Match works in single-threaded mode.
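If you do need to read the log of an asynchronous run, a hypothetical post-processing step can regroup the lines by thread. The Java sketch below assumes the ConversionPattern from conf/log4j-debug.properties above, where the thread name (%t) is the third space-separated field after the date and the time, and assumes thread names contain no spaces. Continuation lines such as stack traces are skipped rather than attributed to a thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: regroup interleaved log lines by thread name.
// Assumes the pattern "%d{yyyy-MM-dd HH:mm:ss,SSS} %t %p %c{1} - %m%n"
// and thread names without spaces.
class SplitByThread {

    static Map<String, List<String>> groupByThread(List<String> lines) {
        // LinkedHashMap keeps threads in order of first appearance.
        Map<String, List<String>> byThread = new LinkedHashMap<>();
        for (String line : lines) {
            // Fields: date, time, thread, rest of the line.
            String[] parts = line.split(" ", 4);
            if (parts.length < 4 || !parts[0].matches("\\d{4}-\\d{2}-\\d{2}")) {
                continue; // not a line produced by this pattern (e.g. a stack trace)
            }
            byThread.computeIfAbsent(parts[2], t -> new ArrayList<>()).add(parts[3]);
        }
        return byThread;
    }

    public static void main(String[] args) {
        // Made-up sample lines; real thread names depend on how S-Match runs its threads.
        List<String> log = List.of(
            "2020-01-01 10:00:00,001 worker-1 TRACE MatchManager - n0 Courses n0_0",
            "2020-01-01 10:00:00,002 worker-2 TRACE MatchManager - n1 Course n1_0",
            "2020-01-01 10:00:00,003 worker-1 TRACE MatchManager - n0 0 course courses");
        groupByThread(log).forEach((thread, threadLines) -> {
            System.out.println("== " + thread);
            threadLines.forEach(System.out::println);
        });
    }
}
```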
The usual suspects are preprocessing and element-level matching. The most important results of preprocessing are the concept-at-label formulas. The debug output might look something like this:
INFO MatchManager - Preprocessing...
INFO MatchManager - Computing concepts at label...
TRACE DefaultContextPreprocessor - preprocessing: Courses
TRACE DefaultContextPreprocessor - preprocessing: College of Arts and Sciences
TRACE DefaultContextPreprocessor - preprocessing: Earth and Atmospheric Sciences
...
TRACE MatchManager - n1 College of Arts and Sciences n1_0 & n1_1 | n1_3
TRACE MatchManager - n1 0 college college
TRACE MatchManager - n1 0 n#8295090 [college] the body of faculty and students of a college
TRACE MatchManager - n1 0 n#8295245 [college] an institution of higher education created to educate and grant degrees; often a part of a university
TRACE MatchManager - n1 0 n#3073756 [college] a complex of buildings in which an institution of higher education is housed
TRACE MatchManager - n1 1 arts arts
TRACE MatchManager - n1 1 n#6163352 [humanistic discipline, humanities, liberal arts, arts] studies intended to provide general knowledge and intellectual skills (rather than occupational or professional skills); "the college of arts and sciences"
TRACE MatchManager - n1 3 science sciences
TRACE MatchManager - n1 3 n#6008975 [science, scientific discipline] a particular branch of scientific knowledge; "the science of genetics"
...
For example, in this case there is a mistake in the logical formula:
College of Arts and Sciences n1_0 & n1_1 | n1_3
while the correct result should be
College of Arts and Sciences n1_0 & (n1_1 | n1_3)
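Assuming the conventional reading in which & binds tighter than |, the printed formula groups as (n1_0 & n1_1) | n1_3. Java's non-short-circuit boolean operators & and | follow the same precedence, so the difference can be checked by enumerating truth assignments. The class, method and variable names below are illustrative only; the variables stand for the tokens college (n1_0), arts (n1_1) and sciences (n1_3).

```java
// Illustration: the unparenthesized formula is not equivalent to the intended one.
class FormulaPrecedence {

    // As printed in the debug output: n1_0 & n1_1 | n1_3,
    // which groups as (n1_0 & n1_1) | n1_3 because & binds tighter than |.
    static boolean asPrinted(boolean college, boolean arts, boolean sciences) {
        return college & arts | sciences;
    }

    // Intended reading: n1_0 & (n1_1 | n1_3).
    static boolean intended(boolean college, boolean arts, boolean sciences) {
        return college & (arts | sciences);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print every truth assignment on which the two formulas disagree.
        for (boolean c : values) {
            for (boolean a : values) {
                for (boolean s : values) {
                    if (asPrinted(c, a, s) != intended(c, a, s)) {
                        System.out.printf("college=%b arts=%b sciences=%b: printed=%b intended=%b%n",
                                c, a, s, asPrinted(c, a, s), intended(c, a, s));
                    }
                }
            }
        }
    }
}
```

The formulas disagree exactly when college is false and sciences is true: under the printed formula, a concept that is only about sciences would satisfy the label, which is not what College of Arts and Sciences means.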
Another common source of mistakes at this step is Word Sense Disambiguation (WSD). Let's consider the Courses node:
TRACE MatchManager - n0 Courses n0_0
TRACE MatchManager - n0 0 course courses
TRACE MatchManager - n0 0 n#886144 [course, course of study, course of instruction, class] education imparted in a series of lessons or meetings; "he took a course in basket weaving"; "flirting is not unknown in college classes"
TRACE MatchManager - n0 0 n#8393816 [course, line] a connected series of events or actions or developments; "the government took a firm course"; "historians can only point out those lines for which evidence is available"
TRACE MatchManager - n0 0 n#8698960 [course, trend] general line of orientation; "the river takes a southern course"; "the northeastern trend of the coast"
TRACE MatchManager - n0 0 n#39000 [course, course of action] a mode of action; "if you persist in that course you will surely fail"; "once a nation is embarked on a course of action it becomes extremely difficult for any retraction to take place"
TRACE MatchManager - n0 0 n#9410115 [path, track, course] a line or route along which something travels or moves; "the hurricane demolished houses in its path"; "the track of an animal"; "the course of the river"
TRACE MatchManager - n0 0 n#8255384 [class, form, grade, course] a body of students who are taught together; "early morning classes are always sleepy"
TRACE MatchManager - n0 0 n#7572535 [course] part of a meal served at one time; "she prepared a three course meal"
TRACE MatchManager - n0 0 n#3124680 [course, row] (construction) a layer of masonry; "a course of bricks"
TRACE MatchManager - n0 0 n#3124441 [course] facility consisting of a circumscribed area of land or water laid out for a sport; "the course had only nine holes"; "the course was less than a mile"
TRACE MatchManager - n0 0 v#2071468 [course] move swiftly through or over; "ships coursing the Atlantic"
TRACE MatchManager - n0 0 v#2070867 [run, flow, feed, course] move along, of liquids; "Water flowed into the cave"; "the Missouri feeds into the Mississippi"
TRACE MatchManager - n0 0 v#1147339 [course] hunt with hounds; "He often courses hares"
The list contains noun as well as verb senses, although one might argue that at least the verb senses are not relevant here. Furthermore, one might argue that the only correct sense is n#886144. So the WSD step has made some mistakes.
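To see at a glance which parts of speech survived WSD, one can filter the sense lines of the trace. The Java sketch below is a hypothetical log-analysis helper, not an S-Match API. It assumes sense lines are laid out as in the excerpt above (node, token index, then pos#offset followed by the bracketed lemma list), where the prefix letter is the WordNet part of speech (n for noun, v for verb, a for adjective).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: list the sense ids of a given part of speech
// found in concept-at-label TRACE lines.
class SenseFilter {

    // Assumed layout: "MatchManager - <node> <token> <pos>#<offset> [<lemmas>] <gloss>"
    private static final Pattern SENSE =
        Pattern.compile("MatchManager - (\\S+) (\\d+) ([a-z])#(\\d+) \\[");

    // Returns sense ids such as "v#2071468" whose part-of-speech letter equals pos.
    static List<String> sensesWithPos(List<String> lines, char pos) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SENSE.matcher(line);
            if (m.find() && m.group(3).charAt(0) == pos) {
                result.add(m.group(3) + "#" + m.group(4));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few lines copied (glosses shortened) from the Courses excerpt above.
        List<String> trace = List.of(
            "TRACE MatchManager - n0 Courses n0_0",
            "TRACE MatchManager - n0 0 course courses",
            "TRACE MatchManager - n0 0 n#886144 [course, course of study, course of instruction, class] education imparted in a series of lessons or meetings",
            "TRACE MatchManager - n0 0 v#2071468 [course] move swiftly through or over",
            "TRACE MatchManager - n0 0 v#1147339 [course] hunt with hounds");
        System.out.println("verb senses: " + sensesWithPos(trace, 'v'));
    }
}
```

Here main reports v#2071468 and v#1147339, the kind of senses that, as argued above, are probably irrelevant for a course catalogue.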
Looking further down in the debug output, one should see the results of the element-level matching:
TRACE WordNet - Found > using @,#m,#s,#p (HYPERNYM, MEMBER_, SUBSTANCE_, PART_HOLONYM) between n#6008975[science, scientific discipline] and n#6125083[earth science]
TRACE ElementMatcher - 3 > 0 sciences > earth sciences
The results of this step are mostly correct (typically those coming from WordNet) and occasionally wrong (typically those coming from less accurate matchers, such as string matchers). They serve as the axioms for the final matching step, so they may reveal the reasons behind a strange mapping result.
Let's consider another example, a simulated API match:
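When a mapping looks strange, it can help to collect the element-level axioms in one place. The following Java sketch is a hypothetical helper that pulls them out of lines such as ElementMatcher - 3 > 0 sciences > earth sciences. It assumes the layout source token index, relation symbol, target token index, followed by free text, and assumes the relation symbols are =, <, > and !. Other ElementMatcher lines are ignored. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extract element-level axioms from ElementMatcher TRACE lines.
class AxiomLines {

    // One axiom: token indices on both sides, the relation, and the rest of the line.
    record Axiom(int sourceToken, char relation, int targetToken, String detail) {}

    // Assumed layout: "ElementMatcher - <srcIdx> <rel> <tgtIdx> <free text>"
    private static final Pattern AXIOM =
        Pattern.compile("ElementMatcher - (\\d+) ([=<>!]) (\\d+) (.*)$");

    static List<Axiom> parse(List<String> lines) {
        List<Axiom> axioms = new ArrayList<>();
        for (String line : lines) {
            Matcher m = AXIOM.matcher(line);
            if (m.find()) {
                axioms.add(new Axiom(Integer.parseInt(m.group(1)), m.group(2).charAt(0),
                        Integer.parseInt(m.group(3)), m.group(4)));
            }
        }
        return axioms;
    }

    public static void main(String[] args) {
        // The two lines from the excerpt above; only the second one is an axiom line.
        List<String> trace = List.of(
            "TRACE WordNet - Found > using @,#m,#s,#p (HYPERNYM, MEMBER_, SUBSTANCE_, PART_HOLONYM) between n#6008975[science, scientific discipline] and n#6125083[earth science]",
            "TRACE ElementMatcher - 3 > 0 sciences > earth sciences");
        parse(trace).forEach(System.out::println);
    }
}
```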
water
measure
vs
waterBodyMeasures
secondaryMeasure
The matching result contains 4 links, among them a particularly suspicious one: \water\measure = \waterBodyMeasures\secondaryMeasure.
The debug output of the source context shows nothing of interest. However, the debug output of the target context
TRACE MatchManager - n0 waterBodyMeasures n0_0
TRACE MatchManager - n0 0 waterbodymeasures waterbodymeasures
TRACE MatchManager - n1 secondaryMeasure n1_0 & n1_1
TRACE MatchManager - n1 0 secondarymeasure secondarymeasure
TRACE MatchManager - n1 0 n#8446856 [secondary] the defensive football players who line up behind the linemen
TRACE MatchManager - n1 0 n#4171063 [secondary coil, secondary winding, secondary] coil such that current is induced in it by passing a current through the primary coil
TRACE MatchManager - n1 0 a#1859389 [secondary] being of second rank or importance or value; not direct or immediate; "the stone will be hauled to a secondary crusher"; "a secondary source"; "a secondary issue"; "secondary streams"
TRACE MatchManager - n1 0 a#2108248 [junior-grade, lower-ranking, lowly, petty, secondary, subaltern] inferior in rank or status; "the junior faculty"; "a lowly corporal"; "petty officialdom"; "a subordinate functionary"
TRACE MatchManager - n1 0 a#1863896 [secondary] depending on or incidental to what is original or primary; "a secondary infection"
TRACE MatchManager - n1 0 a#1476701 [secondary] not of major importance; "played a secondary role in world events"
TRACE MatchManager - n1 0 a#797147 [secondary] belonging to a lower class or rank
TRACE MatchManager - n1 1 measure measure
TRACE MatchManager - n1 1 n#175261 [measure, step] any maneuver made as part of progress toward a goal; "the situation called for strong measures"; "the police took steps to reduce crime"
TRACE MatchManager - n1 1 n#33914 [measure, quantity, amount] how much there is or how many there are of something that you can quantify
TRACE MatchManager - n1 1 n#6548844 [bill, measure] a statute in draft before it becomes law; "they held a public hearing on the bill"
TRACE MatchManager - n1 1 n#998911 [measurement, measuring, measure, mensuration] the act or process of assigning numbers to phenomena according to a rule; "the measurements were carefully done"; "his mental measurings proved remarkably accurate"
TRACE MatchManager - n1 1 n#7275291 [standard, criterion, measure, touchstone] a basis for comparison; a reference point against which other things can be evaluated; "the schools comply with federal standards"; "they set the measure for all subsequent work"
TRACE MatchManager - n1 1 n#7108759 [meter, metre, measure, beat, cadence] (prosody) the accent in a metrical foot of verse
TRACE MatchManager - n1 1 n#6877775 [measure, bar] musical notation for a repeating pattern of musical beats; "the orchestra omitted the last twelve bars of the song"
TRACE MatchManager - n1 1 n#3741128 [measuring stick, measure, measuring rod] measuring instrument having a sequence of marks at regular intervals; used as a reference in making measurements
TRACE MatchManager - n1 1 n#3739135 [measure] a container of some standard capacity that is used to obtain fixed amounts of a substance
TRACE MatchManager - n1 1 v#648747 [measure, mensurate, measure out] determine the measurements of something or somebody, take measurements of; "Measure the length of the wall"
TRACE MatchManager - n1 1 v#490773 [quantify, measure] express as a number or measure or quantity; "Can you quantify your results?"
TRACE MatchManager - n1 1 v#2710209 [measure] have certain dimensions; "This table surfaces measures 20inches by 36 inches"
TRACE MatchManager - n1 1 v#683348 [measure, evaluate, valuate, assess, appraise, value] evaluate or estimate the nature, quality, ability, extent, or significance of; "I will have the family jewels appraised by a professional"; "access all the factors when taking a risk"
shows that the label waterBodyMeasures is parsed badly by the default parser: the whole label becomes a single token, n0_0, for which no senses are found. Nevertheless, matching continues, and the output of the element-level matching step
TRACE ElementMatcher - Prefix: water = waterbodymeasures
TRACE ElementMatcher - 0 = 0 water = waterbodymeasures
TRACE ElementMatcher - n0.[water].0.water = n0.[waterBodyMeasures].0.waterbodymeasures
TRACE ElementMatcher - Suffix: measure = secondarymeasure
TRACE ElementMatcher - 0 = 0 measure = secondarymeasure
TRACE ElementMatcher - n1.[measure].0.measure = n1.[secondaryMeasure].0.secondarymeasure
TRACE WordNet - Found = using & (SIMILAR_TO) between n#175261[measure, step] and n#175261[measure, step]
TRACE ElementMatcher - 0 = 1 measure = measure
TRACE ElementMatcher - n1.[measure].0.measure = n1.[secondaryMeasure].1.measure
reveals that several axioms are found and that some of them are wrong, such as those found by the prefix and suffix string matchers.
Taking the above into account, if we fix the parsing by manually tokenizing the offending labels, the results improve:
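To see why such axioms appear, consider a deliberately simplified prefix and suffix matcher. This Java sketch is NOT the S-Match implementation, which may apply further conditions. It simply declares two strings equivalent (=) when one starts or ends with the other, and returns ? (no relation found, in this sketch) otherwise. That is enough to reproduce the two suspicious log lines once a compound label is kept as a single token.

```java
// Deliberately simplified prefix/suffix matchers (NOT the S-Match implementation).
class NaiveStringMatchers {
    static final char EQUIVALENCE = '=';
    static final char NO_RELATION = '?';

    // Equivalent if either non-empty string is a prefix of the other.
    static char prefix(String source, String target) {
        boolean related = !source.isEmpty() && !target.isEmpty()
                && (target.startsWith(source) || source.startsWith(target));
        return related ? EQUIVALENCE : NO_RELATION;
    }

    // Equivalent if either non-empty string is a suffix of the other.
    static char suffix(String source, String target) {
        boolean related = !source.isEmpty() && !target.isEmpty()
                && (target.endsWith(source) || source.endsWith(target));
        return related ? EQUIVALENCE : NO_RELATION;
    }

    public static void main(String[] args) {
        System.out.println("Prefix: water " + prefix("water", "waterbodymeasures") + " waterbodymeasures");
        System.out.println("Suffix: measure " + suffix("measure", "secondarymeasure") + " secondarymeasure");
        // Unrelated words sharing a prefix are matched just as eagerly.
        System.out.println("Prefix: water " + prefix("water", "watermelon") + " watermelon");
    }
}
```

The last call shows the weakness of the heuristic: water and watermelon are declared equivalent as well, which is why axioms coming from string matchers deserve extra scrutiny.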
\water\measure > \water Body Measures\secondary Measure
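The manual tokenization can be automated with a small preprocessing step applied to each label before the Tab2XML conversion. The Java sketch below is a hypothetical helper, not an S-Match component. It assumes labels use camelCase and inserts a space at every lower-to-upper case boundary, and also before the last capital of an acronym (so that, for instance, XMLParser becomes XML Parser).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical preprocessing step (not part of S-Match): split camelCase labels
// into space-separated words, e.g. waterBodyMeasures -> "water Body Measures".
class CamelCaseSplitter {

    // Zero-width boundaries: between a lower-case letter or digit and an upper-case
    // letter, and between two capitals when the second starts a lower-case word.
    private static final String BOUNDARY =
        "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    static String tokenize(String label) {
        return String.join(" ", label.split(BOUNDARY));
    }

    public static void main(String[] args) {
        List<String> labels = List.of("waterBodyMeasures", "secondaryMeasure", "water", "measure");
        System.out.println(labels.stream()
                .map(CamelCaseSplitter::tokenize)
                .collect(Collectors.toList()));
    }
}
```

Running main prints [water Body Measures, secondary Measure, water, measure], matching the tokenized labels above. Because breaks are inserted only between letters or digits, leading tabs in a tab-indented context file are preserved when the helper is applied line by line.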