Aliaksandr Autayeu edited this page Jan 17, 2015 · 1 revision

Debug output

S-Match is not perfect. Sometimes it returns incorrect mappings, and sometimes it misses mappings one would expect. If you have a mapping you deem incorrect, you can always debug the source code, but often there is a shorter path: enabling and studying the debug output. The following steps might help you understand where the problem comes from.

Steps

Mistakes most frequently originate in preprocessing or element-level matching. To debug a mapping, we suggest that you:

  • isolate the mapping
  • enable debug output
  • run default matching in synchronous mode
  • study debug output

First example

Let's follow these steps to debug one mapping from the default cw example.

Isolating the mapping

If you match contexts with many nodes, the debug output can be overwhelming, so we suggest debugging one mapping at a time. Leave one node (together with its ancestors) on the left side and one node on the right side, and run the matching. For example, suppose you isolate a node in the source context cd.txt:

Courses
	College of Arts and Sciences
		Earth and Atmospheric Sciences

and a node in the target context wd.txt:

Course
	College of Arts and Sciences
		Earth Sciences
			Geophysics
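For larger contexts, cutting out one path by hand gets tedious. The Java sketch below is one way to automate it, assuming the tab-indented format shown above (one label per line, nesting given by leading tabs, each child exactly one tab deeper than its parent). The class IsolatePath is hypothetical and not part of S-Match; it returns the path from the root to the first node with the given label, with the indentation preserved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of S-Match: isolates one node of a
// tab-indented context (one label per line, depth = number of leading tabs).
class IsolatePath {

    // Number of leading tab characters, i.e. the depth of the node.
    static int depth(String line) {
        int d = 0;
        while (d < line.length() && line.charAt(d) == '\t') {
            d++;
        }
        return d;
    }

    // Returns the lines from the root down to the first node labelled `label`
    // (indentation preserved), or an empty list if there is no such node.
    // Assumes well-formed input where each child is exactly one tab deeper.
    static List<String> isolate(List<String> lines, String label) {
        List<String> path = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            int d = depth(line);
            // Leaving a branch: drop the nodes at this depth or deeper.
            while (path.size() > d) {
                path.remove(path.size() - 1);
            }
            path.add(line);
            if (line.substring(d).equals(label)) {
                return path;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Made-up fragment in the same format as cd.txt.
        List<String> context = List.of(
                "Courses",
                "\tCollege of Arts and Sciences",
                "\t\tAstronomy",
                "\t\tEarth and Atmospheric Sciences",
                "\tCollege of Engineering");
        for (String line : isolate(context, "Earth and Atmospheric Sciences")) {
            System.out.println(line);
        }
    }
}
```

On a real file, one could read the lines with Files.readAllLines and write the isolated path back with Files.write before converting it with match-manager.bat.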

Enabling debug output

To enable debug output for S-Match, configure Log4J. Of the many possible ways to do this, one is to create a conf/log4j-debug.properties file that enables the TRACE logging level for all S-Match components while keeping extJWNL at INFO:

log4j.rootLogger=INFO, C

# Configure logging for component packages
log4j.logger.net.sf.extjwnl=INFO, C
log4j.additivity.net.sf.extjwnl=false

log4j.logger.it.unitn.disi.smatch=TRACE, C
log4j.additivity.it.unitn.disi.smatch=false

# C – root console appender
log4j.appender.C=org.apache.log4j.ConsoleAppender
log4j.appender.C.layout=org.apache.log4j.PatternLayout
log4j.appender.C.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %t %p %c{1} - %m%n

and reference this configuration from the command-line script bin/cw-debug.bat:

call match-manager.bat convert ..\test-data\cw\cd.txt ..\test-data\cw\cd.xml -config=..\conf\s-match-Tab2XML.xml
call match-manager.bat convert ..\test-data\cw\wd.txt ..\test-data\cw\wd.xml -config=..\conf\s-match-Tab2XML.xml

SET JAVA_OPTS=-Dlog4j.configuration=..\conf\log4j-debug.properties
call match-manager.bat offline ..\test-data\cw\cd.xml ..\test-data\cw\cd.xml -config=..\conf\s-match-synchronous.xml
call match-manager.bat offline ..\test-data\cw\wd.xml ..\test-data\cw\wd.xml -config=..\conf\s-match-synchronous.xml

call match-manager.bat online ..\test-data\cw\cd.xml ..\test-data\cw\wd.xml ..\test-data\cw\result-cdwd.txt -config=..\conf\s-match-synchronous.xml

The line SET JAVA_OPTS=-Dlog4j.configuration=..\conf\log4j-debug.properties tells the command-line utility where to find the Log4J configuration.
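If it is unclear which components the properties file above puts at TRACE, recall that Log4J 1.x gives each logger the level of its nearest configured ancestor in the dotted name hierarchy, falling back to log4j.rootLogger. The Java sketch below mimics that lookup with java.util.Properties. It only illustrates the rule (it is not Log4J's own code), ignores appenders and additivity, and the logger names in main are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustration of Log4J 1.x level inheritance: the nearest configured
// ancestor (by dotted name) wins, otherwise log4j.rootLogger applies.
class EffectiveLevel {

    // Level part of a value such as "TRACE, C", or null if none is given.
    static String levelOf(String value) {
        if (value == null) {
            return null;
        }
        String level = value.split(",", -1)[0].trim();
        return level.isEmpty() ? null : level;
    }

    static String resolve(Properties props, String loggerName) {
        String name = loggerName;
        while (!name.isEmpty()) {
            String level = levelOf(props.getProperty("log4j.logger." + name));
            if (level != null) {
                return level;
            }
            // Move up one level: "a.b.c" -> "a.b" -> "a" -> "".
            int dot = name.lastIndexOf('.');
            name = dot < 0 ? "" : name.substring(0, dot);
        }
        return levelOf(props.getProperty("log4j.rootLogger"));
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // The level settings of conf/log4j-debug.properties.
        props.load(new StringReader(String.join("\n",
                "log4j.rootLogger=INFO, C",
                "log4j.logger.net.sf.extjwnl=INFO, C",
                "log4j.logger.it.unitn.disi.smatch=TRACE, C")));
        // Hypothetical logger names, one per branch of the hierarchy.
        for (String logger : new String[] {
                "it.unitn.disi.smatch.SomeComponent",
                "net.sf.extjwnl.SomeComponent",
                "org.example.Other"}) {
            System.out.println(logger + " -> " + resolve(props, logger));
        }
    }
}
```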

Running default matching in synchronous mode

By default, S-Match runs in asynchronous (multithreaded) mode, which interleaves output lines from different threads in the log. While this can be handled by separating the output of different threads (the %t in the conversion pattern above prints the thread name), it is easier to use a single-threaded configuration. In the cw-debug.bat script shown above, the match manager is called with -config=..\conf\s-match-synchronous.xml, a configuration in which S-Match runs single-threaded.
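If you do need to read output from the default multithreaded mode, the thread name can be used to regroup the lines. A minimal Java sketch, assuming the ConversionPattern from log4j-debug.properties above and thread names without spaces (the sample output on this page omits the date and thread fields); GroupByThread and the log lines in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: regroups interleaved log lines by thread, assuming the pattern
// "%d{yyyy-MM-dd HH:mm:ss,SSS} %t %p %c{1} - %m%n" and thread names
// without spaces. GroupByThread is a hypothetical name.
class GroupByThread {

    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> byThread = new LinkedHashMap<>();
        String current = null;
        for (String line : lines) {
            // date, time, thread, rest of the line
            String[] parts = line.split(" ", 4);
            if (parts.length == 4 && parts[0].matches("\\d{4}-\\d{2}-\\d{2}")) {
                current = parts[2];
            }
            // Lines without a header (e.g. stack traces) stay with the entry above;
            // anything before the first header is skipped.
            if (current != null) {
                byThread.computeIfAbsent(current, k -> new ArrayList<>()).add(line);
            }
        }
        return byThread;
    }

    public static void main(String[] args) {
        // Made-up interleaved output from two worker threads.
        List<String> log = List.of(
                "2015-01-17 10:00:00,001 pool-1-thread-1 TRACE ElementMatcher - first",
                "2015-01-17 10:00:00,002 pool-1-thread-2 TRACE ElementMatcher - second",
                "2015-01-17 10:00:00,003 pool-1-thread-1 TRACE ElementMatcher - third");
        group(log).forEach((thread, entries) -> {
            System.out.println("== " + thread);
            entries.forEach(System.out::println);
        });
    }
}
```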

Studying debug output

The usual suspects are preprocessing and element-level matching. The most important results of preprocessing are the concept-at-label formulas. The debug output might look something like this:

INFO MatchManager - Preprocessing...
INFO MatchManager - Computing concepts at label...
TRACE DefaultContextPreprocessor - preprocessing: Courses
TRACE DefaultContextPreprocessor - preprocessing: College of Arts and Sciences
TRACE DefaultContextPreprocessor - preprocessing: Earth and Atmospheric Sciences
...
TRACE MatchManager - 		n1	College of Arts and Sciences	n1_0 & n1_1 | n1_3
TRACE MatchManager - 		n1	0	college	college
TRACE MatchManager - 		n1	0	n#8295090	[college]	the body of faculty and students of a college
TRACE MatchManager - 		n1	0	n#8295245	[college]	an institution of higher education created to educate and grant degrees; often a part of a university
TRACE MatchManager - 		n1	0	n#3073756	[college]	a complex of buildings in which an institution of higher education is housed
TRACE MatchManager - 		n1	1	arts	arts
TRACE MatchManager - 		n1	1	n#6163352	[humanistic discipline, humanities, liberal arts, arts]	studies intended to provide general knowledge and intellectual skills (rather than occupational or professional skills); "the college of arts and sciences"
TRACE MatchManager - 		n1	3	science	sciences
TRACE MatchManager - 		n1	3	n#6008975	[science, scientific discipline]	a particular branch of scientific knowledge; "the science of genetics"
...

In this case, for example, there is a mistake in the logical formula:

College of Arts and Sciences	n1_0 & n1_1 | n1_3

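Under the usual convention that & binds tighter than |, this reads as (n1_0 & n1_1) | n1_3, that is, "college and arts" or "sciences", while the correct result should be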

College of Arts and Sciences	n1_0 & (n1_1 | n1_3)
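The difference matters for matching. Treating each atom as true or false for a concept (n1_0 for college, n1_1 for arts, n1_3 for science, as listed in the debug output), the small Java illustration below enumerates all assignments and prints those where the two formulas disagree. Java's && and || follow the same precedence convention, so the expressions mirror the two formulas directly; FormulaPrecedence is just an illustrative name.

```java
// Illustration: compares the parsed formula n1_0 & n1_1 | n1_3 with the
// intended n1_0 & (n1_1 | n1_3) over all truth assignments.
class FormulaPrecedence {

    static boolean parsed(boolean college, boolean arts, boolean science) {
        return college && arts || science;     // (college & arts) | science
    }

    static boolean intended(boolean college, boolean arts, boolean science) {
        return college && (arts || science);   // college & (arts | science)
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean c : values) {
            for (boolean a : values) {
                for (boolean s : values) {
                    if (parsed(c, a, s) != intended(c, a, s)) {
                        System.out.printf("college=%b arts=%b science=%b: parsed=%b intended=%b%n",
                                c, a, s, parsed(c, a, s), intended(c, a, s));
                    }
                }
            }
        }
    }
}
```

It reports only assignments where college is false and science is true: the parsed formula would accept any science as a College of Arts and Sciences, college or not.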

Another common mistake at this step is an error in Word Sense Disambiguation (WSD). Let's consider the Courses node:

TRACE MatchManager - 	n0	Courses	n0_0
TRACE MatchManager - 	n0	0	course	courses
TRACE MatchManager - 	n0	0	n#886144	[course, course of study, course of instruction, class]	education imparted in a series of lessons or meetings; "he took a course in basket weaving"; "flirting is not unknown in college classes"
TRACE MatchManager - 	n0	0	n#8393816	[course, line]	a connected series of events or actions or developments; "the government took a firm course"; "historians can only point out those lines for which evidence is available"
TRACE MatchManager - 	n0	0	n#8698960	[course, trend]	general line of orientation; "the river takes a southern course"; "the northeastern trend of the coast"
TRACE MatchManager - 	n0	0	n#39000	[course, course of action]	a mode of action; "if you persist in that course you will surely fail"; "once a nation is embarked on a course of action it becomes extremely difficult for any retraction to take place"
TRACE MatchManager - 	n0	0	n#9410115	[path, track, course]	a line or route along which something travels or moves; "the hurricane demolished houses in its path"; "the track of an animal"; "the course of the river"
TRACE MatchManager - 	n0	0	n#8255384	[class, form, grade, course]	a body of students who are taught together; "early morning classes are always sleepy"
TRACE MatchManager - 	n0	0	n#7572535	[course]	part of a meal served at one time; "she prepared a three course meal"
TRACE MatchManager - 	n0	0	n#3124680	[course, row]	(construction) a layer of masonry; "a course of bricks"
TRACE MatchManager - 	n0	0	n#3124441	[course]	facility consisting of a circumscribed area of land or water laid out for a sport; "the course had only nine holes"; "the course was less than a mile"
TRACE MatchManager - 	n0	0	v#2071468	[course]	move swiftly through or over; "ships coursing the Atlantic"
TRACE MatchManager - 	n0	0	v#2070867	[run, flow, feed, course]	move along, of liquids; "Water flowed into the cave"; "the Missouri feeds into the Mississippi"
TRACE MatchManager - 	n0	0	v#1147339	[course]	hunt with hounds; "He often courses hares"

It contains noun as well as verb senses, although one might argue that at least the verb senses are irrelevant here. Furthermore, one might argue that the only correct sense is n#886144. So, there are some mistakes in the WSD step.
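To spot such problems quickly in a long output, one can count the senses kept per token and part of speech. The Java sketch below does this by scanning MatchManager TRACE lines. The field layout (tab-separated node id, token index, sense id such as n#886144 or v#2071468, synonyms, gloss) is inferred from the sample output above rather than from S-Match documentation, and SensesByPos is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: counts senses per token and part of speech in MatchManager TRACE
// lines of the form  <tab>n0<tab>0<tab>n#886144<tab>[synonyms]<tab>gloss
// (layout inferred from the sample output, not from S-Match docs).
class SensesByPos {

    private static final Pattern SENSE =
            Pattern.compile("\t(n\\d+)\t(\\d+)\t([a-z])#\\d+\t");

    static Map<String, Map<String, Integer>> count(List<String> lines) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = SENSE.matcher(line);
            if (m.find()) {
                String token = m.group(1) + "/" + m.group(2);  // node/token index
                String pos = m.group(3);                       // n, v, a, ...
                counts.computeIfAbsent(token, k -> new TreeMap<>())
                      .merge(pos, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines from the Courses output above (glosses shortened).
        List<String> lines = List.of(
                "TRACE MatchManager - \tn0\t0\tcourse\tcourses",
                "TRACE MatchManager - \tn0\t0\tn#886144\t[course, course of study, course of instruction, class]\teducation imparted in a series of lessons or meetings",
                "TRACE MatchManager - \tn0\t0\tn#7572535\t[course]\tpart of a meal served at one time",
                "TRACE MatchManager - \tn0\t0\tv#1147339\t[course]\thunt with hounds");
        System.out.println(count(lines));
    }
}
```

Lemma lines such as "course courses" carry no sense id and are skipped by the pattern.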

Looking further down in the debug output, one should see the results of the element-level matching:

TRACE WordNet - Found > using @,#m,#s,#p (HYPERNYM, MEMBER_, SUBSTANCE_, PART_HOLONYM) between n#6008975[science, scientific discipline] and n#6125083[earth science]
TRACE ElementMatcher - 3	>	0		sciences	>	earth sciences

The results of this step are mostly correct (especially those coming from WordNet) but occasionally wrong (often those coming from less accurate matchers, such as string matchers). They serve as axioms for the final matching step, so studying them might reveal the reasons for a strange mapping result.
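When the element-level output is long, extracting just the axioms can help. This Java sketch picks out the ElementMatcher lines with the seven tab-separated fields seen above (source token index, relation, target token index, an empty field, source lemma, relation, target lemma) and renders them in words. The layout is inferred from the sample output, the meanings of the relation symbols (= equivalence, < less general, > more general, ! disjointness) are assumed from S-Match's usual notation, and Axioms is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: extracts element-level axioms from ElementMatcher TRACE lines.
// Line layout and symbol meanings are assumptions based on the sample output.
class Axioms {

    private static final String PREFIX = "ElementMatcher - ";

    // Assumed meanings of S-Match relation symbols.
    static final Map<String, String> MEANING = Map.of(
            "=", "equivalent to",
            "<", "less general than",
            ">", "more general than",
            "!", "disjoint from");

    static List<String> extract(List<String> lines) {
        List<String> axioms = new ArrayList<>();
        for (String line : lines) {
            int start = line.indexOf(PREFIX);
            if (start < 0) {
                continue;
            }
            String[] f = line.substring(start + PREFIX.length()).split("\t");
            // Axiom lines: index, relation, index, "", lemma, relation, lemma.
            if (f.length == 7 && f[0].matches("\\d+") && f[2].matches("\\d+")) {
                axioms.add(f[4] + " " + MEANING.getOrDefault(f[1], f[1]) + " " + f[6]);
            }
        }
        return axioms;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "TRACE WordNet - Found > using @,#m,#s,#p (HYPERNYM, MEMBER_, SUBSTANCE_, PART_HOLONYM) between n#6008975[science, scientific discipline] and n#6125083[earth science]",
                "TRACE ElementMatcher - 3\t>\t0\t\tsciences\t>\tearth sciences");
        extract(log).forEach(System.out::println);
    }
}
```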

Second example

Let's consider another example, a simulated API match:

water
	measure

vs

waterBodyMeasures
	secondaryMeasure

The matching result contains 4 links, among them a particularly suspicious one: \water\measure = \waterBodyMeasures\secondaryMeasure.

The debug output for the source context shows nothing of interest. However, the debug output for the target context

TRACE MatchManager - 	n0	waterBodyMeasures	n0_0
TRACE MatchManager - 	n0	0	waterbodymeasures	waterbodymeasures
TRACE MatchManager - 		n1	secondaryMeasure	n1_0 & n1_1
TRACE MatchManager - 		n1	0	secondarymeasure	secondarymeasure
TRACE MatchManager - 		n1	0	n#8446856	[secondary]	the defensive football players who line up behind the linemen
TRACE MatchManager - 		n1	0	n#4171063	[secondary coil, secondary winding, secondary]	coil such that current is induced in it by passing a current through the primary coil
TRACE MatchManager - 		n1	0	a#1859389	[secondary]	being of second rank or importance or value; not direct or immediate; "the stone will be hauled to a secondary crusher"; "a secondary source"; "a secondary issue"; "secondary streams"
TRACE MatchManager - 		n1	0	a#2108248	[junior-grade, lower-ranking, lowly, petty, secondary, subaltern]	inferior in rank or status; "the junior faculty"; "a lowly corporal"; "petty officialdom"; "a subordinate functionary"
TRACE MatchManager - 		n1	0	a#1863896	[secondary]	depending on or incidental to what is original or primary; "a secondary infection"
TRACE MatchManager - 		n1	0	a#1476701	[secondary]	not of major importance; "played a secondary role in world events"
TRACE MatchManager - 		n1	0	a#797147	[secondary]	belonging to a lower class or rank
TRACE MatchManager - 		n1	1	measure	measure
TRACE MatchManager - 		n1	1	n#175261	[measure, step]	any maneuver made as part of progress toward a goal; "the situation called for strong measures"; "the police took steps to reduce crime"
TRACE MatchManager - 		n1	1	n#33914	[measure, quantity, amount]	how much there is or how many there are of something that you can quantify
TRACE MatchManager - 		n1	1	n#6548844	[bill, measure]	a statute in draft before it becomes law; "they held a public hearing on the bill"
TRACE MatchManager - 		n1	1	n#998911	[measurement, measuring, measure, mensuration]	the act or process of assigning numbers to phenomena according to a rule; "the measurements were carefully done"; "his mental measurings proved remarkably accurate"
TRACE MatchManager - 		n1	1	n#7275291	[standard, criterion, measure, touchstone]	a basis for comparison; a reference point against which other things can be evaluated; "the schools comply with federal standards"; "they set the measure for all subsequent work"
TRACE MatchManager - 		n1	1	n#7108759	[meter, metre, measure, beat, cadence]	(prosody) the accent in a metrical foot of verse
TRACE MatchManager - 		n1	1	n#6877775	[measure, bar]	musical notation for a repeating pattern of musical beats; "the orchestra omitted the last twelve bars of the song"
TRACE MatchManager - 		n1	1	n#3741128	[measuring stick, measure, measuring rod]	measuring instrument having a sequence of marks at regular intervals; used as a reference in making measurements
TRACE MatchManager - 		n1	1	n#3739135	[measure]	a container of some standard capacity that is used to obtain fixed amounts of a substance
TRACE MatchManager - 		n1	1	v#648747	[measure, mensurate, measure out]	determine the measurements of something or somebody, take measurements of; "Measure the length of the wall"
TRACE MatchManager - 		n1	1	v#490773	[quantify, measure]	express as a number or measure or quantity; "Can you quantify your results?"
TRACE MatchManager - 		n1	1	v#2710209	[measure]	have certain dimensions; "This table surfaces measures 20inches by 36 inches"
TRACE MatchManager - 		n1	1	v#683348	[measure, evaluate, valuate, assess, appraise, value]	evaluate or estimate the nature, quality, ability, extent, or significance of; "I will have the family jewels appraised by a professional"; "access all the factors when taking a risk"

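shows that the default parser handles the label waterBodyMeasures badly: it is kept as a single token, waterbodymeasures, for which no senses are found, giving the formula n0_0. Nevertheless, matching continues, and the output of the element-level matching step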

TRACE ElementMatcher - Prefix:	water	=	waterbodymeasures
TRACE ElementMatcher - 0	=	0		water	=	waterbodymeasures
TRACE ElementMatcher - n0.[water].0.water	=	n0.[waterBodyMeasures].0.waterbodymeasures
TRACE ElementMatcher - Suffix:	measure	=	secondarymeasure
TRACE ElementMatcher - 0	=	0		measure	=	secondarymeasure
TRACE ElementMatcher - n1.[measure].0.measure	=	n1.[secondaryMeasure].0.secondarymeasure
TRACE WordNet - Found = using & (SIMILAR_TO) between n#175261[measure, step] and n#175261[measure, step]
TRACE ElementMatcher - 0	=	1		measure	=	measure
TRACE ElementMatcher - n1.[measure].0.measure	=	n1.[secondaryMeasure].1.measure

reveals that several axioms are found and that some of them are wrong, such as those found by the prefix and suffix string matchers (water = waterbodymeasures and measure = secondarymeasure).
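In this output, each axiom line follows the line of the matcher that produced it: a Prefix: or Suffix: line for the string matchers, a WordNet - Found line for WordNet. Assuming that ordering holds (it does in the sample above, but it is not documented behavior), a heuristic Java sketch can tag each axiom with its likely source, so that the less reliable string-matcher axioms stand out. AxiomSources is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Heuristic sketch: labels each element-level axiom with the matcher that
// seems to have produced it, based on the marker line logged just before it.
class AxiomSources {

    private static final String PREFIX = "ElementMatcher - ";

    static List<String> classify(List<String> lines) {
        List<String> result = new ArrayList<>();
        String source = "unknown";
        for (String line : lines) {
            int start = line.indexOf(PREFIX);
            String msg = start < 0 ? "" : line.substring(start + PREFIX.length());
            if (line.contains("WordNet - Found")) {
                source = "WordNet";
            } else if (msg.startsWith("Prefix:")) {
                source = "string (prefix)";
            } else if (msg.startsWith("Suffix:")) {
                source = "string (suffix)";
            } else {
                String[] f = msg.split("\t");
                // Axiom lines: index, relation, index, "", lemma, relation, lemma.
                if (f.length == 7 && f[0].matches("\\d+") && f[2].matches("\\d+")) {
                    result.add(source + ": " + f[4] + " " + f[5] + " " + f[6]);
                    source = "unknown";   // a marker applies to the next axiom only
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // ElementMatcher and WordNet lines from the output above.
        List<String> log = List.of(
                "TRACE ElementMatcher - Prefix:\twater\t=\twaterbodymeasures",
                "TRACE ElementMatcher - 0\t=\t0\t\twater\t=\twaterbodymeasures",
                "TRACE ElementMatcher - n0.[water].0.water\t=\tn0.[waterBodyMeasures].0.waterbodymeasures",
                "TRACE ElementMatcher - Suffix:\tmeasure\t=\tsecondarymeasure",
                "TRACE ElementMatcher - 0\t=\t0\t\tmeasure\t=\tsecondarymeasure",
                "TRACE ElementMatcher - n1.[measure].0.measure\t=\tn1.[secondaryMeasure].0.secondarymeasure",
                "TRACE WordNet - Found = using & (SIMILAR_TO) between n#175261[measure, step] and n#175261[measure, step]",
                "TRACE ElementMatcher - 0\t=\t1\t\tmeasure\t=\tmeasure",
                "TRACE ElementMatcher - n1.[measure].0.measure\t=\tn1.[secondaryMeasure].1.measure");
        classify(log).forEach(System.out::println);
    }
}
```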

Taking these findings into account, if we fix the parsing by manually tokenizing the offending labels, the result improves:

\water\measure	>	\water Body Measures\secondary Measure
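The manual tokenization above amounts to splitting camel-case labels. A minimal Java sketch of that step, to be applied to the labels before converting the contexts; it is an illustration, not an S-Match preprocessing option, and CamelCaseSplit is a hypothetical name. It only splits where a lowercase letter or digit is followed by an uppercase letter, so acronyms such as HTMLParser stay intact, and leading tabs in a tab-indented context are left untouched.

```java
// Sketch of the manual fix: split camel-case labels before matching.
// Not an S-Match option; CamelCaseSplit is a hypothetical name.
class CamelCaseSplit {

    static String split(String label) {
        // Insert a space between a lowercase letter or digit and a following uppercase letter.
        return label.replaceAll("(?<=[\\p{Ll}\\d])(?=\\p{Lu})", " ");
    }

    public static void main(String[] args) {
        System.out.println(split("waterBodyMeasures"));   // water Body Measures
        System.out.println(split("secondaryMeasure"));    // secondary Measure
    }
}
```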
