XSystem is a method for learning syntactic patterns in datasets and representing them as data structures called XStructures. Once XSystem has learned a collection of patterns, it can be used for several tasks: automatic label assignment, where data items are assigned a class by comparing them to a library of known classes (written as regexes or XStructures); finding syntactically similar content, where learned XStructures are compared to see whether they are similar; and outlier detection, where the XStructure learned for a single item is compared to other XStructures to check whether its structure differs from theirs.
See the associated research paper here.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Java
- (Optional) Gradle.
Enter the following command in your console to clone the repository.
$> git clone https://github.com/UCHI-DB/xsystem-ng.git
Since XSystem is a Gradle project and the repository already includes a Gradle wrapper, you do not need to install Gradle to build and test the project. Instead, you can just run:
$> ./gradlew build -x test
To run the unit tests, run the following command:
$> ./gradlew test
The XSystem interface is implemented by the XSystemImplementation class. To use its methods, first create an instance of XSystemImplementation.
XSystemImplementation impl = new XSystemImplementation();
The available methods are as follows:
build method
Given lines (an ArrayList of String), the build method learns them into an XStructure and returns that XStructure.
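For intuition about what learning a set of lines into a syntactic pattern can mean, here is a minimal sketch. It is a toy stand-in, not XSystem's algorithm and not the XStructure type: the class ToyPatternLearner, its methods, and the symbols D, L, S and * are all hypothetical names chosen for illustration. It generalizes each character position to a coarse character class and marks positions where the input lines disagree.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: NOT XSystem's algorithm or its XStructure type.
class ToyPatternLearner {

    // Map a character to a coarse class symbol: D = digit, L = letter, S = whitespace.
    // Any other character (punctuation, symbols) is kept literally.
    static char classOf(char c) {
        if (Character.isDigit(c)) return 'D';
        if (Character.isLetter(c)) return 'L';
        if (Character.isWhitespace(c)) return 'S';
        return c;
    }

    // Learn one symbol per position. '*' marks positions where the lines
    // disagree or where at least one line is too short to have a character.
    // Limitation: a literal '*' in the input is indistinguishable from the wildcard.
    static String learn(List<String> lines) {
        int maxLen = 0;
        for (String line : lines) maxLen = Math.max(maxLen, line.length());
        StringBuilder pattern = new StringBuilder();
        for (int i = 0; i < maxLen; i++) {
            char symbol = '*';
            boolean first = true;
            for (String line : lines) {
                char cls = i < line.length() ? classOf(line.charAt(i)) : '*';
                if (first) { symbol = cls; first = false; }
                else if (symbol != cls) symbol = '*';
            }
            pattern.append(symbol);
        }
        return pattern.toString();
    }

    public static void main(String[] args) {
        // Prints DDD-DDDD: both lines share the same shape.
        System.out.println(learn(Arrays.asList("555-1234", "867-5309")));
    }
}
```

A real XStructure is richer than this single string; the sketch only conveys the idea of summarizing many lines by their shared syntactic shape.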
XStructure xstruct = impl.build(lines);
generate method
Given an XStructure pattern and an integer n, the generate method returns an ArrayList of n random Strings generated from the pattern.
ArrayList<String> randomString = impl.generate(pattern, n);
similarity method
Given either two XStructures x1 and x2, or an XStructure pattern and a String str, this method returns the similarity score between the two.
double similarity = impl.similarity(x1, x2);
OR
double similarity = impl.similarity(pattern, str);
match method
This method returns a boolean indicating whether the XStructure pattern and the regular expression Pattern regex match each other.
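The regex argument is described as a regular expression Pattern. Assuming this refers to java.util.regex.Pattern from the Java standard library (the text does not say so explicitly), such an object could be built as in the sketch below. The class name RegexPatternExample and the phone-number regex are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative only: shows how a java.util.regex.Pattern is compiled and tested.
class RegexPatternExample {

    // Hypothetical example regex: three digits, a hyphen, four digits.
    static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}");

    // True only if the entire string matches the regex.
    static boolean matchesWhole(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("555-1234"));   // true
        System.out.println(matchesWhole("555-12345"));  // false: one extra digit
    }
}
```

If the assumption about the type holds, a Pattern compiled this way is the kind of object that would be passed as regex.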
boolean match = impl.match(pattern, regex);
computeOutlierScore method
Computes the outlier score of a given String str with respect to the given XStructure pattern.
double outlierScore = impl.computeOutlierScore(pattern, str);
mergetwoXStructs method
Merges two XStructures, x1 and x2.
XStructure merged = impl.mergetwoXStructs(x1, x2);
mergeMultipleXStructs method
Merges a list (ArrayList) of XStructures, xstructList.
XStructure merged = impl.mergeMultipleXStructs(xstructList);
learnXStructs method
Given the path inputPath of an input CSV file or folder, this method learns an XStructure for each column and stores the results in the JSON file at the specified path outFile.
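Learning one XStructure per column implies that the CSV values are first grouped by column. The sketch below shows one simple way to do that grouping with the standard library only. It is not XSystem's CSV handling: the class CsvColumns and the method byColumn are hypothetical, and the code assumes a header row, comma separators, and no quoted or escaped commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper, not part of XSystem.
class CsvColumns {

    // Group CSV cell values by column header, preserving column order.
    // Assumes a header row, comma separators, and no quoted or escaped commas.
    static Map<String, List<String>> byColumn(List<String> csvLines) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        if (csvLines.isEmpty()) return columns;
        String[] headers = csvLines.get(0).split(",", -1);
        for (String h : headers) columns.put(h.trim(), new ArrayList<>());
        for (String row : csvLines.subList(1, csvLines.size())) {
            String[] cells = row.split(",", -1);
            // Rows shorter than the header simply contribute fewer cells.
            for (int i = 0; i < headers.length && i < cells.length; i++) {
                columns.get(headers[i].trim()).add(cells[i].trim());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("zip,phone", "60637,555-1234", "60615,867-5309");
        // Prints {zip=[60637, 60615], phone=[555-1234, 867-5309]}
        System.out.println(byColumn(csv));
    }
}
```

Each resulting list of values is the kind of per-column input from which a pattern could then be learned.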
impl.learnXStructs(inputPath, outFile);
readXStructswthType method
Given the path inputJSONfolder of a JSON file or folder, this method returns a list of pairs, each holding an XStructure and its corresponding datatype.
ArrayList<Pair<XStructure, String>> list = impl.readXStructswthType(inputJSONfolder);
labelAssignwthRegex method
Given the path sampleRegexFolderPath of a regex-type CSV file or folder, this method assigns labels to a JSON file or folder of learned XStructures (path learnedXStructsJSONfolderpath) and writes the result to a CSV file at the user-specified path outFile.
impl.labelAssignwthRegex(sampleRegexFolderPath, learnedXStructsJSONfolderpath, outFile);
labelAssignmentwthXStruct method
Given an ArrayList of String inputs, lines, this method returns the label computed from a JSON file or folder of learned XStructures (path referenceFilePath).
String label = impl.labelAssignmentwthXStruct(lines, referenceFilePath);
- Ipsita Mohanty
- Raul Castro Fernandez
This XSystem implementation is based on this paper.