395 commits
1c81e2f
refactor
VitalyRomanov Sep 17, 2021
b51565e
fix typo
VitalyRomanov Sep 20, 2021
ce9cf76
change output
VitalyRomanov Sep 20, 2021
00f26a9
refactor, use full neighborhood sampling
VitalyRomanov Oct 5, 2021
946c8e0
added additional filtration for typeann experiment, added bunch of ne…
VitalyRomanov Oct 5, 2021
6bf1bf9
added codebert tokenizer, added codebert support for python batcher, …
VitalyRomanov Oct 5, 2021
3f99f0d
added comments
VitalyRomanov Oct 5, 2021
7cb40ea
added flag to force w2v negative sampling
VitalyRomanov Oct 5, 2021
5bd4c73
made sure that mentions also count towards actual names
VitalyRomanov Oct 5, 2021
e7151d4
added script for processing arbitrary code with sourcetrail
VitalyRomanov Oct 5, 2021
3074345
added script to replace args with mentiuons in function_annotation d…
VitalyRomanov Oct 5, 2021
d13fe5a
updated default parameters
VitalyRomanov Oct 5, 2021
e0ecc15
hide irrelevant operation
VitalyRomanov Oct 5, 2021
b1f5859
make default embeddings zero by default
VitalyRomanov Oct 5, 2021
be0bb53
experimenting with transformer
VitalyRomanov Oct 5, 2021
d3fc4d9
print max f1 at the end of training
VitalyRomanov Oct 5, 2021
3cfbc61
fix bugs
VitalyRomanov Oct 5, 2021
7b2f633
fix AnnAssign parsing
VitalyRomanov Oct 5, 2021
2920007
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 5, 2021
8cb4493
remove global nodes from training node_name objective
VitalyRomanov Oct 6, 2021
bed54ed
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 6, 2021
3688c62
try fixing a problem
VitalyRomanov Oct 6, 2021
b042bcc
try fixing a problem
VitalyRomanov Oct 6, 2021
9d5dd0c
temp fix
VitalyRomanov Oct 6, 2021
0643b58
added flag to recompute local2global mappings
VitalyRomanov Oct 6, 2021
5452468
refactor
VitalyRomanov Oct 11, 2021
ef03814
made sure transr objective works
VitalyRomanov Oct 11, 2021
7a5d3be
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 11, 2021
d384ed2
fix loading procedure when using different computer
VitalyRomanov Oct 12, 2021
46fe4db
add initial connection residual
VitalyRomanov Oct 19, 2021
1e041cf
refactor, track ndcg and hits@k
VitalyRomanov Oct 20, 2021
3ce6dd1
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 20, 2021
879149a
create edge prediction with elementembedder
VitalyRomanov Oct 20, 2021
e4b49b1
changed margin for link predictor
VitalyRomanov Oct 20, 2021
110c1e5
triplet loss instead cosineembloss
VitalyRomanov Oct 21, 2021
378d326
use w2v when restoring to precompute target embeddings
VitalyRomanov Oct 22, 2021
262cb67
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 22, 2021
b7cda2c
use full neighbourhood sampler, use brute force scorer, testing updat…
VitalyRomanov Oct 24, 2021
94eb83e
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 24, 2021
60ca683
preparing evaluation scripts
VitalyRomanov Oct 28, 2021
5f42e53
switch to faiss because of speed
VitalyRomanov Oct 28, 2021
c71b6d0
forcing w2v in the beginning of training is no longer needed
VitalyRomanov Oct 28, 2021
7cfe069
removed irrelevant code
VitalyRomanov Oct 28, 2021
06b174d
fix data loading
VitalyRomanov Oct 28, 2021
30a1b8d
added comment about missing ast edges
VitalyRomanov Oct 28, 2021
6a0baf7
added function for debugging
VitalyRomanov Oct 28, 2021
8ca0163
prepare index during final evaluation? load model to cpu?
VitalyRomanov Oct 28, 2021
266430f
added option to remove default values from code
VitalyRomanov Oct 28, 2021
4e16f5c
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 28, 2021
8ce911b
disable bias for rggan layer
VitalyRomanov Oct 28, 2021
030acaf
go back to brute and enable bias
VitalyRomanov Oct 28, 2021
c8a99b2
disable w2v for first epoch, disable scoring for train set
VitalyRomanov Oct 28, 2021
03fa1ea
use GPU for computing neighbours when possible
VitalyRomanov Oct 28, 2021
51648da
testing faiss index, updating every 5 batches
VitalyRomanov Oct 29, 2021
a0b76ed
testing faiss index, updating every 1 batches
VitalyRomanov Oct 29, 2021
85d7140
fixed bug where positive examples were corrupt
VitalyRomanov Oct 29, 2021
d132ed5
make sure nothing fails when nn classifier used
VitalyRomanov Oct 30, 2021
ec1dbbf
sparseadam for embeddings
VitalyRomanov Oct 30, 2021
2a76907
moved back to the model with self loop
VitalyRomanov Oct 30, 2021
39859f1
standardized l2 link predictor
VitalyRomanov Oct 30, 2021
052c128
added argument to specify traiing metric
VitalyRomanov Oct 30, 2021
5d7f127
make sure cpu
VitalyRomanov Oct 30, 2021
8c3f90c
fix torch.norm usage
VitalyRomanov Oct 31, 2021
a5e3728
added option to set nearest neighbour backend
VitalyRomanov Oct 31, 2021
a98afb5
appears to be float for some reason
VitalyRomanov Oct 31, 2021
01ba8a1
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 31, 2021
32edf5f
switching back to triplet
VitalyRomanov Nov 2, 2021
c4bd536
disable precomputing for subword objectives
VitalyRomanov Nov 3, 2021
d6b3a3e
fix bug
VitalyRomanov Nov 3, 2021
be53e0f
added no clf objective
VitalyRomanov Nov 3, 2021
1f22574
added additional metrics
VitalyRomanov Nov 4, 2021
ea3d3f5
adding ns_groups
VitalyRomanov Nov 4, 2021
b0101d1
added scoring for transr objective
VitalyRomanov Nov 4, 2021
e9dc4ff
no evaluation for train
VitalyRomanov Nov 4, 2021
7001a7e
added flag to save each epoch
VitalyRomanov Nov 4, 2021
04caf19
added ns groups
VitalyRomanov Nov 7, 2021
f9bb4cf
disable train evaluation
VitalyRomanov Nov 7, 2021
bcbcd71
add holdout
VitalyRomanov Nov 8, 2021
abcaa15
added additional evaluation
VitalyRomanov Nov 8, 2021
b0138e4
testing holdout evaluation
VitalyRomanov Nov 10, 2021
37677cb
testing holdout evaluation
VitalyRomanov Nov 11, 2021
62169f9
Revert "testing holdout evaluation"
VitalyRomanov Nov 11, 2021
8646d99
Revert "testing holdout evaluation"
VitalyRomanov Nov 11, 2021
6cac62a
added no localization
VitalyRomanov Nov 11, 2021
a867c73
updated heatmap look
VitalyRomanov Nov 12, 2021
7b7911d
added codebert training
VitalyRomanov Nov 13, 2021
e0cabb6
added checkpoint loading
VitalyRomanov Nov 14, 2021
578f43b
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 14, 2021
a9a034d
skip batches with zero edges, do not fail on exception
VitalyRomanov Nov 15, 2021
0c2dca6
compute embeddings from external checkpoint
VitalyRomanov Nov 16, 2021
2f83a7b
compute embeddings from external checkpoint
VitalyRomanov Nov 16, 2021
2fd8123
compute embeddings from external checkpoint
VitalyRomanov Nov 16, 2021
1accd3c
enable graph emb
VitalyRomanov Nov 24, 2021
cb067b1
prevent cnn model from failing when win size is even
VitalyRomanov Nov 26, 2021
bb59cf3
unique temp folder for each run
VitalyRomanov Nov 26, 2021
f0ccdae
make sure emty batches are not generated
VitalyRomanov Nov 26, 2021
eefa032
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 26, 2021
46e4777
save no_loc param
VitalyRomanov Nov 27, 2021
155e18c
codebert to popular
VitalyRomanov Nov 27, 2021
3f47a7d
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 27, 2021
01efc8c
write timestamp when training type prediction
VitalyRomanov Nov 27, 2021
ae33519
update setup
VitalyRomanov Nov 27, 2021
c9a406d
store node strings
VitalyRomanov Nov 28, 2021
7291c9c
added comment for setup
VitalyRomanov Nov 28, 2021
941dc81
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 28, 2021
1d14eff
added script to split type prediction dataset, use prepared splits fo…
VitalyRomanov Nov 28, 2021
23da553
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 28, 2021
3cebbe7
added umap
VitalyRomanov Nov 29, 2021
b0e6eee
do type ann from specific ids
VitalyRomanov Nov 29, 2021
ced7280
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 29, 2021
18e6ebe
Updated readme and package versions. Added example code.
VitalyRomanov Feb 1, 2022
6e824fa
added example
VitalyRomanov Feb 3, 2022
c2f7020
use in-memory db
VitalyRomanov Feb 3, 2022
bf40635
use random ids instead of time-based, decipher some node names, fixed…
VitalyRomanov Feb 3, 2022
3251370
make identifier more unique
VitalyRomanov Feb 3, 2022
9338689
added docker to run sourcetrail indexer
VitalyRomanov Feb 3, 2022
fd921e1
updated README.md
VitalyRomanov Feb 3, 2022
5f39258
Update README.md
VitalyRomanov Feb 3, 2022
76ee7fd
updated README.md
VitalyRomanov Feb 4, 2022
54a6eb6
added documentation
VitalyRomanov Feb 4, 2022
22703f4
store name mappings as a file
VitalyRomanov Feb 4, 2022
197070f
added fn visualizer
VitalyRomanov Feb 4, 2022
d1ee0b0
minor improvements
VitalyRomanov Feb 4, 2022
b23ccc2
Merge remote-tracking branch 'origin/master'
VitalyRomanov Feb 4, 2022
4b7501c
rewriting ast parser
VitalyRomanov Feb 5, 2022
7ee2243
Update README.md
VitalyRomanov Feb 5, 2022
31065c2
added images
VitalyRomanov Feb 5, 2022
6eb3f5b
Merge remote-tracking branch 'origin/master'
VitalyRomanov Feb 5, 2022
012821b
Update README.md
VitalyRomanov Feb 5, 2022
1ecf99f
added definitions, refactor
VitalyRomanov Feb 5, 2022
adb6ff7
highlight additional edges with blue
VitalyRomanov Feb 5, 2022
0fccbec
Merge remote-tracking branch 'origin/master'
VitalyRomanov Feb 5, 2022
3c44d72
convert code to edges without global index
VitalyRomanov Feb 8, 2022
520dbbe
adding cli options
VitalyRomanov Feb 8, 2022
01b2ed2
added aligned data reader, added graph builder purely from ast, refactor
VitalyRomanov Feb 8, 2022
9ae58e7
change how imports are parsed
VitalyRomanov Feb 8, 2022
e83543b
change recover codebert tokens after alignment
VitalyRomanov Feb 8, 2022
c606f57
added converter for variable misuse detection dataset
VitalyRomanov Feb 12, 2022
38553a5
added missing node parsers
VitalyRomanov Feb 13, 2022
9e63903
make mention readable
VitalyRomanov Feb 13, 2022
a4634ef
fix issue where subword instances were not connected, fix issue with …
VitalyRomanov Feb 13, 2022
d551e3f
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Feb 13, 2022
c6ff353
inherit mentioned_in field for edges so that it is easy to filter fun…
VitalyRomanov Feb 13, 2022
b5bdf5d
added scripts to process cubert benchmark datasets
VitalyRomanov Feb 13, 2022
1ca5c79
added scripts to download py150
VitalyRomanov Feb 13, 2022
75b7954
remove printing
VitalyRomanov Feb 13, 2022
84d9eb6
rearrange
VitalyRomanov Feb 13, 2022
42f447b
added converter script
VitalyRomanov Feb 13, 2022
0284f5a
Merge remote-tracking branch 'origin/master'
VitalyRomanov Feb 13, 2022
672ff5d
make ast graph builder more memory efficient
VitalyRomanov Feb 13, 2022
a507713
fix column names
VitalyRomanov Feb 13, 2022
7f184a6
added example
VitalyRomanov Feb 15, 2022
0b142c4
small fixes
VitalyRomanov Feb 16, 2022
eb503d0
update files
VitalyRomanov Feb 16, 2022
075abf1
specify column types
VitalyRomanov Feb 16, 2022
3fde2aa
added json support, try to recognize compressed files
VitalyRomanov Feb 16, 2022
6c763ee
fixed extension detection, updated example
VitalyRomanov Feb 16, 2022
9afbd96
fix issue that occurs when node is not found in the current graph
VitalyRomanov Feb 16, 2022
49045e2
remove temporary folder after finishing
VitalyRomanov Feb 16, 2022
fe037d3
improve log parser
VitalyRomanov Feb 16, 2022
5a5c569
refactor
VitalyRomanov Feb 16, 2022
9e1ba8e
refactor, changed how local2global mapping works
VitalyRomanov Feb 16, 2022
ed02553
changed how local2global mapping works
VitalyRomanov Feb 16, 2022
a1baa23
detect json before pickle
VitalyRomanov Mar 2, 2022
92f5dac
pretty printing of output
VitalyRomanov Mar 2, 2022
4d62e9d
added comments for training options
VitalyRomanov Mar 2, 2022
cc87467
load json by default
VitalyRomanov Mar 2, 2022
bfa7bf8
added large graph example
VitalyRomanov Mar 2, 2022
8aed372
added support for configuration file
VitalyRomanov Mar 9, 2022
a8a82dd
added configuration file
VitalyRomanov Mar 9, 2022
2f7a7b3
added configuration file
VitalyRomanov Mar 9, 2022
d7f7a1a
added configuration file
VitalyRomanov Mar 9, 2022
4ae8404
refactor
VitalyRomanov Mar 10, 2022
7511dbe
Merge remote-tracking branch 'origin/master'
VitalyRomanov Mar 10, 2022
b6f6073
refactor
VitalyRomanov Mar 10, 2022
e7286ed
move to config file support
VitalyRomanov Mar 10, 2022
02dd948
unify loaders
VitalyRomanov Mar 14, 2022
1bb491f
added column to normalize for edges
VitalyRomanov Mar 14, 2022
1100393
added subgraph objective
VitalyRomanov Mar 14, 2022
79c64a0
ensure label consistency
VitalyRomanov Mar 14, 2022
c5e1dd6
added script for generating train test partitions
VitalyRomanov Mar 14, 2022
fbc26b1
subgraph example
VitalyRomanov Mar 14, 2022
d799b55
additional arguments for subsampling
VitalyRomanov Mar 15, 2022
3c525de
added options to provide subgraph partitions
VitalyRomanov Mar 15, 2022
d5ba7cc
added script to extract cubert partitions
VitalyRomanov Mar 15, 2022
ec7b2de
adding subgraph classification objective
VitalyRomanov Mar 15, 2022
267255b
working on subgraph classification
VitalyRomanov Mar 16, 2022
7682067
easing memory burden for reading json
VitalyRomanov Mar 16, 2022
5df860c
lazy import of pygraphviz
VitalyRomanov Mar 16, 2022
b7e3fe1
Subgraph classification example
VitalyRomanov Mar 16, 2022
450f40c
remove cubert_varmisuse_graph_tiny from repository
VitalyRomanov Mar 16, 2022
9d02a2c
reduce dataset creator memory consumption
VitalyRomanov Mar 16, 2022
9e09a61
added support for compressed files
VitalyRomanov Mar 16, 2022
89df05d
fix
VitalyRomanov Mar 18, 2022
7cc8d9c
added scripts
VitalyRomanov Mar 18, 2022
d434054
Merge remote-tracking branch 'origin/master'
VitalyRomanov Mar 18, 2022
3fb0399
fix return type
VitalyRomanov Mar 24, 2022
925f942
fix index when reading json with chunks
VitalyRomanov Mar 24, 2022
91c9bde
add bypass for node classification when k is larger than the number o…
VitalyRomanov Mar 25, 2022
500c003
added script
VitalyRomanov Mar 25, 2022
7cc0aaf
added script
VitalyRomanov Mar 25, 2022
ad2ce02
changed cnn params
VitalyRomanov Mar 25, 2022
ca864cf
refactor
VitalyRomanov Mar 25, 2022
e7f350f
added partitioning for scaa
VitalyRomanov Mar 25, 2022
d2d5baf
added preparation script for scaa
VitalyRomanov Mar 25, 2022
a2fb2cc
added script for extracting node level labels for variable misuse
VitalyRomanov Mar 25, 2022
737c88c
handle parallel edges
VitalyRomanov Apr 11, 2022
2d154cf
fix type annotation edges
VitalyRomanov Apr 11, 2022
a953b01
partial merge with dev branch
VitalyRomanov Apr 11, 2022
046db39
partial merge with dev branch
VitalyRomanov Apr 11, 2022
e5dc92c
updatint experiment interfaces
VitalyRomanov Apr 11, 2022
d0bca76
feat: partitioning fixed
MefistAldemisov Apr 15, 2022
b09b557
fix issue where predicted labels were exported for tb
VitalyRomanov Apr 18, 2022
618419f
update tensorflow version to avoid bugs
VitalyRomanov Apr 20, 2022
f3de461
fix a bug where `connect_subwords` parameter was not handled properly
VitalyRomanov Apr 20, 2022
d3c3b13
changed how figures are drawn
VitalyRomanov Apr 20, 2022
08a2de2
make more reasonable figures
VitalyRomanov Apr 26, 2022
f358795
feat: new pooling funcion
MefistAldemisov Apr 26, 2022
092de1e
chenged figure format
VitalyRomanov Apr 27, 2022
0dd539a
make easier to run
VitalyRomanov Apr 27, 2022
fd97b15
never not finetune graph embeddings
VitalyRomanov Apr 27, 2022
711f97a
Merge remote-tracking branch 'origin/master'
VitalyRomanov Apr 27, 2022
cf99ed4
removed
VitalyRomanov Apr 27, 2022
1fc2bdb
changes
VitalyRomanov Apr 27, 2022
0b781ba
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Apr 27, 2022
fcb1770
added test data
VitalyRomanov Apr 27, 2022
8cb86c4
added test data
VitalyRomanov Apr 27, 2022
f400c6b
refactor
VitalyRomanov Apr 27, 2022
db26a62
fix: pooling dimensionality
MefistAldemisov Apr 29, 2022
879f76d
training with new pooling
MefistAldemisov May 1, 2022
34152c5
added seed for ast dataset creator
VitalyRomanov May 6, 2022
66f77d8
chnaged format from jsonl to json
VitalyRomanov May 6, 2022
00e4628
fixed bug that prevented var misuse dataset from being created correctly
VitalyRomanov May 6, 2022
0af0140
added script for preparing varmisuse dataset
VitalyRomanov May 6, 2022
5c03268
added script for preparing varmisuse dataset, make file formats easie…
VitalyRomanov May 6, 2022
bae24c6
refactor
VitalyRomanov May 6, 2022
cc73418
added fasttext extractor
VitalyRomanov May 6, 2022
2f48fad
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov May 6, 2022
cfeeb6f
added support for graph matching
VitalyRomanov May 7, 2022
5e7dac9
embedding training
MefistAldemisov May 15, 2022
f5faf64
feat: partitioning fixed
MefistAldemisov Apr 15, 2022
71746ac
feat: new pooling funcion
MefistAldemisov Apr 26, 2022
7a38e71
fix: pooling dimensionality
MefistAldemisov Apr 29, 2022
f2cc93d
training with new pooling
MefistAldemisov May 1, 2022
0c7bcc2
embedding training
MefistAldemisov May 15, 2022
69fb392
feat: multihead attention
MefistAldemisov May 17, 2022
d11f746
feat: small dataset training
MefistAldemisov May 25, 2022
b5b2d5e
merging
MefistAldemisov May 25, 2022
32fdcff
training with anaysis
MefistAldemisov Jun 5, 2022
a09aa28
final analysis
MefistAldemisov Jun 22, 2022
9 changes: 9 additions & 0 deletions .gitignore
@@ -2,6 +2,15 @@ __pycache__
.DS_Store
*.pkl
*.tsv
*.json
*.json.bz2
*.csv
90*1000*.png
examples/scaa
examples/task
examples/new_scaa
examples/one_to_one
examples/one_vs_10
lda_docstring_embeddings*
.idea
SourceCodeTools.egg-info