Open
257 commits
55f96c8
increased complexity of bilinear layer
VitalyRomanov Jun 20, 2021
97b50bf
fixed a bug when inverse index was not returned
VitalyRomanov Jun 21, 2021
0158413
enforced the same dimensionality for LSTMDecoder
VitalyRomanov Jun 21, 2021
66e7e44
added length mask to the generator
VitalyRomanov Jun 21, 2021
e9839eb
added option for name generation
VitalyRomanov Jun 21, 2021
278729d
working on test for rggan
VitalyRomanov Jun 21, 2021
d28ec72
preparing test for rggan
VitalyRomanov Jun 22, 2021
d4abb59
added description
VitalyRomanov Jun 23, 2021
e449741
prevent division by zero
VitalyRomanov Jun 23, 2021
68c0611
added id based node embedder
VitalyRomanov Jun 23, 2021
91a2b30
added node classification objective
VitalyRomanov Jun 23, 2021
48478a5
refactor
VitalyRomanov Jun 23, 2021
a818fb0
prepared test for rggan
VitalyRomanov Jun 23, 2021
3186bed
added options for early stopping
VitalyRomanov Jun 23, 2021
21caf66
test multiple datasets at once
VitalyRomanov Jun 23, 2021
831c0fd
added documentation, fixed missing nodes in offsets, merge offsets, c…
VitalyRomanov Jul 22, 2021
ba62419
updating script for creating type prediction dataset
VitalyRomanov Jul 22, 2021
56620d5
added documentation
VitalyRomanov Jul 22, 2021
651a797
added TODO
VitalyRomanov Jul 22, 2021
d15ff3b
added parameter verification
VitalyRomanov Jul 22, 2021
236d41c
added support for running ReplacementNodeResolver without initialization
VitalyRomanov Jul 22, 2021
71c56fe
added script for graph complexity analysis
VitalyRomanov Jul 22, 2021
c7b46f3
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Jul 22, 2021
ea60cbb
temp fix for issue in linux
VitalyRomanov Jul 22, 2021
04cc428
modified script for preparing type annotation dataset
VitalyRomanov Jul 22, 2021
39238b9
refactor
VitalyRomanov Jul 23, 2021
7fd8762
add option to remove all type edges from the graph
VitalyRomanov Jul 23, 2021
4d618f1
updated script for training dglke embeddings
VitalyRomanov Jul 23, 2021
dd23abd
added a hot fix to detect only simple declarations
VitalyRomanov Jul 25, 2021
832a65c
made graph embeddigns optional
VitalyRomanov Jul 25, 2021
34a31c3
replaced an exception with a warning
VitalyRomanov Jul 25, 2021
bb05413
added documentation
VitalyRomanov Jul 26, 2021
25faf2e
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Jul 26, 2021
c43fba9
added dglke converter to Embedder
VitalyRomanov Jul 26, 2021
69af4ac
fix typos
VitalyRomanov Jul 26, 2021
8b5615d
fix situation when graph embeddigns are not provided
VitalyRomanov Jul 26, 2021
a043645
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Jul 26, 2021
9d1cda3
remove shelve in destructor
VitalyRomanov Jul 27, 2021
7886cd5
do not use tmp for shelve
VitalyRomanov Jul 27, 2021
2c9c3a8
use attention decoder, added positional codes to Torch Decoder
VitalyRomanov Jul 27, 2021
9895ca7
dglke training script
VitalyRomanov Jul 28, 2021
d6fd631
dglke training script
VitalyRomanov Jul 28, 2021
1eff56b
dglke training script
VitalyRomanov Jul 30, 2021
3114177
make sure all type edges are filtered
VitalyRomanov Aug 4, 2021
978ccc0
Merge remote-tracking branch 'origin/master'
VitalyRomanov Aug 4, 2021
21b848a
remove quotes when normalizing types
VitalyRomanov Aug 5, 2021
05383fe
fix issue where model is restored with incorrect parameters
VitalyRomanov Aug 5, 2021
29acad6
fix plots
VitalyRomanov Aug 5, 2021
46d8282
use self attention in the decoder when no target is available
VitalyRomanov Aug 5, 2021
ba912cd
temporarily disable passing targets and calling seq_gen
VitalyRomanov Aug 5, 2021
a9ae0af
added tf flags
VitalyRomanov Aug 5, 2021
e15ac7f
sort by length, disable cache
VitalyRomanov Aug 5, 2021
5ff3697
removed redundant reshapes
VitalyRomanov Aug 5, 2021
0ac3581
dwitch to flat decoder
VitalyRomanov Aug 5, 2021
85a9e76
change model folder format
VitalyRomanov Aug 5, 2021
1c55b40
use tmp
VitalyRomanov Aug 5, 2021
c816e3b
manual tmp
VitalyRomanov Aug 6, 2021
0dc93bd
preparing for tf.function
VitalyRomanov Aug 6, 2021
4e7cd99
fix epoch duration formatting
VitalyRomanov Aug 6, 2021
923e5af
save batch size
VitalyRomanov Aug 10, 2021
4c378c7
move f1 calculations to separate function
VitalyRomanov Aug 10, 2021
17fc560
evaluation mode by default
VitalyRomanov Aug 10, 2021
821a5d2
draw confusion matrix
VitalyRomanov Aug 10, 2021
24f9fa1
added option for sorting data by length
VitalyRomanov Aug 10, 2021
6f9df75
added projection matrix for node classifier
VitalyRomanov Aug 11, 2021
b85599f
pass embedding file explicitly
VitalyRomanov Aug 11, 2021
38c5943
port typeann to newer api
VitalyRomanov Aug 11, 2021
409c0c7
Merge remote-tracking branch 'origin/master'
VitalyRomanov Aug 11, 2021
50d8e6e
fix typo
VitalyRomanov Aug 11, 2021
8d8cbb3
add random_seed, pass mask during inference
VitalyRomanov Aug 11, 2021
74cab61
rm tmp dir if it exists, need this when seen is fixed
VitalyRomanov Aug 11, 2021
0f241d2
save only the best checkpoint
VitalyRomanov Aug 11, 2021
0e89fd9
save only the best checkpoint
VitalyRomanov Aug 11, 2021
9da4327
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Aug 11, 2021
69b6910
read batch size from parameters
VitalyRomanov Aug 11, 2021
568416c
fix bug when unknown tags are mapped out of range
VitalyRomanov Aug 11, 2021
c589004
proper calculation of test scores, fix checkpoint saving
VitalyRomanov Aug 12, 2021
e0309af
fix imports
VitalyRomanov Aug 17, 2021
1859817
doing rggan test
VitalyRomanov Aug 17, 2021
0919472
evaluate on test set
VitalyRomanov Aug 17, 2021
f612643
fix issue with type annotation offsets
VitalyRomanov Aug 17, 2021
36f3041
testing rggan
VitalyRomanov Aug 17, 2021
ba264ec
evaluating on test split
VitalyRomanov Aug 17, 2021
a31b84a
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Aug 17, 2021
0ed7f07
added GATConv for basis decomposition
VitalyRomanov Aug 21, 2021
78d91e1
hmm
VitalyRomanov Aug 21, 2021
6005ae5
disabme gru and attentiveaggregator and use parameter sharing
VitalyRomanov Aug 21, 2021
5be242e
preparing edge prediction objective
VitalyRomanov Aug 21, 2021
f244a14
preparing generation dataloader
VitalyRomanov Aug 21, 2021
6f38345
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Aug 21, 2021
54285d3
testing model with bias
VitalyRomanov Aug 23, 2021
a16a1a4
edge preduction does not classify
VitalyRomanov Aug 24, 2021
3a59df6
edge preduction does not classify
VitalyRomanov Aug 24, 2021
5b4eae5
store number of batches
VitalyRomanov Sep 9, 2021
2cc3a72
extensive filtering for edge prediction
VitalyRomanov Sep 9, 2021
0dc33db
override negative sampling strategy for the first epoch, use estimate…
VitalyRomanov Sep 9, 2021
16e553a
random folder for cache
VitalyRomanov Sep 10, 2021
249973a
added dummy tensor
VitalyRomanov Sep 10, 2021
2c3e14e
extend sourcetrail types
VitalyRomanov Sep 10, 2021
803802a
handle tensors of size 1
VitalyRomanov Sep 10, 2021
d760c76
added type prediction objective
VitalyRomanov Sep 10, 2021
798554c
working on k-hop graph generation
VitalyRomanov Sep 10, 2021
ed54d90
added custom reverse option
VitalyRomanov Sep 11, 2021
d9756f9
use classifier to train type annotations
VitalyRomanov Sep 11, 2021
c620122
disable merging of ast and global nodes
VitalyRomanov Sep 11, 2021
212278d
skip constant nodes in the body (docstrings), do not create mention_s…
VitalyRomanov Sep 11, 2021
18f327a
fix typo
VitalyRomanov Sep 11, 2021
6f376c0
do not remove global_mention when removing global edges
VitalyRomanov Sep 11, 2021
0e20675
added script to extract nodes of interest from type annotation datase…
VitalyRomanov Sep 11, 2021
d7108f6
remove parallel edges in the afterprocessing
VitalyRomanov Sep 11, 2021
abcc6d2
added post_pruning
VitalyRomanov Sep 12, 2021
95ca613
added post_pruning
VitalyRomanov Sep 12, 2021
dbd3764
extended pruning
VitalyRomanov Sep 12, 2021
d5ff657
mark global nodes as embeddable
VitalyRomanov Sep 12, 2021
987c121
added option to restrict node_ids to a specified set
VitalyRomanov Sep 13, 2021
902e16b
restricetd pool includes mentions and function definitions
VitalyRomanov Sep 14, 2021
a5b1f24
use faiss index by default
VitalyRomanov Sep 14, 2021
31dc768
do not negative sampling index for the first epoch
VitalyRomanov Sep 14, 2021
c73b81b
remove reverse for edge prediction
VitalyRomanov Sep 15, 2021
abf95c6
added filtration of populkar types in experiments
VitalyRomanov Sep 15, 2021
448087c
Merge remote-tracking branch 'origin/master'
VitalyRomanov Sep 15, 2021
3daf963
added progress bar for evaluation
VitalyRomanov Sep 16, 2021
5e0a2e6
fixed the filtration procedure for edge prediction objective
VitalyRomanov Sep 16, 2021
039d054
fixed an issue that appears when no sources for a package could be pa…
VitalyRomanov Sep 16, 2021
e2ca984
changed default node type for type_ann experiment
VitalyRomanov Sep 16, 2021
06b0dde
changed default parameters for rggan
VitalyRomanov Sep 16, 2021
f873a49
added comment
VitalyRomanov Sep 16, 2021
0a8ed60
plot train test curves
VitalyRomanov Sep 16, 2021
cdca63d
added info print at the beginning of epoch
VitalyRomanov Sep 16, 2021
76c4e4d
fixed an issue that appears when no sources for a package could be pa…
VitalyRomanov Sep 16, 2021
b876379
refactor
VitalyRomanov Sep 16, 2021
92e077c
embed all nodes
VitalyRomanov Sep 16, 2021
a01067c
fix num_batches count
VitalyRomanov Sep 16, 2021
303f578
increased the size of inference batch, do not overwrite checkpoints f…
VitalyRomanov Sep 16, 2021
3d7ee33
added progress bar for inference
VitalyRomanov Sep 16, 2021
49eecac
collect average train metrics
VitalyRomanov Sep 16, 2021
26f2ea9
refactor
VitalyRomanov Sep 16, 2021
9386fcc
save best model
VitalyRomanov Sep 16, 2021
1c81e2f
refactor
VitalyRomanov Sep 17, 2021
b51565e
fix typo
VitalyRomanov Sep 20, 2021
ce9cf76
change output
VitalyRomanov Sep 20, 2021
00f26a9
refactor, use full neighborhood sampling
VitalyRomanov Oct 5, 2021
946c8e0
added additional filtration for typeann experiment, added bunch of ne…
VitalyRomanov Oct 5, 2021
6bf1bf9
added codebert tokenizer, added codebert support for python batcher, …
VitalyRomanov Oct 5, 2021
3f99f0d
added comments
VitalyRomanov Oct 5, 2021
7cb40ea
added flag to force w2v negative sampling
VitalyRomanov Oct 5, 2021
5bd4c73
made sure that mentions also count towards actual names
VitalyRomanov Oct 5, 2021
e7151d4
added script for processing arbitrary code with sourcetrail
VitalyRomanov Oct 5, 2021
3074345
added script to replace args with mentiuons in function_annotation d…
VitalyRomanov Oct 5, 2021
d13fe5a
updated default parameters
VitalyRomanov Oct 5, 2021
e0ecc15
hide irrelevant operation
VitalyRomanov Oct 5, 2021
b1f5859
make default embeddings zero by default
VitalyRomanov Oct 5, 2021
be0bb53
experimenting with transformer
VitalyRomanov Oct 5, 2021
d3fc4d9
print max f1 at the end of training
VitalyRomanov Oct 5, 2021
3cfbc61
fix bugs
VitalyRomanov Oct 5, 2021
7b2f633
fix AnnAssign parsing
VitalyRomanov Oct 5, 2021
2920007
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 5, 2021
8cb4493
remove global nodes from training node_name objective
VitalyRomanov Oct 6, 2021
bed54ed
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 6, 2021
3688c62
try fixing a problem
VitalyRomanov Oct 6, 2021
b042bcc
try fixing a problem
VitalyRomanov Oct 6, 2021
9d5dd0c
temp fix
VitalyRomanov Oct 6, 2021
0643b58
added flag to recompute local2global mappings
VitalyRomanov Oct 6, 2021
5452468
refactor
VitalyRomanov Oct 11, 2021
ef03814
made sure transr objective works
VitalyRomanov Oct 11, 2021
7a5d3be
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 11, 2021
d384ed2
fix loading procedure when using different computer
VitalyRomanov Oct 12, 2021
46fe4db
add initial connection residual
VitalyRomanov Oct 19, 2021
1e041cf
refactor, track ndcg and hits@k
VitalyRomanov Oct 20, 2021
3ce6dd1
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 20, 2021
879149a
create edge prediction with elementembedder
VitalyRomanov Oct 20, 2021
e4b49b1
changed margin for link predictor
VitalyRomanov Oct 20, 2021
110c1e5
triplet loss instead cosineembloss
VitalyRomanov Oct 21, 2021
378d326
use w2v when restoring to precompute target embeddings
VitalyRomanov Oct 22, 2021
262cb67
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 22, 2021
b7cda2c
use full neighbourhood sampler, use brute force scorer, testing updat…
VitalyRomanov Oct 24, 2021
94eb83e
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 24, 2021
60ca683
preparing evaluation scripts
VitalyRomanov Oct 28, 2021
5f42e53
switch to faiss because of speed
VitalyRomanov Oct 28, 2021
c71b6d0
forcing w2v in the beginning of training is no longer needed
VitalyRomanov Oct 28, 2021
7cfe069
removed irrelevant code
VitalyRomanov Oct 28, 2021
06b174d
fix data loading
VitalyRomanov Oct 28, 2021
30a1b8d
added comment about missing ast edges
VitalyRomanov Oct 28, 2021
6a0baf7
added function for debugging
VitalyRomanov Oct 28, 2021
8ca0163
prepare index during final evaluation? load model to cpu?
VitalyRomanov Oct 28, 2021
266430f
added option to remove default values from code
VitalyRomanov Oct 28, 2021
4e16f5c
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 28, 2021
8ce911b
disable bias for rggan layer
VitalyRomanov Oct 28, 2021
030acaf
go back to brute and enable bias
VitalyRomanov Oct 28, 2021
c8a99b2
disable w2v for first epoch, disable scoring for train set
VitalyRomanov Oct 28, 2021
03fa1ea
use GPU for computing neighbours when possible
VitalyRomanov Oct 28, 2021
51648da
testing faiss index, updating every 5 batches
VitalyRomanov Oct 29, 2021
a0b76ed
testing faiss index, updating every 1 batches
VitalyRomanov Oct 29, 2021
85d7140
fixed bug where positive examples were corrupt
VitalyRomanov Oct 29, 2021
d132ed5
make sure nothing fails when nn classifier used
VitalyRomanov Oct 30, 2021
ec1dbbf
sparseadam for embeddings
VitalyRomanov Oct 30, 2021
2a76907
moved back to the model with self loop
VitalyRomanov Oct 30, 2021
39859f1
standardized l2 link predictor
VitalyRomanov Oct 30, 2021
052c128
added argument to specify traiing metric
VitalyRomanov Oct 30, 2021
5d7f127
make sure cpu
VitalyRomanov Oct 30, 2021
8c3f90c
fix torch.norm usage
VitalyRomanov Oct 31, 2021
a5e3728
added option to set nearest neighbour backend
VitalyRomanov Oct 31, 2021
a98afb5
appears to be float for some reason
VitalyRomanov Oct 31, 2021
01ba8a1
Merge remote-tracking branch 'origin/master'
VitalyRomanov Oct 31, 2021
32edf5f
switching back to triplet
VitalyRomanov Nov 2, 2021
c4bd536
disable precomputing for subword objectives
VitalyRomanov Nov 3, 2021
d6b3a3e
fix bug
VitalyRomanov Nov 3, 2021
be53e0f
added no clf objective
VitalyRomanov Nov 3, 2021
1f22574
added additional metrics
VitalyRomanov Nov 4, 2021
ea3d3f5
adding ns_groups
VitalyRomanov Nov 4, 2021
b0101d1
added scoring for transr objective
VitalyRomanov Nov 4, 2021
e9dc4ff
no evaluation for train
VitalyRomanov Nov 4, 2021
7001a7e
added flag to save each epoch
VitalyRomanov Nov 4, 2021
04caf19
added ns groups
VitalyRomanov Nov 7, 2021
f9bb4cf
disable train evaluation
VitalyRomanov Nov 7, 2021
bcbcd71
add holdout
VitalyRomanov Nov 8, 2021
abcaa15
added additional evaluation
VitalyRomanov Nov 8, 2021
b0138e4
testing holdout evaluation
VitalyRomanov Nov 10, 2021
37677cb
testing holdout evaluation
VitalyRomanov Nov 11, 2021
62169f9
Revert "testing holdout evaluation"
VitalyRomanov Nov 11, 2021
8646d99
Revert "testing holdout evaluation"
VitalyRomanov Nov 11, 2021
6cac62a
added no localization
VitalyRomanov Nov 11, 2021
a867c73
updated heatmap look
VitalyRomanov Nov 12, 2021
7b7911d
added codebert training
VitalyRomanov Nov 13, 2021
e0cabb6
added checkpoint loading
VitalyRomanov Nov 14, 2021
578f43b
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 14, 2021
a9a034d
skip batches with zero edges, do not fail on exception
VitalyRomanov Nov 15, 2021
0c2dca6
compute embeddings from external checkpoint
VitalyRomanov Nov 16, 2021
2f83a7b
compute embeddings from external checkpoint
VitalyRomanov Nov 16, 2021
2fd8123
compute embeddings from external checkpoint
VitalyRomanov Nov 16, 2021
1accd3c
enable graph emb
VitalyRomanov Nov 24, 2021
cb067b1
prevent cnn model from failing when win size is even
VitalyRomanov Nov 26, 2021
bb59cf3
unique temp folder for each run
VitalyRomanov Nov 26, 2021
f0ccdae
make sure emty batches are not generated
VitalyRomanov Nov 26, 2021
eefa032
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 26, 2021
46e4777
save no_loc param
VitalyRomanov Nov 27, 2021
155e18c
codebert to popular
VitalyRomanov Nov 27, 2021
3f47a7d
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 27, 2021
01efc8c
write timestamp when training type prediction
VitalyRomanov Nov 27, 2021
ae33519
update setup
VitalyRomanov Nov 27, 2021
c9a406d
store node strings
VitalyRomanov Nov 28, 2021
7291c9c
added comment for setup
VitalyRomanov Nov 28, 2021
941dc81
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 28, 2021
1d14eff
added script to split type prediction dataset, use prepared splits fo…
VitalyRomanov Nov 28, 2021
23da553
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 28, 2021
3cebbe7
added umap
VitalyRomanov Nov 29, 2021
b0e6eee
do type ann from specific ids
VitalyRomanov Nov 29, 2021
ced7280
Merge branch 'master' of https://github.com/VitalyRomanov/method-embe…
VitalyRomanov Nov 29, 2021
18e6ebe
Updated readme and package versions. Added example code.
VitalyRomanov Feb 1, 2022
d239334
Bump numpy from 1.18.1 to 1.21.0 in /scripts/data_collection
dependabot[bot] Feb 1, 2022
26 changes: 25 additions & 1 deletion README.md
@@ -10,8 +10,32 @@ Library for analyzing source code with graphs and NLP. What this repository can

### Installation

Use conda to create a virtual environment named `SourceCodeTools` with Python 3.8:
```bash
conda create -n SourceCodeTools python=3.8
```

If you plan to use graphviz, install pygraphviz:
```bash
conda install -c conda-forge pygraphviz
```

Install CUDA 11.1 if needed:
```bash
conda install -c nvidia cudatoolkit=11.1
```

To install the SourceCodeTools library, run:
```bash
git clone https://github.com/VitalyRomanov/method-embedding.git
cd method-embedding
pip install -e .
# pip install -e .[gpu]
```

### Installing Sourcetrail
Download a release from the [GitHub repo](https://github.com/CoatiSoftware/Sourcetrail/releases) (the latest tested version is 2020.1.117). Add the Sourcetrail location to `PATH`:
```bash
echo 'export PATH=/path/to/Sourcetrail_2020_1_117:$PATH' >> ~/.bashrc
```
Scripts that use Sourcetrail work on Linux; some issues have been observed on macOS.
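
After following the installation steps, a quick sanity check can confirm that the package is discoverable and that a Sourcetrail executable is reachable through `PATH`. The snippet below is a minimal sketch and not part of the repository: `check_setup` is a hypothetical helper, and the executable names `Sourcetrail.sh` and `Sourcetrail` are assumptions about what the release archive contains, so adjust them to match your download.

```python
import importlib.util
import shutil


def check_setup(package="SourceCodeTools",
                executables=("Sourcetrail.sh", "Sourcetrail")):
    """Report whether `package` can be imported and which executable is on PATH.

    The default executable names are assumptions; pass the names that
    actually ship with your Sourcetrail release.
    """
    # find_spec returns None for a top-level package that is not installed.
    package_found = importlib.util.find_spec(package) is not None

    # shutil.which returns the full path of the first match on PATH, or None.
    executable_path = next(
        (path for path in map(shutil.which, executables) if path is not None),
        None,
    )
    return {"package_found": package_found, "executable_path": executable_path}


if __name__ == "__main__":
    print(check_setup())
```

If `executable_path` comes back as `None`, the `export PATH=...` line was probably not picked up by the current shell; open a new terminal or run `source ~/.bashrc` and try again.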
13 changes: 13 additions & 0 deletions SourceCodeTools/code/ast_tools.py
@@ -30,9 +30,18 @@ def get_mentions(function, root, mention):


def get_descendants(function, children):
    """

    :param function: function string
    :param children: List of targets.
    :return: Offsets for attributes or names that are used as target for assignment operation. Subscript, Tuple and List
    targets are skipped.
    """
    descendants = []

    # if isinstance(children, ast.Tuple):
    #     descendants.extend(get_descendants(function, children.elts))
    # else:
    for chld in children:
        # for node in ast.walk(chld):
        node = chld
@@ -42,6 +51,10 @@ def get_descendants(function, children):
                [(node.lineno-1, node.end_lineno-1, node.col_offset, node.end_col_offset, "new_var")], as_bytes=True)
            # descendants.append((node.id, offset[-1]))
            descendants.append((function[offset[-1][0]:offset[-1][1]], offset[-1]))
        # elif isinstance(node, ast.Tuple):
        #     descendants.extend(get_descendants(function, node.elts))
        elif isinstance(node, ast.Subscript) or isinstance(node, ast.Tuple) or isinstance(node, ast.List):
            pass  # skip for now
        else:
            raise Exception("")
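
Reviewer note: the behaviour described in the new docstring (keep name and attribute targets, skip `Subscript`, `Tuple` and `List` targets) can be illustrated with a standalone sketch. This is not the repository's implementation: `simple_assignment_targets` is a hypothetical name, it reads positions directly from the AST instead of calling the project's `to_offsets` helper, and it assumes the lines hidden by the diff match `ast.Name` and `ast.Attribute`. It requires Python 3.8+ for `end_lineno` and `ast.get_source_segment`.

```python
import ast


def simple_assignment_targets(source):
    """Collect name/attribute assignment targets from `source`.

    Returns a list of (text, (lineno, col_offset, end_lineno, end_col_offset)).
    Subscript, Tuple and List targets are skipped, mirroring the
    "skip for now" branch in the diff above.
    """
    results = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, (ast.Name, ast.Attribute)):
                text = ast.get_source_segment(source, target)
                position = (target.lineno, target.col_offset,
                            target.end_lineno, target.end_col_offset)
                results.append((text, position))
            elif isinstance(target, (ast.Subscript, ast.Tuple, ast.List)):
                continue  # intentionally skipped
            else:
                # A descriptive error is easier to debug than an empty message.
                raise ValueError(
                    f"Unsupported assignment target: {type(target).__name__}"
                )
    return results


if __name__ == "__main__":
    code = "x = 1\nself.y = 2\na, b = 3, 4\nd['k'] = 5\n"
    for text, position in simple_assignment_targets(code):
        print(text, position)
```

One design point the sketch makes: a single `isinstance` call with a tuple of types reads more compactly than chained `or` checks, and raising an exception with a message naming the unexpected node type makes failures easier to trace than `Exception("")`.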
