The model is currently poor at decoding the *position* of edits, as opposed to their type and character (e.g. the edit `insert 'c' at 5` has signature `type * char * pos`). This may be because:
1. There is a bug in the Python model.
2. Absolute position encoding is bad; a more advanced scheme like rotary encoding should be used instead.
3. There is a bug in the OCaml batch generation.
4. Training isn't long enough, or the model is too small.
5. Programs have inherent invariances, which lead to ambiguity and training noise: e.g. operand order in addition doesn't matter.
To which I think:
1. A definite possibility. Do we need a positive-control dataset?
2. Also probably true, but I want to punt on this for now (see the rotary sketch after this list for what it might look like).
3. Unlikely, as inspected via plot_mmap.py.
4. Also unlikely: the model has in the past memorized the datasets. See 'positive control' above.
5. Very likely the culprit. I suggest http://arxiv.org/abs/1802.03685 as a demo of how to deal with intrinsic invariances (also pertinent: https://arxiv.org/abs/1711.08028); a cheap canonicalization mitigation is sketched after this list as well.
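Even though I'd punt on (2) for now, for concreteness here is a minimal sketch of what a rotary-style position embedding looks like, assuming "rotational encoding" means RoPE; none of this is in the codebase, and the PyTorch here is illustrative only:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate pairs of feature channels by position-dependent angles.

    x: (seq_len, dim) with dim even. Applied to both queries and keys,
    attention dot products then depend on the relative offset between
    tokens rather than on absolute indices.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Geometrically spaced rotation frequencies, one per channel pair.
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied independently to each (x1, x2) channel pair.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```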
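On (5), independent of the graph-network approaches in those papers, one cheap mitigation would be to canonicalize commutative subtrees before generating edit targets, so equivalent programs map to a single serialization. A hypothetical sketch (the nested-tuple AST here is made up, not our actual representation):

```python
# Hypothetical AST as nested tuples: ("add", left, right), ("lit", 3), ("var", "x"), ...
COMMUTATIVE = {"add", "mul"}

def canonicalize(node):
    """Recursively sort operands of commutative operators into a fixed
    order, so semantically identical programs share one serialization
    and edit positions stop being ambiguous training targets."""
    if not isinstance(node, tuple):
        return node
    op, *args = node
    args = [canonicalize(a) for a in args]
    if op in COMMUTATIVE:
        args.sort(key=repr)  # any deterministic total order will do
    return (op, *args)

assert canonicalize(("add", ("var", "y"), ("var", "x"))) == \
       canonicalize(("add", ("var", "x"), ("var", "y")))
```

This only removes the invariances we can enumerate; the graph-based losses in the papers above handle the general case.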
Super curious to hear others' thoughts on this. My instinct is to turn the AST (or any graph) into a list of addresses, then use a transformer to encode those addresses into positions to be fed to a larger, orthogonal transformer (rough sketch below).
Basically: programs are graphs (or at minimum trees), so operating on them as lists is dumb, and I think we're already running into those limits.
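A rough, hypothetical sketch of that idea (all names invented, PyTorch assumed): each node's address is its root-to-node path of child indices, and a small transformer pools each address into a vector that the main model consumes in place of absolute position embeddings.

```python
import torch
import torch.nn as nn

class AddressEncoder(nn.Module):
    """Embed each AST node's root-to-node address (a sequence of child
    indices, 0 = padding) into a position vector for the main model."""

    def __init__(self, max_arity: int = 16, max_depth: int = 32,
                 d_model: int = 64, n_layers: int = 2):
        super().__init__()
        # Child indices are 1..max_arity; 0 is reserved for padding.
        self.step_embed = nn.Embedding(max_arity + 1, d_model, padding_idx=0)
        # Marks which step of the path each index sits at, so the
        # encoder can distinguish address [1, 2] from [2, 1].
        self.depth_embed = nn.Embedding(max_depth, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, addresses: torch.Tensor) -> torch.Tensor:
        # addresses: (num_nodes, max_depth) int64 -> (num_nodes, d_model)
        steps = torch.arange(addresses.shape[1], device=addresses.device)
        h = self.encoder(self.step_embed(addresses) + self.depth_embed(steps))
        mask = (addresses > 0).unsqueeze(-1).to(h.dtype)
        # Mean-pool over the unpadded address steps: one vector per node.
        return (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

# Two nodes: root -> child 1 -> child 2, and root -> child 1.
positions = AddressEncoder()(torch.tensor([[1, 2, 0], [1, 0, 0]]))
```

Two nodes at the same tree address would then get the same position vector regardless of where they land in the flattened token list.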
I imagine this has been described in the literature somewhere, but I'm not aware of anything?