-
Notifications
You must be signed in to change notification settings - Fork 50
Save the errors in the state, not just the error counts. Makes it ea… #138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Should we worry about memory in cases where a treebank has over 260k incidents? |
|
Good point, we also haven't considered this when saving errors to dump into json. I'm thinking about two possible strategies (but it's just a random idea, I have to give it a little more thought):
|
|
I like @ellepannitto's first idea (using the list as an array of length n, unbounded if the user passes, say, -1). Should we add this to the to-do list in #132 too? (For context: our rewrite addresses the exact same problem, although it does not use the state to do so. If I recall correctly, our solution is described in the text of our draft pull request). |
There is already the option called Now either this option could also regulate the number of errors saved and returned. Or there could be a similar option so that errors printed and errors returned are regulated separately. |
Sure, it'd be easy enough to redo it so the counts are still kept separately, but there's a field that keeps the errors up to --max_err and then stops |
…ier for a calling program to use the results Only keep track of --max_err errors, but still count all the errors
32e6649 to
9ad3a82
Compare
|
(updated the PR) |
Actually, after a bit more thinking, I would control the two requirements separately. In the current on-line validation, I print all errors to the log (i.e., |
|
Added a flag for that, as well |
Save the errors in the state, not just the error counts. Makes it easier for a calling program to use the results