-
Notifications
You must be signed in to change notification settings - Fork 180
ASN1.decode/traverse/decode_all and ASN1#to_der in pure ruby #777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I think the tests outside
What breaking change do you expect? |
09c3fbf
to
e92f837
Compare
@rhenium made a few adjustments to the ruby code. I believe it got cleaner. The same point stands regarding I broke down Didn't start cutting C code just yet. |
e92f837
to
c4823d9
Compare
I think so, unfortunately.
Encoding the value should be trivial. I added in an inline comment.
Please check X.690 for exact details, but encoding these types should be relatively simple too, since they are just encoded into ISO 8601 (except UTCTime uses 2-digit years). |
95e2d8d
to
005d32b
Compare
2f32854
to
1e35012
Compare
@rhenium did UTCTime and GeneralizedTime, lmk what you think. |
13e200e
to
5785230
Compare
lib/openssl/asn1.rb
Outdated
xclass = take_asn1_tag_class(tag_class) | ||
|
||
i = constructed ? V_ASN1_CONSTRUCTED : 0 | ||
i |= (xclass & V_ASN1_PRIVATE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& V_ASN1_PRIVATE
doesn't change the value, as a tag class can only be one of the 4 classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ported this from here
47e6cf5
to
cfcbe9e
Compare
@rhenium anything still pending? |
faa1dc0
to
6f22cbd
Compare
@rhenium it's been a while since I worked on this, but got a buffer to do some benchmark / optimizations while doing similar work in # benchmarks/asn1.rb
require "benchmark/ips"
require "openssl"
OBJS = {
"integer" => OpenSSL::ASN1::Integer.new(128),
"negative integer" => OpenSSL::ASN1::Integer.new(-128),
"oid" => OpenSSL::ASN1::ObjectId.new("1.2.34.56789".b),
"nid" => OpenSSL::ASN1::ObjectId.new("sha256"),
"emptyseq" => OpenSSL::ASN1::Sequence.new([]),
"seq" => OpenSSL::ASN1::Sequence.new([OpenSSL::ASN1::EndOfContent.new]),
"emptyset" => OpenSSL::ASN1::Set.new([]),
"set" => OpenSSL::ASN1::Set.new([OpenSSL::ASN1::EndOfContent.new]),
"utctime" => OpenSSL::ASN1::UTCTime.new(Time.utc(2016, 9, 8, 23, 43, 39)),
"gentime" => OpenSSL::ASN1::GeneralizedTime.new(Time.utc(2016, 12, 8, 19, 34, 29))
}
RubyVM::YJIT.enable
def test_name(name)
"#{name} (#{OpenSSL::VERSION})"
end
Benchmark.ips do |x|
# x.config(warmup: 2, time: 5)
name = ENV.fetch("TEST", "integer")
obj = OBJS.fetch(name)
x.report(test_name(name)) { obj.to_der }
x.save! "asn1-results.json"
x.compare!
end The measurements I got locally are below. The tl;dr:
|
lib/openssl/asn1.rb
Outdated
|
||
return "\x00".b if @value.empty? | ||
|
||
@unused_bits.chr << super |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer#chr
produces a US-ASCII String if the value is lower than 128.
I think we should avoid using Integer#chr
in OpenSSL::ASN1
at all since automatic encoding promotion with ascii-only strings can be very tricky, for example:
(1.chr << "a".b).encoding #=> #<Encoding:US-ASCII>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've kept .chr
followed by .force_encoding
. I don't know of a more efficient way to perform the op in ruby (using pack
would involve allocating an array). If you have a better alternative, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's worth it. Array#pack
or String#setbyte
seems much easier to understand IMO, even if that introduces a slight performance loss.
If this level of optimization is necessary, I think that might mean we should stick with a C implementation, but I'm hoping it's not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you past example snippets of how you'd see this rewritten with pack
and/or. setbyte
? I'm struggling with seeing how this would end up more maintainable in this case, as setbyte
does not afaik do appends.
Sorry for the slow response. I've added some additional inline comments. My motivation for this series is that we can stop relying on those OpenSSL C API for BER encoding/decoding, which are screaming that it's not intended to be used outside OpenSSL, are completely undocumented, often have inconsistent semantics, and have undergone a lot of unexpected breaking changes. We also want to avoid performing string manipulation in C in general, especially when handling untrusted input. I think our top priority should be to make the code more understandable and maintainable, while keeping backwards compatibility. Good performance is nice to have, but I feel it's too early to focus on that. Some of the current optimizations feel a bit too aggressive. Using As for the speed of the OID encoding (or of anything else), I wouldn't rule out adding a specialized helper method implemented in C if it's necessary. However, I'm not yet sure if the current performance is an actual issue. I think it's too early to tell it. For measuring performance, I think using a complex and realistic structure like an X.509 certificate might provide a more useful result. |
6f22cbd
to
a46b07b
Compare
Wholeheartedly agree. FWIW the main motivation for this work is the belif that ruby code is more maintainable/readable than C code, and the ASN1 benefits little of being in C.
I agree, I just want to make sure that any regression is within a certain margin. There are people using this in production, after all. I don't think that some of the optimizations done here make the code less readable though:
Agree with the edge cases, but just to clarify, the goal of this MR is not to touch parsing (that can come later).
Agreed. It'd be great to have some input from someone that knows about YJIT or ZJIT whether these operations are something that future versions will be able to optimize though. The less C, the more maintainable it is.
Anything in the repo that I could use? |
36b490d
to
1a84e6e
Compare
this allows (when value is an integer) the runtime and JIT to optimize the call, instead of relying on C-extension openssl
…ppends) the former is a data structure that the runtime can improve processing of, and avoids realloc'ing the string on each call (as well as packing one byte at a time)
which means that it'll be nil; this improves encoding of asn1 nulls and cons with no elements
it's US-ASCII for integers < 128
* utctime should not take dates into account with a year below 1950 and above 2049 * gentime year must be 4 digits
c88a258
to
b3ce3b6
Compare
b3ce3b6
to
cba1a5f
Compare
@rhenium I've also added a pure ruby implementation of the decoding functions. LMK how does that look. The truffleruby jobs failed, but that seems to be rather a truffleruby bug. |
no_id_idx = if id == 0x1f | ||
id = 0 | ||
count = 1 | ||
data[1..].each_byte do |byte| | ||
count += 1 | ||
|
||
id = (id << 7) | (byte & 0x7f) | ||
break if byte.nobits?(0x80) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take extra care with the decoding code since the input data is not trusted. Some parts of this PR appear to have quadratic time complexity.
|
||
value = if is_indefinite_length | ||
unless is_constructed | ||
raise ASN1Error, "invalid length" if PRIMITIVE_TAG_IDS.include?(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is PRIMITIVE_TAG_IDS
and why is the exception raised for these tag numbers only? Isn't the indefinite length form always invalid for the primitive encoding?
Also, there is duplicate code within this method.
arity = block.arity | ||
if arity == 1 | ||
block.call(elems) | ||
else | ||
if arity < elems.size | ||
elems = elems[0, arity] | ||
end | ||
|
||
block.call(*elems) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be just yield elems
.
raise ASN1Error, "too long" | ||
end | ||
EndOfContent.new | ||
when 1 # BOOLEAN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use constants like OpenSSL::ASN1::BOOLEAN
instead. I believe their definitions can also be moved to Ruby since they're not used by the rest of openssl.so
.
tag_class = TAG_CLASS_TYPES.key(first_byte & 0xc0) || :UNIVERSAL | ||
is_constructed = first_byte.anybits?(0x20) | ||
is_indefinite_length = length == 0x80 # indefinite length | ||
id = first_byte & 0x1f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less important, but I'd suggest calling it a "number" instead of "id" since that's the term used by the specifications.
I didn't review deeper because this looks like a WIP. There are a number of unreachable code paths and missing input validations. At the moment, this feels like a fairly direct porting from C to Ruby. Could we try to restructure it so that it becomes more idiomatic Ruby? Our goal of a rewrite should be maintainability, not just doing something in Ruby. The existing C code does have issues, as it's built on top of the undocumented OpenSSL functions which were probably never meant to be used in the way they currently are, and also the general code quality is less than ideal. That's why I think a rewrite makes sense, and doing so in Ruby instead of C would be a good idea. But doing a 1:1 porting of problematic code to Ruby wouldn't really be an improvement IMO. |
This is a draft request-for-comment proposal for implementing
OpenSSL::ASN1::ASN1Data#to_der
in ruby, as per suggestion on the previous MR. It already passes the test, although there is a part which hasn't been ported yet, and seems not be covered in the tests. I guess that the ruby code can benefit from being broke down in multiple files, but I'll keep it here just for easy of review, ,for now. I also didn't delete C code, not yet.Currently, the whole of
#to_der
is implemented in the ASN1Data class. The reason is, one can build a primitive or constructive object "manually" (if tag and/or tag_class is known), or can just use arrays directly. It would benefit from moving specific implementations into the respective child classes, but then it'd be a breaking change because of what I said. Any suggestions?I also didn't do any benchmarks yet. Is it relevant?
LMK what you think.