When using the parser in MyHTML_OPTIONS_PARSE_MODE_SINGLE mode, it is initialized in myhtml_init like this:
case MyHTML_OPTIONS_PARSE_MODE_SINGLE:
    if((status = myhtml_create_stream_and_batch(myhtml, 0, 0)))
        return status;
Because this call requests 0 streams, myhtml->thread_stream is initialized to NULL:
myhtml->thread_stream = NULL;
But then, when parsing CDATA (in myhtml_tokenizer_state_markup_declaration_open()), the parser calls myhtml_tree_wait_for_last_done_token(), which unconditionally accesses tree->myhtml->thread_stream->timespec and crashes because thread_stream is NULL.
Backtrace:
myhtml_tree_wait_for_last_done_token(tree=., token_for_wait=.) at tree.c:2457
myhtml_tokenizer_state_markup_declaration_open(tree=., token_node=., html="…", html_offset=413, html_size=378555) at tokenizer.c:943
myhtml_tokenizer_chunk_process(tree=., html="…", html_length=378555) at tokenizer.c:88
myhtml_tokenizer_chunk(tree=., html="…", html_length=378555) at tokenizer.c:104
myhtml_tokenizer_begin(tree=., html="…", html_length=378555) at tokenizer.c:42
myhtml_parse_fragment(tree=., encoding=MyENCODING_DEFAULT, html="…") at main.c
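The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the `sketch_*` types and the function below are hypothetical stand-ins, not the library's real definitions, and the NULL guard is one possible defensive fix rather than the change actually made upstream. It shows that a wait loop is harmless while the tokens are equal, and that it must not touch `thread_stream` when no stream thread was created.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the library's types. */
typedef struct { struct timespec timespec; } sketch_thread_t;
typedef struct { sketch_thread_t *thread_stream; } sketch_myhtml_t;
typedef struct { int id; } sketch_token_t;
typedef struct {
    sketch_myhtml_t *myhtml;
    sketch_token_t  *token_last_done;
} sketch_tree_t;

/* Returns 0 once token_for_wait is the last done token, or -1 when
 * waiting is impossible because no stream thread exists (single mode).
 * In a real threaded build, token_last_done would be updated by another
 * thread; this sketch does not model that synchronization. */
int sketch_wait_for_last_done_token(sketch_tree_t *tree,
                                    sketch_token_t *token_for_wait)
{
    while (tree->token_last_done != token_for_wait) {
        /* Guard: without it, the next line dereferences NULL. */
        if (tree->myhtml->thread_stream == NULL)
            return -1;
        nanosleep(&tree->myhtml->thread_stream->timespec, NULL);
    }
    return 0;
}
```

With `thread_stream` set to NULL, calling the function with the token that is already marked as done returns 0 without entering the loop, while calling it with any other token returns -1 instead of crashing.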
Hi @Jean-Daniel
In single mode, the tokens will always be equal, so the program will not enter the loop.
Do you have an example HTML for which the program enters this loop in single mode?
I saw and corrected another problem. Please try the code from master.
Sorry, I didn't give you enough info. I'm actually using the parser to extract some data from HTML fragments (I only have the content), and I don't really need a full tree.
So I'm using the 'after token done' callback and disabling tree building with MyHTML_TREE_PARSE_FLAGS_WITHOUT_BUILD_TREE.
A quick test reveals that it is the latter flag that triggers the bug. Without it, the parser works flawlessly, but when I set this flag, it crashes on CDATA.
#include <string.h>
#include <myhtml/api.h>

int main(int argc, char **argv) {
    const char *bytes = "<div><![CDATA[ foo ]]></div>";
    size_t length = strlen(bytes);

    myhtml_t *myhtml = myhtml_create();
    myhtml_init(myhtml, MyHTML_OPTIONS_PARSE_MODE_SINGLE, 1, 0);

    myhtml_tree_t *tree = myhtml_tree_create();
    myhtml_tree_init(tree, myhtml);
    /* Setting WITHOUT_BUILD_TREE is what triggers the crash. */
    myhtml_tree_parse_flags_set(tree, MyHTML_TREE_PARSE_FLAGS_WITHOUT_BUILD_TREE | MyHTML_TREE_PARSE_FLAGS_SKIP_WHITESPACE_TOKEN);

    // parse html (we only have the body)
    myhtml_parse_fragment(tree, MyENCODING_UTF_8, bytes, length, MyHTML_TAG_BODY, MyHTML_NAMESPACE_HTML);

    myhtml_tree_destroy(tree);
    myhtml_destroy(myhtml);
    return 0;
}