Add HTML/Webpage Parsing Layer for RAG Pipeline

Right now, our RAG pipeline only handles PDFs, but a ton of valuable content lives on web pages and in raw HTML. We should extend our parser to ingest HTML documents directly and pull out both text and visuals.
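
To make the scope concrete, here is a rough sketch of what the HTML layer could look like, using only Python's standard-library `html.parser`. The names `ParsedPage`, `HtmlIngestParser`, and `parse_html` are hypothetical and not part of the existing pipeline. The sketch also assumes that "visuals" means `<img>` tags, which is a simplification; a real implementation would probably need to cover things like `<figure>`, `<svg>`, and lazily loaded images too.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class ParsedPage:
    """Hypothetical container for what we pull out of one HTML document."""
    text_blocks: list = field(default_factory=list)
    images: list = field(default_factory=list)  # dicts with "src" and "alt"


class HtmlIngestParser(HTMLParser):
    """Collects visible text and <img> references from an HTML string."""

    # Tags whose contents are not reader-visible text.
    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.page = ParsedPage()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            attr_map = dict(attrs)
            src = attr_map.get("src")
            if src:  # ignore images without a usable source
                self.page.images.append(
                    {"src": src, "alt": attr_map.get("alt") or ""}
                )

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.page.text_blocks.append(text)


def parse_html(html: str) -> ParsedPage:
    """Parse raw HTML and return its text blocks and image references."""
    parser = HtmlIngestParser()
    parser.feed(html)
    parser.close()
    return parser.page
```

Calling `parse_html(raw_html)` returns a `ParsedPage` whose `text_blocks` could feed the existing chunking step, while `images` holds the `src` and `alt` values for whatever visual handling we decide on. Fetching pages over the network, resolving relative image URLs, and stripping navigation boilerplate are deliberately left out here.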