@prologic@twtxt.net What I need it to do is crawl a website, execute JavaScript along the way, and save the resulting DOMs to HTML files. It isn’t necessary to save the files downloaded via XHR and the like, but I would need it to save page requisites: CSS, JavaScript, favicons, etc.
Something that I’d like to have, but isn’t required, is mirroring of content (+ page requisites) in frames. (Example) This would involve spanning hosts, but I only need to span hosts for this specific purpose.
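Just to sketch what I mean by requisites: pulling them out of a saved DOM doesn’t need much beyond the standard library. This is a rough sketch, and the tag/attribute list is my own guess at what counts as a requisite (the URLs are made up too), not a complete inventory:

```python
# Rough sketch: collect page-requisite URLs (CSS, JS, favicons, frames)
# from an HTML document using only the standard library.
from html.parser import HTMLParser

class RequisiteParser(HTMLParser):
    # tag -> attribute that points at the requisite (an assumed, partial list)
    SRC_TAGS = {"script": "src", "img": "src",
                "frame": "src", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.requisites = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") in ("stylesheet", "icon", "shortcut icon"):
            if "href" in attrs:
                self.requisites.append(attrs["href"])
        elif tag in self.SRC_TAGS:
            url = attrs.get(self.SRC_TAGS[tag])
            if url:
                self.requisites.append(url)

parser = RequisiteParser()
parser.feed("""
<link rel="stylesheet" href="/static/main.css">
<link rel="icon" href="/favicon.ico">
<script src="/static/app.js"></script>
<iframe src="https://other-host.example/frame.html"></iframe>
""")
print(parser.requisites)
# -> ['/static/main.css', '/favicon.ico', '/static/app.js', 'https://other-host.example/frame.html']
```

The iframe line is the spanning-hosts case: its src points off-site, so mirroring frame content means following exactly those URLs and no others.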
It would also be nice if the program could resolve absolute paths to relative paths (/en-US/docs/Web/HTML/Global_attributes -> ../../Global_attributes), but this isn’t required either. I think I’m going to have to have a local Web server running anyway, because just about all the links are to directories with an index.html. (i.e. the actual file referenced by /en-US/docs/Web/HTML/Global_attributes is /en-US/docs/Web/HTML/Global_attributes/index.html.)
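For what it’s worth, the absolute-to-relative rewriting itself is a one-liner with posixpath.relpath; the hard part would be wiring it into the crawler. The referencing page’s directory below is an illustrative guess, not taken from the actual site:

```python
# Sketch of rewriting an absolute link relative to the directory of the
# page it appears on, assuming each page is saved as <path>/index.html.
import posixpath

def relativize(link, page_dir):
    """Return `link` expressed relative to `page_dir`."""
    return posixpath.relpath(link, start=page_dir)

# A page saved at /en-US/docs/Web/HTML/Element/a/index.html (hypothetical
# location) linking to /en-US/docs/Web/HTML/Global_attributes:
print(relativize("/en-US/docs/Web/HTML/Global_attributes",
                 "/en-US/docs/Web/HTML/Element/a"))
# -> ../../Global_attributes
```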