@prologic@twtxt.net What I need it to do is crawl a website, execute JavaScript along the way, and save the resulting DOMs to HTML files. It isn’t necessary to save the files downloaded via XHR and the like, but I would need it to save page requisites: CSS, JavaScript, favicons, etc.
Something that I’d like to have, but isn’t required, is mirroring of content (+ page requisites) in frames. (Example) This would involve spanning hosts, but I only need to span hosts for this specific purpose.
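Just to sketch what I mean by requisites: pulling them out of a saved DOM doesn’t need much beyond the standard library. This is a rough sketch, and the tag/attribute list is my own guess at what counts as a requisite (the URLs are made up too), not a complete inventory:

```python
# Rough sketch: collect page-requisite URLs (CSS, JS, favicons, frames)
# from an HTML document using only the standard library.
from html.parser import HTMLParser

class RequisiteParser(HTMLParser):
    # tag -> attribute that points at the requisite (an assumed, partial list)
    SRC_TAGS = {"script": "src", "img": "src",
                "frame": "src", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.requisites = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") in ("stylesheet", "icon", "shortcut icon"):
            if "href" in attrs:
                self.requisites.append(attrs["href"])
        elif tag in self.SRC_TAGS:
            url = attrs.get(self.SRC_TAGS[tag])
            if url:
                self.requisites.append(url)

parser = RequisiteParser()
parser.feed("""
<link rel="stylesheet" href="/static/main.css">
<link rel="icon" href="/favicon.ico">
<script src="/static/app.js"></script>
<iframe src="https://other-host.example/frame.html"></iframe>
""")
print(parser.requisites)
# -> ['/static/main.css', '/favicon.ico', '/static/app.js', 'https://other-host.example/frame.html']
```

The iframe line is the spanning-hosts case: its src points off-site, so mirroring frame content means following exactly those URLs and no others.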
It would also be nice if the program could resolve absolute paths to relative paths (/en-US/docs/Web/HTML/Global_attributes -> ../../Global_attributes), but this isn’t required either. I think I’m going to have to have a local Web server running anyway, because just about all the links are to directories with an index.html. (i.e. the actual file referenced by /en-US/docs/Web/HTML/Global_attributes is /en-US/docs/Web/HTML/Global_attributes/index.html.)
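For what it’s worth, the absolute-to-relative rewriting itself is a one-liner with posixpath.relpath; the hard part would be wiring it into the crawler. The referencing page’s directory below is an illustrative guess, not taken from the actual site:

```python
# Sketch of rewriting an absolute link relative to the directory of the
# page it appears on, assuming each page is saved as <path>/index.html.
import posixpath

def relativize(link, page_dir):
    """Return `link` expressed relative to `page_dir`."""
    return posixpath.relpath(link, start=page_dir)

# A page saved at /en-US/docs/Web/HTML/Element/a/index.html (hypothetical
# location) linking to /en-US/docs/Web/HTML/Global_attributes:
print(relativize("/en-US/docs/Web/HTML/Global_attributes",
                 "/en-US/docs/Web/HTML/Element/a"))
# -> ../../Global_attributes
```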