How do I create my own personal archive of a website?

I want to save all of the information on a particular website. I suppose I could just copy-paste the text into a txt file, but that sounds tedious.
There must be a better way than just navigating through all of the pages of the website and using ctrl-S.
>>
The ScrapBook addon works pretty well. You just have to think about what you are doing.
For example, if your website has content like:
site.domain/1
site.domain/2
site.domain/3
...
site.domain/1000

then you can use something like Python to generate the list of all the URLs from 1 to 1000 (see the sketch after this post), paste them into ScrapBook, and it will save all of them.
That will not get branches with other names though.
You can of course manually save each page, but that will not be feasible on large sites.
There is also the option to follow links, up to a depth of 5. Depth 1 means every link on the first page you save (e.g. site.domain/3) will be followed and saved as well; depth 2 means every link on those linked pages will also be followed and saved. This can get really tricky if the site links to anything outside, and it can produce duplicates.
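
A minimal sketch of the list-generation step mentioned above. site.domain is just the placeholder from the example, and the 1-1000 range is assumed from it:

# write site.domain/1 through site.domain/1000 to a file, one URL per line
# "site.domain" is the placeholder host from the example above, not a real site
base = "http://site.domain/"
with open("urls.txt", "w") as f:
    for i in range(1, 1001):
        f.write(base + str(i) + "\n")

You can then paste the contents of urls.txt into ScrapBook as described above.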
>>
I think the usual name for this is "siterip"; maybe you could use that to search for some tools.

You can use command-line utilities like curl and wget to do it, but it's tricky to get the details right.
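
For example, a rough wget invocation for a depth-limited crawl like the one described in the previous post. The host is still the site.domain placeholder, and the exact flag set is a sketch, so check the man page before running it on a big site:

# recursive crawl to depth 5, grabbing images/CSS and rewriting links for offline viewing;
# --no-parent keeps it from climbing above the start URL, --wait=1 throttles requests
wget --recursive --level=5 --page-requisites --convert-links \
     --no-parent --wait=1 http://site.domain/

curl doesn't follow links on its own, so for a full siterip wget is usually the easier of the two.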