Thread replies: 7
Thread images: 1

Anonymous
2016-09-19 09:50:01 Post No.

File: 360full-errant-story-vol-2-(errant-story-series,-2)-cover.jpg (228KB, 360x531px)

/co/mrade here.

I'm trying to scrape the archives of a webcomic from 10 years ago so I can put it all in a .cbr file and not have to go through the website in a browser.

I've got a pretty good handle on how wget works from Googling, but I think the trouble I'm having has to do with how the website is set up.

When you use View Image on a comic from the archives, it shows up as
http://www.errantstory.com/comics/2003-01-31.jpg
You can change the filename to a different comic and it will go to that comic right away, but if you go to
http://www.errantstory.com/comics
it just redirects you to the main page.

The command I'm using is

wget -r -nd -P /home/myname/Downloads/ES -A "jpg" http://www.errantstory.com/comics/

but it must be getting redirected, because it only downloads the one comic that's on the main page.

Is there a way to get wget to try every possible filename in that directory so it can download them? Or any way to stop it from getting redirected?
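One way to check the redirect theory is to ask for the directory URL's response headers without following the redirect. A minimal sketch, assuming GNU wget (`--spider`, `-S`/`--server-response` and `--max-redirect` are standard wget options); the function name `show_redirect` is hypothetical:

```shell
# Hypothetical helper: print the server's response headers for a URL
# without following any redirect, so a 3xx status and its Location
# header show up instead of the page it points to.
show_redirect() {
    local url="$1"
    # --spider: check the URL but don't save anything
    # -S: print the server's response headers (wget writes them to stderr)
    # --max-redirect=0: stop at the first redirect instead of following it
    wget --spider -S --max-redirect=0 "$url" 2>&1
}
```

Running `show_redirect http://www.errantstory.com/comics/` should show a 3xx status and a Location header if the directory really bounces to the main page. Either way, `wget -r` only discovers files through links in the HTML it downloads, so with no directory listing at /comics/ it has no way to learn the other filenames; they have to be generated.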

Anonymous 2016-09-19 10:28:20 Post No.56677095

save this file http://pastebin.com/raw/v07TYA9J

then run:

wget -i <filename>

you'll get a bunch of 404s but at the end of it all, you'll have every comic jpg.
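The 404s don't leave junk behind: by default wget doesn't write a file when the server answers with an error like 404, so misses just leave gaps. If you want to see which URLs in the list didn't produce a file, here's a minimal sketch. It assumes all the downloads landed in one directory under their original filenames; the function name `missing_comics` is hypothetical.

```shell
# Hypothetical helper: given the URL list fed to `wget -i` and the
# directory the files were saved in, print every URL whose file is missing.
missing_comics() {
    local list="$1" dir="$2" url name
    # "|| [ -n "$url" ]" also processes a last line that has no trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue            # skip blank lines
        name="${url##*/}"                    # keep only the part after the last /
        [ -e "$dir/$name" ] || printf '%s\n' "$url"
    done < "$list"
}
```

For example, `missing_comics urls.txt /home/myname/Downloads/ES | wc -l` counts how many listed dates came back empty.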

Anonymous 2016-09-19 10:34:22 Post No.56677130

>>56676839
-e robots=off (ignore robots.txt)
and
-U 'Mozilla/5.0' (send a browser-like user agent instead of wget's default)
are always a good idea in case a website doesn't like robots
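Folding those two flags into the list-based download from >>56677095 could look like the sketch below. It's an illustration, not a recipe tested against that site: the function name `download_comics` is hypothetical, and -nd and -P are carried over from the OP's command.

```shell
# Hypothetical wrapper: fetch every URL listed in a file into one directory.
#   -nd              don't recreate the site's directory tree locally
#   -P DIR           save files under DIR
#   -e robots=off    ignore robots.txt
#   -U 'Mozilla/5.0' send a browser-like User-Agent
#   -i FILE          read the URLs to fetch from FILE
download_comics() {
    local list="$1" dir="$2"
    wget -nd -P "$dir" -e robots=off -U 'Mozilla/5.0' -i "$list"
}
```

Usage: `download_comics urls.txt /home/myname/Downloads/ES`.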

Anonymous 2016-09-19 10:35:50 Post No.56677146

>>56677095
That's exactly what I was just trying to do!

I got as far as getting the list of dates exported to a text file. Can I ask what you used to add the URL and .jpg to all of them?

Anonymous 2016-09-19 10:57:30 Post No.56677296

>>56677146

The script/loop I wrote added them.
Otherwise you could do a regex replace in Sublime Text:

^ means start of line, so just replace ^ with the URL
$ means end of line, so replace $ with .jpg
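The same two substitutions can be done from the command line with sed. A minimal sketch, assuming the text file has one YYYY-MM-DD date per line; the function name `dates_to_urls` is hypothetical:

```shell
# Hypothetical helper: turn a file of dates (one per line) into comic URLs.
# Same idea as the regex replace: ^ gets the URL prefix, $ gets ".jpg".
# Using | as the s/// delimiter avoids escaping the slashes in the URL.
dates_to_urls() {
    sed -e 's|^|http://www.errantstory.com/comics/|' -e 's|$|.jpg|' "$1"
}
```

For example, `dates_to_urls dates.txt > urls.txt` produces a list ready for `wget -i urls.txt`.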

here's my hack job script:
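# prints one URL per YYYY-MM-DD combination, 2002 through 2012
# (includes impossible dates like 2003-02-31; those just 404)
for x in {2002..2012}; do                      # years
    for y in $(seq -f "%02g" 1 12); do         # months, zero-padded
        for z in $(seq -f "%02g" 1 31); do     # days, zero-padded
            echo "http://www.errantstory.com/comics/$x-$y-$z.jpg"
        done
    done
done
# redirect the output to a file for wget -i, e.g. bash script.sh > urls.txt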
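The triple loop also prints dates that can't exist, like 2003-02-31, which account for some of the 404s. A variant that only prints real calendar dates is sketched below. It swaps the nested loops for GNU date's day arithmetic, so it assumes GNU coreutils `date` (BSD/macOS `date` handles -d differently), and the function name `comic_urls` is hypothetical.

```shell
# Hypothetical variant: print one URL per real calendar date in an
# inclusive range. Relies on GNU date's -d for "+ 1 day" arithmetic.
comic_urls() {
    local day="$1" stop
    # ISO dates sort as strings; refuse a reversed range instead of looping forever
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort | head -n 1)" = "$1" ] || return 1
    # the first date past the end of the range; the loop stops when it gets there
    stop=$(date -u -d "$2 + 1 day" +%F) || return 1
    while [ "$day" != "$stop" ]; do
        echo "http://www.errantstory.com/comics/$day.jpg"
        day=$(date -u -d "$day + 1 day" +%F) || return 1
    done
}
```

`comic_urls 2002-01-01 2012-12-31 > urls.txt` gives 4,018 URLs instead of the 4,092 (11 × 12 × 31) the nested loops print, so 74 fewer requests for dates that don't exist. It's slower to generate, since it runs `date` once per day.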

Anonymous 2016-09-19 11:01:41 Post No.56677323

>>56677296
Cool. Thanks.

Anonymous 2016-09-19 11:05:02 Post No.56677342

>>56677323
no probs, all the best
