Manga metadata from mangaupdates.com

Thread replies: 10
Thread images: 1

Anonymous
Manga metadata from mangaupdates.com 2017-08-20 08:00:17 Post No. 367066
File: asthetics.jpg (140KB, 960x720px)

Need a database dump of a comprehensive manga/webcomics database like mangaupdates.
Has anybody scraped the site, or is there some other source?

Anonymous 2017-08-20 09:17:00 Post No.367084

>>367066
Somebody was scraping it for a while, since they specifically disabled the ability to go past the 50th page when you search their database to discourage it. There was also a fake mangaupdates site, sheeky forums style, but I don't remember what its url was. That may have been a couple years ago.

Anonymous 2017-08-20 10:33:03 Post No.367102

>>367084
What I feel is that the site works like a black box, and it is too important to non-Japanese manga fans. Something that important should not be a black box. We need to scrape all the data and do a db dump.

Anonymous 2017-08-20 02:40:53 Post No.367166

>>367102
Maybe scrape archive.org's cache for every series? Append every series id number from 1 to 144285 (the current highest series) to:
https://www.mangaupdates.com/series.html?id=

Some have been pruned as either duplicates or oneshots absorbed into collections.
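
A rough sketch of that idea, not a finished scraper: it builds the series URL for each id and the matching Wayback Machine availability query, and reads the closest snapshot out of a response. The function names are made up for illustration, and the sketch assumes the availability endpoint at archive.org/wayback/available and its `archived_snapshots.closest` response shape. It makes no network requests itself; fetching, and pacing the requests politely, is left to whoever runs it.

```python
import json
from urllib.parse import urlencode

SERIES_URL = "https://www.mangaupdates.com/series.html?id="  # from the post above
WAYBACK_API = "https://archive.org/wayback/available"  # assumed availability endpoint
HIGHEST_ID = 144285  # highest series id quoted in the post above


def series_urls(first=1, last=HIGHEST_ID):
    """Yield the series page URL for every id in [first, last]."""
    for series_id in range(first, last + 1):
        yield f"{SERIES_URL}{series_id}"


def availability_query(page_url):
    """Build the query URL asking the Wayback Machine for a snapshot of page_url."""
    return f"{WAYBACK_API}?{urlencode({'url': page_url})}"


def closest_snapshot(response_text):
    """Return the snapshot URL from an availability response, or None.

    None covers ids that were never archived, which would include
    series pruned before any crawl reached them.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Ids that come back as None would have to be logged and skipped, since a gap can mean either a pruned series or a page the archive never crawled.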

Anonymous 2017-08-20 02:59:23 Post No.367171

>>367166
Brute forcing the links up to 150000 was my plan, but it cannot be done without something like http://luminati.io/. But I don't want to make the effort if it has already been done before or alternatives exist.

Anonymous 2017-08-20 05:23:57 Post No.367240

I've been searching high and low for a dataset comprising mangaupdates' content of manga/info/tags... to no avail.

would really appreciate it if someone did it before and is willing to share.

also have a bump.

Anonymous 2017-08-20 05:33:14 Post No.367243

>>367240
Found the thread about the fake mangaupdates site. It's dead now, so you won't be able to ask whoever was running that for what they had.
https://www.mangaupdates.com/showtopic.php?tid=52190

Anonymous 2017-08-20 08:06:09 Post No.367308

>>367240
I'm willing to scrape the site if I don't find similar datasets as well. I could write a new scraper in my free time. Will dump the data on pastebin or something. What would you use it for?

Anonymous 2017-08-20 09:58:18 Post No.367364

>>367308
All kinds of things, but first of all to write an automatic renamer that standardizes the file names of my (huge ass) collection, and in the process to help people in the same shoes as me (madokami, for example).

I am planning to get into AI and machine learning, but I need something that motivates me. Such a dataset is sure to help me put together a mini project or two about my manga consumption and my involvement with all kinds of people over the years.

Also, that site is generally the backbone of scanlations today, and it'd be disastrous for the community if something were to happen to it. Think of nyaa, for example: it left a huge mess to clean up. Why the people over at MU never release their dataset as a free resource is beyond me, but we need to preserve the damn thing for sure.
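
For the renamer idea, here is a minimal sketch of how a title list from such a dump could drive the renaming. Everything in it is an assumption for illustration: the `standardize` helper, the `Title - c012.ext` output scheme, and the guess that input names look like `[Group] some_title_ch12.zip`. The `canonical_titles` argument stands in for the titles a real dataset would supply.

```python
import re

TAG = re.compile(r"\[[^\]]*\]|\([^)]*\)")  # [group] or (note) tags
# "c", "ch" or "chapter" followed by digits, not glued to a preceding letter
CHAPTER = re.compile(r"(?<![A-Za-z])(?:ch(?:apter)?|c)[\s._-]*(\d+)", re.IGNORECASE)


def key(title):
    """Lowercase and keep only letters and digits, for loose title matching."""
    return re.sub(r"[^a-z0-9]+", "", title.lower())


def standardize(filename, canonical_titles):
    """Return 'Canonical Title - c012.ext', or None if the name can't be resolved."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = filename, ""
    stem = TAG.sub(" ", stem)
    match = CHAPTER.search(stem)
    if match is None:
        return None
    # Whatever precedes the chapter marker is treated as the title
    index = {key(t): t for t in canonical_titles}
    title = index.get(key(stem[:match.start()]))
    if title is None:
        return None
    suffix = f".{ext}" if ext else ""
    return f"{title} - c{int(match.group(1)):03d}{suffix}"
```

With `["One Piece"]` as the title list, `standardize("[SomeGroup] one_piece_ch12.zip", ...)` resolves to `One Piece - c012.zip`. Names with volume numbers, alternate titles, or romanization differences would need the extra fields a full dump could provide.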

Anonymous 2017-08-21 03:08:49 Post No.367505

>>367364
>Why the people over at MU never release their dataset as a free resource is beyond me
Because the site hasn't had any sort of programmer and thus has been ignoring reasonable suggestions for improvement for years. You think they want to give potential competitors a free head start? If the database goes elsewhere, they've got nothing but founder and network effects in their favor, and both of those are pretty weak given that 90% of the readers just go to online readers anyway. It'd be easy to advertise the shiny, new site on mangafox, batoto and the like.
