Manga metadata from mangaupdates.com

Thread replies: 10
Thread images: 1

File: asthetics.jpg (140KB, 960x720px)
Need a database dump of a comprehensive manga / webcomics database like mangaupdates.
Has anybody scraped the site? Or is there some other source?
>>
>>367066
Somebody was scraping it for a while, since they specifically disabled the ability to go past the 50th page when you search their database to discourage it. There was also a fake mangaupdates site, sheeky forums style, but I don't remember what its url was. That may have been a couple years ago.
>>
>>367084
My feeling is that the site works like a black box, and it is too important to non-Japanese manga fans. Something that important should not be a black box. We need to scrape all the data and do a db dump.
>>
>>367102
Maybe scrape archive.org's cache for every series? Fill in every series id number from 1 to 144285 (current highest series) after:
https://www.mangaupdates.com/series.html?id=

Some have been pruned as either duplicates or oneshots absorbed into collections.
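
A rough sketch of that idea in Python, for anyone who wants a starting point. It only builds the series URLs and the matching Wayback Machine availability queries, then reads the snapshot URL out of a response; the actual fetching is left out. The function names are made up for illustration, and the response layout (`archived_snapshots` / `closest`) is an assumption based on how the availability API normally answers, so check it against a real response first.

```python
import json
from urllib.parse import urlencode

SERIES_URL = "https://www.mangaupdates.com/series.html?id="
# Wayback Machine availability endpoint (assumed to still answer in this shape).
WAYBACK_API = "https://archive.org/wayback/available"


def series_urls(first_id=1, last_id=144285):
    """Yield one series page URL per id, from first_id to last_id inclusive."""
    for series_id in range(first_id, last_id + 1):
        yield f"{SERIES_URL}{series_id}"


def wayback_query(page_url):
    """Build the availability-API request URL for one series page."""
    return f"{WAYBACK_API}?{urlencode({'url': page_url})}"


def closest_snapshot(response_text):
    """Return the archived snapshot URL from an API response, or None.

    Pruned ids (duplicates, absorbed oneshots) may simply have no snapshot.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Each query URL would still have to be fetched with a delay between requests, and ids that come back as None skipped or retried later.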
>>
>>367166
Brute forcing the links up to 150000 was my plan, but it cannot be done without something like http://luminati.io/. But I don't want to make the effort if it has already been done before or alternatives exist.
>>
I've been searching high and low for a dataset comprising mangaupdates' content of manga/info/tags... to no avail.

would really appreciate it if someone has done it before and is willing to share.

also have a bump.
>>
>>367240
Found the thread about the fake mangaupdates site. It's dead now, so you won't be able to ask whoever was running that for what they had.
https://www.mangaupdates.com/showtopic.php?tid=52190
>>
>>367240
I'm willing to scrape the site if I don't find similar datasets as well. I could write a new scraper in my free time. Will dump the data on pastebin or something. What would you use it for?
>>
>>367308
All kinds of things, but first of all, writing an automatic renamer to standardize the file names of my (huge ass) collection and help people in the same shoes as me in the process (madokami for example).

I am planning to get into AI and machine learning, but I need something that motivates me. Such a dataset is sure to help me put together a mini project or two about my manga consumption and my involvement with all kinds of people over the years.

Also, that site is generally the backbone of scanlations today. It'd be disastrous for the community if something were to happen to it. Think of nyaa, for example: it left a huge mess to clean up. Why the people over at MU never release their dataset as a free resource is beyond me, but we need to preserve the damn thing for sure.
>>
>>367364
>Why the people over at MU never release their dataset as a free resource is beyond me
Because the site hasn't had any sort of programmer and thus has been ignoring reasonable suggestions for improvement for years. You think they want to give potential competitors a free head start? If the database goes elsewhere, they've got nothing but founder and network effects in their favor, and both of those are pretty weak given that 90% of the readers just go to online readers anyway. It'd be easy to advertise the shiny, new site on mangafox, batoto and the like.





This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.