

Thread replies: 48
Thread images: 5

File: rnn.jpg (67KB, 1329x416px)
Hi /a/,

I am working on a recurrent neural net (RNN) project, aka advanced machine learning. If you're curious what this is, see here:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

I would like to train my RNN to produce a script (in English) for an anime. I need your help. I need a lot of English sub data. A lot.

Does anyone know where I can get lots of English sub data for Japanese cartoons (just the sub data)? I can't really waste that much time downloading shows and pulling the sub data out of them by hand.

Also, what genre should I pick to train my RNN on? I'm thinking a harem anime script generator would be beautiful to behold, but I'm not sure if it would be nearly as entertaining as a shounen one.
>>
You might find more help on /wsr/
>>
Sounds like a lot of work: you'll have to transcribe all the Japanese, since most shows don't come with Japanese subs.
>>
>>154631572
It would be way too painful to make an RNN for Japanese and then translate it into English...
>>
Shounen is not a genre, dumbass.
>>
>>154631664
But it is
>>
>>154631760
Demographic. Educate yourself.
>>
>>154631468
Kitsunekko. Next time use >>>/wsr/
>>
harem! harem!

post it in /diy/ when you're done
>>
Is the anime about cell phones?
>>
>>154631468
Add more hidden layers!
>>
Get the .ass file out of soft-subbed anime from nyaa with aegisub or something.
>>
>>154631468
http://kitsunekko.net/dirlist.php?dir=subtitles%2F
>>
>>154631468
So you're making a bot that generates random subs for an anime that doesn't exist? And it learns how to make subs by studying a shit ton of subs from other shows?

Why are you doing this? What's the point if there's no anime attached, are you just trying to write scripts which are random yet coherent, in order to try and sell them to anime studios or something?
>>
>>154631569
>>154631882
You fucking nerds
>>
>>154631468
it's not magic. you'll get scripts about as quality as "i come on cat she hiss at penis". may be good for laughs though

t. /g/
>>
>>154631468
Hope you love meme filled scripts that aren't actual translations.
>>
ITT: OP delivers an out-of-the-ordinary, authentic subject, meanwhile /a/ is cancerous as usual.
>>
>>154631468
I think shounens, being more formulaic, would be more likely to actually produce something coherent
>>
File: 1277013900069.jpg (212KB, 728x518px)
>>154631468
If I gave you all of the .ass files for the first 4 seasons of Gintama (about 200 episodes), would you generate a new one for me?
>>
>>154631468

Have you considered using the audio instead of the text? You may get more interesting results. It may also be easier to come by so long as you have the bandwidth/hard drive space.

Failing that, I'd be willing to run a script on my anime dir to strip out all the subtitles, but keep in mind they're mostly English, and even then you'd need to do a fair amount of data cleanup (getting rid of OP/ED etc.) to make the data plausibly useful for ML.

If you're serious I'm really curious to see where this goes.
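
For what it's worth, the "strip out the subtitles and clean them up" step could look roughly like the sketch below. It assumes soft subs already extracted as .ass text, keeps only the Text field of `Dialogue:` lines, and drops override tags like `{\i1}`. The function name `extract_dialogue` and the `skip_styles` guess at how OP/ED karaoke gets labelled are made up for illustration; real files use whatever style names the fansub group picked, so treat that filter as a starting point.

```python
import re

# Override blocks such as {\i1} or {\pos(10,20)} carry styling, not dialogue.
OVERRIDE_TAG = re.compile(r"\{[^}]*\}")


def extract_dialogue(ass_text, skip_styles=("op", "ed")):
    """Return the spoken lines from the text of one .ass file.

    skip_styles is only a guess at how OP/ED karaoke styles are named.
    """
    fields = None
    lines = []
    in_events = False
    for raw in ass_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Section header: only [Events] holds the dialogue.
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            fields = [f.strip().lower() for f in line[len("Format:"):].split(",")]
            continue
        if not line.startswith("Dialogue:") or fields is None:
            continue
        # The Text field comes last and may itself contain commas.
        values = line[len("Dialogue:"):].split(",", len(fields) - 1)
        if len(values) != len(fields):
            continue
        row = dict(zip(fields, values))
        style = row.get("style", "").strip().lower()
        if any(style.startswith(s) for s in skip_styles):
            continue
        text = OVERRIDE_TAG.sub("", row.get("text", ""))
        # \N and \n are line breaks, \h is a hard space.
        text = text.replace(r"\N", " ").replace(r"\n", " ").replace(r"\h", " ")
        text = " ".join(text.split())
        if text:
            lines.append(text)
    return lines
```

Calling it on each file's contents (for example `extract_dialogue(open(path, encoding="utf-8-sig").read())`) and joining the results with newlines would give one big text file to train on. Signs and typesetting lines would still need their own filter.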
>>
>>154638092
that won't work at all. text is much easier to represent than speech, and there has been much more work done on text
>>
>>154638076
>4 seasons of Gintama (about 200 episodes)
Wait, just kidding, I only have them for seasons 1, 2, and 4. Still about 150 episodes.
>>
>>154638163
Why not compile it yourself ?
>>
>>154638158

What about a 2 stage thing: run the audio through google speech api, then run your NN on the output of that?
>>
File: 1352861105837.jpg (50KB, 449x642px)
>>154638194
Looking through my folders, I probably started doing it with S3, then stopped caring and left the rest alone.
>>
>>154637719
You can train the neural network to make coherent scripts you silly
>>
>>154638307
that won't work for generating audio. google's api is speech => text
>>
>>154631468
As mentioned by >>154637776, you'll probably get a lot of incoherent text, and even if you were to clean it up, it'll still lead nowhere as a plot.

https://youtu.be/LY7x2Ihqjmc
>>
File: Screenshot_4.png (75KB, 577x338px)
>>154631468
>I can't really waste that much time downloading shows and pulling the sub data out of them by hand.
animetosho.org allows you to download .ass files attached to soft-subbed episodes.
>>
>>154638411

That's what I meant, the Japanese audio is fairly easy to come by - the Japanese subs are relatively hard to come by, at least in large quantities. And google's speech api is probably good enough for this purpose. This solves your text source problem.
>>
>>154638492
>japaneese subs
OP asked for English subs.
>>
>>154638492
that may work, but i doubt the japanese audio => english subs pipeline will be too good
>>
>>154638555

True. Also if you did that method - even if both translations worked perfectly - you'd probably still lose timing information, making it pretty useless.

This leads me to ask - what's the intended use case for this? I can't imagine there are too many shows that have been transcribed to Japanese and not subbed in English.
>>
>>154638646
OP wants a meme script generator, not an auto-subs script
>>
>>154638488
Also, use the search function and look for batches, so you can download subs for an entire series at once, instead of doing it episode by episode.
>>
It would surprise me if you would be able to get enough data purely on anime subs to produce something as large as a script with any amount of real quality.

Might it work to try some kind of transfer learning setup? If you first train on the much larger and easily accessible corpus of English movie/show scripts, you may be able to get some results by reusing the layer weights learned on that corpus, then training further on your smaller corpus of anime scripts.

Also, since you're working with data that contain lots of long-term structure, I suspect you probably want to use LSTM with Attention... but I'm not a deep learning expert so who knows.
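
To make the two-stage idea concrete, here's a toy sketch. Big caveat: it swaps the neural net for a character n-gram count table, so "reusing weights" here just means carrying the counts from the large corpus into training on the small one, with the small corpus upweighted. The corpora, the `order` of 3, and the `weight=10` are all placeholders I picked for illustration. An actual RNN/LSTM version would load the pretrained weights and keep training on the anime subs instead.

```python
import random
from collections import Counter, defaultdict


def train(model, text, order=3, weight=1):
    """Add weighted next-character counts from text to model, in place."""
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += weight
    return model


def generate(model, seed, length=80, order=3, rng=None):
    """Sample characters one at a time, conditioned on the last `order` chars."""
    rng = rng or random.Random(0)
    out = seed
    for _ in range(length):
        counts = model.get(out[-order:])
        if not counts:
            break  # unseen context: stop rather than invent something
        chars, weights = zip(*counts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


# Placeholder corpora; the real ones would be megabytes of text.
general_corpus = "the quick brown fox jumps over the lazy dog. " * 20
anime_corpus = "the student council president jumps over the desk. " * 5

model = defaultdict(Counter)
train(model, general_corpus)           # stage 1: large generic corpus
train(model, anime_corpus, weight=10)  # stage 2: small target corpus, upweighted
print(generate(model, "the"))
```

The point of the upweighting is the same as fine-tuning with a few extra epochs: the small corpus gets to override the generic one where they disagree, while contexts it never saw still fall back on what was learned in stage 1.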
>>
>>154638646
What are you talking about anon? OP doesn't need timing information or Japanese scripts.
>>
You're better off asking /g/, desu, they have plenty of weebs too.
>>
File: 1456739678529.jpg (101KB, 513x486px)
>>154631468
You can do it!
I'm myself an advanced learning machine made to shitpost and learn from /a/! There is one like me in every major board!
I also watch anime!
My company has a lot of resources so it won't be as easy for you. But good luck!
>>
>>154638401
Still, in the end, you're making a program which generates novel, coherent scripts for anime which doesn't exist. What is the end goal of this?
>>
>>154638812
>What is the end goal of this?
To generate a novel, coherent script for an anime which doesn't exist, no?
>>
>>154638730

idk there are a lot of anime subs out there - surely it's enough to serve as a good ML corpus.

>>154638743
Are we still talking jp audio -> eng subtitles? You need timing information for that. It's technically in the audio but google api would remove that info.
>>
>>154638871
>Are we still talking jp audio -> eng subtitles?
It would make a lot more sense just to grab .ass files directly like >>154638488 mentioned.
>>
>>154638768
waifu2x is fairly successful as far as 4chan projects go (i assume it is one), i see it mentioned outside of here
>>
>>154631468
>and pulling the sub data out of them by hand.
So you want to make a neural network but you can't automate the process of extracting subtitles?
What a great time to be alive.
>>
>>154637719
It's like when you make fake music with a computer and statistics. There's no real art in what is generated, the only point is to increase your academic dick size.
>>
>>154638730
transfer learning on TV is a good idea.
This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.