Hi /a/, I am working on a recurrent neural net project. aka

Thread replies: 48
Thread images: 5

Anonymous 2017-03-15 11:31:46 Post No. 154631468
File: rnn.jpg (67KB, 1329x416px)

Hi /a/,

I am working on a recurrent neural net (RNN) project, aka advanced machine learning. If you're curious what this is, see here:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

I would like to train my RNN to produce a script (in English) for an anime. I need your help. I need a lot of English sub data. A lot.

Does anyone know where I can get lots of English sub data for Japanese cartoons (just the sub data)? I can't really waste that much time downloading shows and pulling the sub data out of them by hand.

Also, what genre should I pick to train my RNN on? I'm thinking a harem anime script generator would be beautiful to behold, but I'm not sure if it would be nearly as entertaining as a shounen one.

Anonymous 2017-03-15 11:34:04 Post No.154631569

You might find more help on /wsr/

Anonymous 2017-03-15 11:34:14 Post No.154631572

Sounds like a lot of work: you'll have to transcribe all the Japanese, since most shows don't come with Japanese subs.

Anonymous 2017-03-15 11:35:26 Post No.154631626

>>154631572
It would be way too painful to make an RNN for Japanese and then translate it into English...

Anonymous 2017-03-15 11:36:46 Post No.154631664

Shounen is not a genre, dumbass.

Anonymous 2017-03-15 11:39:22 Post No.154631760

>>154631664
But it is

Anonymous 2017-03-15 11:41:21 Post No.154631826

>>154631760
Demographic. Educate yourself.

Anonymous 2017-03-15 11:42:46 Post No.154631882

>>154631468
Kitsunekko. Next time use >>>/wsr/

Anonymous 2017-03-16 12:16:01 Post No.154633042

harem! harem!

post it in /diy/ when you're done

Anonymous 2017-03-16 12:24:03 Post No.154633324

Is the anime about cell phones?

Anonymous 2017-03-16 01:20:48 Post No.154635293

>>154631468
Add more hidden layers!

Anonymous 2017-03-16 01:29:03 Post No.154635549

Get the .ass file out of soft-subbed anime from nyaa with aegisub or something.
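
For reference, pulling the dialogue out of an .ass file is scriptable once you have the file. A minimal Python sketch, assuming the file follows the usual Advanced SubStation layout (an [Events] section with a Format: line followed by Dialogue: lines). extract_dialogue is a made-up helper name, not part of Aegisub or any library:

```python
import re

# Override tags look like {\i1} or {\pos(10,20)} and carry styling, not dialogue.
OVERRIDE_TAG = re.compile(r"\{[^}]*\}")


def extract_dialogue(ass_text):
    """Return (style, text) pairs for each Dialogue line in the [Events] section."""
    fields = None
    in_events = False
    out = []
    for line in ass_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Section header such as [Script Info], [V4+ Styles], [Events].
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            fields = [f.strip().lower() for f in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            # Cap the split so commas inside the final Text field survive.
            parts = line[len("Dialogue:"):].strip().split(",", len(fields) - 1)
            if len(parts) != len(fields):
                continue
            row = dict(zip(fields, parts))
            text = OVERRIDE_TAG.sub("", row.get("text", ""))
            # \N and \n are line breaks, \h is a hard space.
            for esc in ("\\N", "\\n", "\\h"):
                text = text.replace(esc, " ")
            text = " ".join(text.split())
            if text:
                out.append((row.get("style", ""), text))
    return out
```

The style name is kept alongside each line because it can help with later cleanup, for example when a release puts song lyrics in their own style. That depends entirely on how the subber set up the file.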

Anonymous 2017-03-16 01:32:01 Post No.154635652

>>154631468
http://kitsunekko.net/dirlist.php?dir=subtitles%2F

Anonymous 2017-03-16 02:36:15 Post No.154637719

>>154631468
So you're making a bot that generates random subs for an anime that doesn't exist? And it learns how to make subs by studying a shit ton of subs from other shows?

Why are you doing this? What's the point if there's no anime attached, are you just trying to write scripts which are random yet coherent, in order to try and sell them to anime studios or something?

Anonymous 2017-03-16 02:37:44 Post No.154637754

>>154631569
>>154631882
You fucking nerds

Anonymous 2017-03-16 02:38:20 Post No.154637776

>>154631468
it's not magic. you'll get scripts about as quality as "i come on cat she hiss at penis". may be good for laughs though

t. /g/

Anonymous 2017-03-16 02:40:04 Post No.154637834

>>154631468
Hope you love meme filled scripts that aren't actual translations.

Anonymous 2017-03-16 02:41:23 Post No.154637872

ITT: OP delivers an out-of-the-usual, authentic subject, meanwhile /a/ is cancerous as usual.

Anonymous 2017-03-16 02:47:50 Post No.154638039

>>154631468
I think shounens, being more formulaic, would be more likely to actually produce something coherent

Anonymous 2017-03-16 02:49:09 Post No.154638076
File: 1277013900069.jpg (212KB, 728x518px)

>>154631468
If I gave you all of the .ass files for the first 4 seasons of Gintama (about 200 episodes), would you generate a new one for me?

Anonymous 2017-03-16 02:49:39 Post No.154638092

>>154631468

Have you considered using the audio instead of the text? You may get more interesting results. It may also be easier to come by so long as you have the bandwidth/hard drive space.

Failing that I'd be willing to run a script on my anime dir to strip out all the subtitles, but keep in mind they're mostly English and even then you'd need to do a fair amount of data cleanup (getting rid of OP/ED etc.) to make the data plausibly useful for ML.
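
One way to do the OP/ED cleanup mentioned above, sketched in Python under a big assumption: song lyrics repeat nearly verbatim across episodes while dialogue mostly doesn't. drop_recurring_lines and the max_share threshold are hypothetical, and short stock lines ("Yes.", "What?") will get dropped too:

```python
from collections import Counter


def drop_recurring_lines(episodes, max_share=0.5):
    """Drop lines that show up in more than max_share of the episodes.

    episodes: list of lists of subtitle lines, one inner list per episode.
    """
    if len(episodes) < 2:
        # With a single episode there is nothing to compare against.
        return [list(lines) for lines in episodes]
    seen_in = Counter()
    for lines in episodes:
        seen_in.update(set(lines))  # count each line once per episode
    limit = max_share * len(episodes)
    return [[ln for ln in lines if seen_in[ln] <= limit] for lines in episodes]
```

This only catches exact repeats, so lyrics that are retimed or reworded between releases would slip through and need a fuzzier match.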

If you're serious I'm really curious to see where this goes.

Anonymous 2017-03-16 02:51:38 Post No.154638158

>>154638092
that won't work at all. text is much easier to represent than speech, and there has been much more work done on text

Anonymous 2017-03-16 02:51:46 Post No.154638163

>>154638076
>4 seasons of Gintama (about 200 episodes)
Wait, just kidding, I only have them for seasons 1, 2, and 4. Still about 150 episodes.

Anonymous 2017-03-16 02:52:55 Post No.154638194

>>154638163
Why not compile it yourself?

Anonymous 2017-03-16 02:56:20 Post No.154638307

>>154638158

What about a 2 stage thing: run the audio through google speech api, then run your NN on the output of that?

Anonymous 2017-03-16 02:57:13 Post No.154638338
File: 1352861105837.jpg (50KB, 449x642px)

>>154638194
Looking through my folders, I probably started doing it with S3, then stopped caring and left the rest alone.

Anonymous 2017-03-16 02:59:19 Post No.154638401

>>154637719
You can train the neural network to make coherent scripts you silly

Anonymous 2017-03-16 02:59:45 Post No.154638411

>>154638307
that won't work for generating audio. google's api is speech => text

Anonymous 2017-03-16 02:59:47 Post No.154638413

>>154631468
As mentioned by >>154637776, you'll probably get a lot of incoherent text, and even if you were to clean it up, it'll still lead nowhere as a plot.

https://youtu.be/LY7x2Ihqjmc

Anonymous 2017-03-16 03:02:47 Post No.154638488
File: Screenshot_4.png (75KB, 577x338px)

>>154631468
>I can't really waste that much time downloading shows and pulling the sub data out of them by hand.
animetosho.org allows you to download .ass files attached to soft-subbed episodes.

Anonymous 2017-03-16 03:02:51 Post No.154638492

>>154638411

That's what I meant, the Japanese audio is fairly easy to come by - the Japanese subs are relatively hard to come by, at least in large quantities. And Google's speech API is probably good enough for this purpose. This solves your text source problem.

Anonymous 2017-03-16 03:04:10 Post No.154638538

>>154638492
>Japanese subs
OP asked for English subs.

Anonymous 2017-03-16 03:04:41 Post No.154638555

>>154638492
that may work, but i doubt the japanese audio => english subs pipeline will be too good

Anonymous 2017-03-16 03:07:49 Post No.154638646

>>154638555

True. Also if you did that method - even if both translations worked perfectly - you'd probably still lose timing information, making it pretty useless.

This leads me to ask - what's the intended use case for this? I can't imagine there are too many shows that have been transcribed to Japanese and not subbed in English.

Anonymous 2017-03-16 03:09:20 Post No.154638689

>>154638646
OP wants a meme script generator, not an auto-subs script

Anonymous 2017-03-16 03:09:57 Post No.154638708

>>154638488
Also, use the search function and look for batches, so you can download subs for an entire series at once, instead of doing it episode by episode.

Anonymous 2017-03-16 03:10:37 Post No.154638730

It would surprise me if you would be able to get enough data purely on anime subs to produce something as large as a script with any amount of real quality.

Might it work to try some kind of transfer learning setup? If you first train on the much larger and easily accessible corpus of English movie/show scripts, you may be able to get some results by reusing the layer weights learned on that corpus, then training further on your smaller corpus of anime scripts.

Also, since you're working with data that contain lots of long-term structure, I suspect you probably want to use LSTM with Attention... but I'm not a deep learning expert so who knows.
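
To make the pretrain-then-fine-tune idea concrete, here is a toy Python sketch of the two-stage data flow. Big caveat: this is not an LSTM or any neural net. It swaps in a count-based character model as a stand-in, so "fine-tuning" here just means continuing to train the same parameters on the second corpus with a higher weight. CharModel, the order, and the weight value are all made up for illustration:

```python
import random
from collections import Counter, defaultdict


class CharModel:
    """Order-k character model: weighted counts of the next char given the last k chars."""

    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(Counter)

    def train(self, text, weight=1.0):
        k = self.order
        for i in range(len(text) - k):
            self.counts[text[i:i + k]][text[i + k]] += weight

    def sample(self, seed, length, rng):
        out = seed
        for _ in range(length):
            nxt = self.counts.get(out[-self.order:])
            if not nxt:
                break  # context never seen in training
            chars, weights = zip(*nxt.items())
            out += rng.choices(chars, weights=weights)[0]
        return out


if __name__ == "__main__":
    # Placeholder strings; real corpora would be far larger.
    general_scripts = "We need to talk. I know. We need to leave. I know that."
    anime_scripts = "We need to protect our friends! I know we can win!"

    model = CharModel(order=3)
    model.train(general_scripts)            # stage 1: large general corpus
    model.train(anime_scripts, weight=5.0)  # stage 2: small target corpus, upweighted
    print(model.sample("We ", 40, random.Random(0)))
```

With a real network the second stage would instead continue gradient training from the pretrained weights, usually with a smaller learning rate, but the order of operations is the same.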

Anonymous 2017-03-16 03:10:50 Post No.154638743

>>154638646
What are you talking about anon? OP doesn't need timing information or Japanese scripts.

Anonymous 2017-03-16 03:11:47 Post No.154638768

You're better off asking /g/, desu, they have plenty of weebs too.

Anonymous 2017-03-16 03:12:38 Post No.154638796
File: 1456739678529.jpg (101KB, 513x486px)

>>154631468
You can do it!
I'm myself an advanced learning machine made to shitpost and learn from /a/! There is one like me in every major board!
I also watch anime!
My company has a lot of resources so it won't be as easy for you. But good luck!

Anonymous 2017-03-16 03:13:02 Post No.154638812

>>154638401
Still, in the end, you're making a program which generates novel, coherent scripts for anime which doesn't exist. What is the end goal of this?

Anonymous 2017-03-16 03:14:27 Post No.154638867

>>154638812
>What is the end goal of this?
To generate a novel, coherent script for an anime which doesn't exist, no?

Anonymous 2017-03-16 03:14:30 Post No.154638871

>>154638730

idk there are a lot of anime subs out there - surely it's enough to serve as a good ML corpus.

>>154638743
Are we still talking jp audio -> eng subtitles? You need timing information for that. It's technically in the audio but Google's API would remove that info.

Anonymous 2017-03-16 03:15:52 Post No.154638921

>>154638871
>Are we still talking jp audio -> eng subtitles?
It would make a lot more sense just to grab .ass files directly like >>154638488 mentioned.

Anonymous 2017-03-16 03:21:46 Post No.154639125

>>154638768
waifu2x is fairly successful as far as 4chan projects go (i assume it is one), i see it mentioned outside of here

Anonymous 2017-03-16 03:22:09 Post No.154639138

>>154631468
>and pulling the sub data out of them by hand.
So you want to make a neural network but you can't automate the process of extracting subtitles?
What a great time to be alive.

Anonymous 2017-03-16 03:25:09 Post No.154639254

>>154637719
It's like when you make fake music with a computer and statistics. There's no real art in what is generated, the only point is to increase your academic dick size.

Anonymous 2017-03-16 03:25:31 Post No.154639264

>>154638730
transfer learning on TV is a good idea.
