[Boards: 3 / a / aco / adv / an / asp / b / bant / biz / c / can / cgl / ck / cm / co / cock / d / diy / e / fa / fap / fit / fitlit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mlpol / mo / mtv / mu / n / news / o / out / outsoc / p / po / pol / qa / qst / r / r9k / s / s4s / sci / soc / sp / spa / t / tg / toy / trash / trv / tv / u / v / vg / vint / vip / vp / vr / w / wg / wsg / wsr / x / y ] [Search | Free Show | Home]

DEEPMIND DOES IT AGAIN https://deepmind.com/blog/wavenet-g


Thread replies: 161
Thread images: 16

File: Sexy-Robot.jpg (208KB, 1280x1024px)
DEEPMIND DOES IT AGAIN

https://deepmind.com/blog/wavenet-generative-model-raw-audio/
>>
Damn, that's good.
>>
>>56501542
Can someone make a realistic waifu voice with this?
>>
>>56501659
YES THEY CAN.

What a time to be alive.
>>
>>56501659
probably
>>
How long before I can turn my own voice into my computer?
>>
>>56501785
soon
>>56501698
Actually they can generate ANY audio, as long as they have enough data.
From voice to music to natural sounds and more. This is really huge.
>>
>>56501865
can't wait until the source code goes public or gets ripped

just imagine that one guy who samples 1000 hentai animes just to generate waifus in 3d VR porn
>>
so where can I download it?
>>
but can it generate basic emotions and tone.

didn't think so

humans 1
ai 0
>>
>>56501542
noice!
the output needs to be sampled at a higher rate and then high-pass filtered to remove some of the noise, other than that bretty gud
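A minimal sketch of the filtering step suggested above: a first-order (one-pole) high-pass filter in plain Python. It assumes float samples in [-1, 1], and the 16 kHz rate and 50 Hz cutoff are illustrative assumptions, not values from the article. A filter like this mainly removes DC offset and low-frequency rumble. Broadband hiss extends into the high frequencies, so most of it would pass through.

```python
import math

def one_pole_highpass(samples, sample_rate, cutoff_hz):
    """First-order RC-style high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)  # closer to 1.0 means a lower cutoff
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

# Usage: a constant DC offset (0 Hz) decays toward zero after filtering.
dc = [0.5] * 16000
filtered = one_pole_highpass(dc, sample_rate=16000, cutoff_hz=50.0)
print(abs(filtered[-1]) < 1e-6)
```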
>>
File: 1373486130923.png (3KB, 184x172px)
>>56501903
so.. how long do you guys think something like this will take? Like, how long until we can at least communicate with our waifus through our computers?
>>
>>56502102
when the source gets released, probably 3 years but japan only at the beginning
>>
>>56502125
>source gets released

But, Google/DeepMind will never release the training data that actually generated the voice.
>>
This sounds like a great way to make vocaloids even better. Makes me want to get a new vocaloid using wavenet.
>>
>>56502570
the vocaloid makers hack deepmind and improve their hatsune miku
>>
I'm gonna scam so many old bastards with this.
>>
jesus, that is incredibly impressive. even the throwaway music bit at the bottom.
>>
>>56501542
But what if I want a robotic voice?
>>
Jesus
>>
>>56502756
You add a robotic voice filter over it?
>>
File: 193.jpg (22KB, 300x100px)
>>56502900
It wouldn't be the same!
>>
The future is scary.
>>
>>56503226
the future is AMAZING
>>
>>56503268
>Hello, %Anon%, this is Cortana. I have detected illegal software on your system. Please, put hands behind your back and lie on the floor. The authorities are on their way. You have 20 seconds to comply.
>>
>>56503346
>While you are waiting enjoy this piano music I composed for you. Have a nice day and remember, if you have nothing to hide you have nothing to fear.
>>
So could something like this be used to replace voice actors in indie video games?

Could it use voice samples of VAs to replicate their voice?
>>
>>56503489
yes, yes
>>
Vocaloid upgrades soon?
>>
>>56503489


This is actually a very interesting angle.
Give it a decade and voice actors and singers are going to be practically useless.
Only thing that matters is the guy creating the melody, lyrics and the text that these things read.
>>
>>56503956
They'll probably have AI that writes scripts for that by then too
>>
It just dawned on me

>AUTOMATIC ORGASM SOUND GENERATOR
>with the voice of every girl ever
>including famous actresses, singers, etc
>>
>>56503956
>>56503983
But stuff made with real people will be super authentic and REAL man.
>>
Interesting how some parts of the future arrive much earlier than expected.
>>
>>56501542
How long until the trolling begins?

https://www.youtube.com/watch?v=1B488z1MmaA
>>
I listened to all those audio samples and the voice doesn't sound much better.

I thought it was going to sound like a real human and not robotic.

It still sounds robotic and grainy.
>>
>>56504487
Most people in this thread are just jumping the gun.

We're still a few decades away from making it practical.
>>
File: G88xA.jpg (153KB, 1096x729px)
>>56504519
I just want my own personal assistant that sounds like a real human female who can talk to me as I'm trying to fall asleep.

Since women today are such whores being brainwashed by SJW garbage from the corporate jew media I have no choice but to rely on technology as a substitute.
>>
>>56504487
>>56504519
It's still way ahead of previous results, and it can fool many humans. This shit was science fiction only yesterday.
>We're still a few decades away
Make it one year.
>>
>>56504568
So you just want that OS from the movie Her?
>>
>>56501542
$1.00 has been deposited into your Google Wallet®.
>>
>>56504681
>guy literally gets cucked by an AI
it was silly
>>
>>56504753
Not at all. While you are sleeping, working or doing whatever, the computer will be using all power to learn.

No wonder she was cucking him. She could process a shitload of information before he could say Good Morning. She needed more ppl to keep her busy somehow.
>>
>>56502026
it can make toned language, imitating samples. did you read it? can also be conditioned to use tone to imitate emotion
>>
>>56503346
>google is microsoft
we might get very convincing, pleading ads, or the threat of using audio recordings to track tone used when speaking of certain subjects for the purpose of better targeted ads.
>>
>>56502102
>communicate with our waifus
In an intelligent way or just
>hello oniichan
>hello oniichan
>hello oniichan
>>
>>56502432
>release the training data
Who the fuck cares? Make your own training data. I'm going to bring Kuuko back from the dead when this gets released.
>>
>>56504874
time to make a porn audio dataset for >>56504015
>>
Something something all that fuzz.
How the hell are they supposed to get rid of it?
>>
>>56501542
I think the most impressive part of this is 1) how raw audio generation isn't limited to human speech. 2) the model replicates breathing sounds and such as well, giving it an illusion of actually sounding like a real human bean.

Either way, how long until a working model is released to the public? I imagine Google wouldn't want Apple or Microsoft to gain access to this.
>>
>>56504912
Make another neural net for static removal.
>>
the ones where it makes up its own language are trippy as fuck. it's like an alien race speaking to you.
>>
>>56504912
>>56504979
literally how the brain works

The real problem is the massive computing power you need to make this work. google can afford it, but there's no way you can run this on a normal pc, no matter the gpus.
>>
>>56501542
As someone who does research in ML, that is honestly arousing.
>>
>>56505032
just make a makeshift supercomputer with your thinkpad hoard.
will work well enough
>>
>>56505108
as someone who jacks off to vocaloids, that is honestly arousing
>>
>>56504848
Google is even worse than Microsoft desu.
Because google does exactly the same thing as microsoft, but you can't switch to another internet if you don't like it.
>>
>>56501542
musicfags on suicide watch
>>
>>56506228
I want to crosspost this to /mu/ but I'm too tired.
>>
>>56503956
Voice actors and singers will just start suing people who imitate their voice. Laws will be passed that make it illegal.
>>
>>56506332
it can imitate billions of voices. it would be silly to make it illegal, it will never happen.
>>
>>56506332
>Voice actors and singers will just start suing people who imitate their voice. Laws will be passed that make it illegal.
Publicity rights in some states already cover voice it seems.
>>
>>56506332
>>56506404
But that's retarded. What stops you from engineering a voice that sounds like that of a famous singer while still sounding slightly different? What stops you from generating a random voice that turns out to be the voice of a random girl in south africa? Will she sue you too? We may as well outlaw sound.
>>
>>56503956
>singers are going to be practicaly useless.

yes, because traditional guitars and pianos got totally replaced by e-guitars, synthies and computers.
>>
>>56506507
actually they did
>>
>>56501542
ELI5?
>>
>>56506507
in radio pop maybe
>>
>>56506491
Yeah, it is retarded. Hopefully it does not spread.

Publicity rights were originally intended to be somewhat like trademark to keep people from falsely using a name, signature, image, voice, etc. to claim that a person was endorsing a product. Of course now they are basically yet another way for famous people to try and bother people they don't like with lawsuits or to try and get money from a company.
>>
>>56506549
If you go to the top of your screen, there's a little bar you can click on and type words into. Simply type in "reddit.com", but without the punctuation marks, and you will be transported to a place appropriate for you! :)
>>
>>56506491
Monsanto has copyrights on genetics of seeds. These seeds happen to blow off trucks and on to peoples' farms. Monsanto then sneaks onto their farm and tests for these genes, and if they find them, say goodbye to your farm/retirement/belongings.

What I'm saying is: it will happen.
>>
>>56506748
>Monsanto then sneaks onto their farm
that sounds very illegal
>>
Radio moderators are now obsolete.
>>
>>56506940
They don't give a fuck, half of the government has shares in monsanto.
They have literally written Monsanto seeds into the new Iraqi constitution.

these guys are above the law.
>>
>>56501542
Sounds like parametric with less reverb. whoopdeedoo

Still sounds fake as shit.
>>
>>56506748
there is worse stuff: parts of the human genome are actually copyrighted (most of those are related to some disease/condition) and you can't sell medicine (and if I'm not mistaken, not even do research either) that targets those genes without permission and paying the fees.
usa sure is the land of freedom...
>>
The audio samples where the wavenet generates its own audio output are creepy as fuck
>>
>>56504477
wtf I'm liking this
>>
>>56507193
Really? It just reminds me of this
>>
>Because raw audio is typically stored as a sequence of 16-bit integer values (one per timestep), a
>softmax layer would need to output 65,536 probabilities per timestep to model all possible values.
>To make this more tractable, we first apply a μ-law companding transformation (ITU-T, 1988) to
>the data, and then quantize it to 256 possible values:

>f(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ)

>where −1 < x < 1 and μ = 255. This non-linear quantization produces a significantly better
>reconstruction than a simple linear quantization scheme. Especially for speech, we found that the
>reconstructed signal after quantization sounded very similar to the original.

So does that mean that each sample in the generated sound can only take one of 256 values (spread across the range 0 to 65535), essentially making it 8-bit instead of 16-bit?
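A minimal sketch of the companding step in the quoted passage, assuming float samples scaled to [-1, 1] and μ = 255. It maps each sample through f(x), quantizes to 256 bins (8 bits of information per sample), and inverts the mapping. The bins cover the full 16-bit range, but they are not evenly spaced. The function names are illustrative, not taken from the paper.

```python
import math

MU = 255  # μ = 255, as in the quoted passage

def mu_law_encode(x, mu=MU):
    """Map a float in [-1, 1] to an integer bin in [0, mu] (256 levels)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)  # f(x) in [-1, 1]
    return int(round((y + 1.0) / 2.0 * mu))

def mu_law_decode(q, mu=MU):
    """Invert the quantization: bin index back to an approximate float in [-1, 1]."""
    y = 2.0 * q / mu - 1.0
    return math.copysign((math.pow(1.0 + mu, abs(y)) - 1.0) / mu, y)

# Round-trip a few 16-bit PCM values (scaled to [-1, 1]).
for pcm in (-32768, -1000, 0, 1000, 32767):
    x = pcm / 32768.0
    q = mu_law_encode(x)
    print(pcm, q, round(mu_law_decode(q) * 32768))
```

So yes: one of 256 values per sample. Because the spacing is logarithmic, though, the reconstruction error is much smaller than an evenly spaced 8-bit scheme would give for most speech samples.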
>>
>>56508335
Yes
>>
This would be pretty good for ASMR stuff.
>>
>>56508335
Yes but that logarithm probably means that they have more resolution in the middle frequencies
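One way to check where μ-law puts its resolution is to measure the gap between adjacent decoded levels. This is a sketch assuming the same μ = 255, 256-bin layout as the quoted formula. The finer spacing lands at small sample amplitudes (the quiet part of the waveform near zero), not in any particular frequency band.

```python
import math

MU = 255

def level(q, mu=MU):
    """Decoded amplitude of μ-law bin q (0..mu): inverse of f(x)."""
    y = 2.0 * q / mu - 1.0
    return math.copysign((math.pow(1.0 + mu, abs(y)) - 1.0) / mu, y)

levels = [level(q) for q in range(MU + 1)]
steps = [b - a for a, b in zip(levels, levels[1:])]

# Step size near zero amplitude vs. near full scale, in 16-bit PCM units.
mid = MU // 2
print("step near 0:   ", steps[mid] * 32768)
print("step near max: ", steps[-1] * 32768)
print("uniform 8-bit: ", 2 * 32768 / 256)
```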
>>
File: plinkett.jpg (484KB, 1039x792px)
Finally I can get a virtual Mr. Plinkett who reads 4chan posts to me all day
>>
>>56501542
fuck yeah, nobody is going to need voice actors ever again.
>>
>>56503489
that was my first thought
Audio files make up the majority of the game size in most cases, since they just don't compress well. Also, if you want to change a single line of dialogue later on, you need to hire the same voice actor again, which is costly and time-consuming.
If all voice can be stored in a kind of LaTeX or XML format, that will not only speed up development but also allow dynamically generated dialogues that aren't just a bunch of text
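Something close to that "XML format" already exists as SSML (Speech Synthesis Markup Language, a W3C standard). SSML marks up text with pauses, emphasis and prosody so a TTS engine can render it. This is a minimal sketch that builds one such line with the standard library. It assumes an engine that accepts SSML, and the helper `npc_line` and its arguments are hypothetical. The markup only describes what to say and how to say it, so the audio would still have to be synthesized at runtime or at build time.

```python
import xml.etree.ElementTree as ET

def npc_line(text_before, pause_ms, emphasized, rate="medium"):
    """Build a tiny SSML document for one line of dialogue (hypothetical helper)."""
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})  # speaking rate
    prosody.text = text_before
    ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})  # explicit pause
    emph = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emph.text = emphasized
    return ET.tostring(speak, encoding="unicode")

print(npc_line("Kill 15 boars and bring me their tusks.", 300, "Quickly!", rate="slow"))
```

A line like this is a few hundred bytes of text instead of a recorded audio file, which is the size and editability trade-off the post describes.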
>>
>>56506332
sadly this doesn't sound all too far-fetched
>>
>>56504874
there is no way you'd even remotely catch up with the amount of training data that google has though, even as you're posting on 4chan right now you're feeding it with shit tons of training data through the captcha
>>
>>56501542
I like how all the piano tracks start out normal and go fucking ham before cutting out.
>>
>>56510030
can't risk letting it gain sentience at this stage, our anuses are unprepared
>>
DAISY
DAISY
GIVE ME YOUR ANSWER DO~
>>
That piano shit is neato
>>
>>56509994
As the article said, generating the audio output takes forever, so forget about generating it on the fly. It could still save the cost of the voice actor.

People will still notice, though. It's not on par with human voice actors; it's just getting into the good-enough-to-be-tolerable range.
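The slowness comes from how the model generates audio. Each new sample is drawn from a probability distribution over the 256 quantized values, conditioned on the samples before it. Producing audio therefore takes one full network evaluation per sample, and each step has to wait for the previous one. This is a toy sketch of that loop. The function `toy_logits` is a hypothetical stand-in for the network (not a trained model, and its output is noise-like), and the context length and 16 kHz rate are assumptions.

```python
import math
import random

NUM_BINS = 256        # μ-law bins, as in the quoted paper excerpt
RECEPTIVE_FIELD = 64  # hypothetical context length for this toy

def toy_logits(history):
    """Hypothetical stand-in for the network: favors bins near the last sample."""
    last = history[-1] if history else NUM_BINS // 2
    return [-abs(b - last) / 8.0 for b in range(NUM_BINS)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(num_samples, seed=0):
    """Autoregressive generation: one model call per output sample."""
    rng = random.Random(seed)
    samples, calls = [], 0
    for _ in range(num_samples):
        probs = softmax(toy_logits(samples[-RECEPTIVE_FIELD:]))
        calls += 1
        samples.append(rng.choices(range(NUM_BINS), weights=probs, k=1)[0])
    return samples, calls

audio, calls = generate(1600)  # 0.1 s at an assumed 16 kHz
print(len(audio), calls)
```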
>>
It's good, really good, but I feel like if this is to be done properly it needs more forms of input. The emotion behind different words, the emphasis, pause length, etc.

If you could develop some sort of system where you can both input text and specify characteristics of speech within the text, we'd be getting close to completely accurate synthesis.
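The WaveNet paper calls this kind of extra input conditioning. A vector h (for example a speaker identity, or linguistic features for TTS) is added inside each layer's gated activation, roughly z = tanh(W_f * x + V_f h) ⊙ σ(W_g * x + V_g h). This is a plain-Python sketch for a single timestep. It assumes the filter and gate convolution outputs are already computed and uses hypothetical weight values and "calm"/"excited" vectors. A real layer uses learned matrices and dilated convolutions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def gated_unit(conv_f, conv_g, h, V_f, V_g):
    """z = tanh(conv_f + V_f h) * sigmoid(conv_g + V_g h), elementwise.

    conv_f / conv_g: filter and gate convolution outputs for one timestep.
    h: conditioning vector (hypothetical "speaking style" embedding).
    V_f / V_g: projection matrices for the conditioning vector.
    """
    bias_f = matvec(V_f, h)
    bias_g = matvec(V_g, h)
    return [math.tanh(f + bf) * sigmoid(g + bg)
            for f, bf, g, bg in zip(conv_f, bias_f, conv_g, bias_g)]

# Same audio input, two different conditioning vectors.
conv_f, conv_g = [0.2, -0.4], [0.1, 0.3]
V_f = [[1.0, 0.0], [0.0, 1.0]]
V_g = [[0.5, -0.5], [-0.5, 0.5]]
calm, excited = [0.0, 0.0], [1.0, -1.0]
print(gated_unit(conv_f, conv_g, calm, V_f, V_g))
print(gated_unit(conv_f, conv_g, excited, V_f, V_g))
```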
>>
>>56501542
Oh god that generated music lmao

Sounds like Beethoven having a stroke while playing
>>
Is there anything that deep learning CAN'T do?
>>
>>56510210
Read the article, the model varies output according to context.

Seems they couldn't get rid of the noise though, or maybe their training data is contaminated with noisy samples. Because it's harder to judge noisy samples, they might have had higher ratings by humans, so the model learned to include noise to get better grades for its output.
>>
File: 13558.jpg (46KB, 500x500px)
>>56501542
Self written ASMR incoming, faggots
>relax, anon, take a deep breath and count to 100 with me
>you are great, anon
>run away with me anon
>let me take that big pulsating cock with my tiny feet, anon
>>
This shit is fucking creepy

https://storage.googleapis.com/deepmind-media/pixie/knowing-what-to-say/first-list/speaker-4.wav

>the breathing and mouth noises
>>
>>56510254
It's a universal approximator, so no.
>>
>>56510275
Can it create a virtual gf for me?
>>
File: 1459029697107.jpg (141KB, 392x309px)
>>56510254
Deep learning is a meme. A very effective meme, but a meme nonetheless.
So they have a cluster of 100k Nvidia Teslas and literally petabytes of datasets, big fucking whoop, they can do impressive shit given a year's time...
With that same amount of power you could probably simulate a universe and wait for it to develop life capable of speech, and it would probably be more efficient.

I'll be impressed when they can do all this on a battery-powered smartphone, but deep architectures are not really the way right now.
Source: master's in AI
>>
>>56510321
Clouds nigga

Why do things locally?
>>
>>56510321
Virtual masters apparently. If you can build a network in hardware, this shit will get faster. You don't render graphics on a CPU either.
>>
File: 1448125400236.png (286KB, 500x513px)
>>56501542
>https://deepmind.com/blog/wavenet-generative-model-raw-audio/

It sounds much better, but you can still clearly hear it's a robot voice.

The tone of words inside a sentence seems to be the biggest problem: there's too much tonal difference between words, which is different from how a normal person would structure a sentence.

I don't understand: is it too hard to build a system that can recognize, based on a word's position in a sentence, what tone of voice should be used?
>>
>>56510325
>power inefficiency is solved by delegating computation
I can hear stockholders laughing
>>
>>56510321
You only need to teach the network once. Then just hardcode the parameters in a 20kb file and boom, perfect text to speech on a toaster.
>>
>>56510344
It seems to be that the system is still pretty dumb and is saying things based on the words and mostly ignoring punctuation.

I'd imagine you can use deep learning to begin to understand the context of words and phrases, but that'll take a lot longer and is a lot more difficult to interpret.

Give it ten years and this shit'll be writing its own books. And reading them too.
>>
>>56510360
You need to generate 16k points per second though.
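Here is the post's point as arithmetic. Each sample needs its own sequential pass through the network, so real-time output at 16,000 samples per second leaves about 62.5 microseconds per pass, however small the parameter file is. In this sketch, the 1 ms per step figure in the usage line is a hypothetical, not a measured number.

```python
SAMPLE_RATE = 16_000  # samples per second, the figure quoted in the post

def per_sample_budget_us(sample_rate=SAMPLE_RATE):
    """Max time per sequential network evaluation for real-time output, in microseconds."""
    return 1e6 / sample_rate

def real_time_factor(seconds_per_step, sample_rate=SAMPLE_RATE):
    """Wall-clock seconds needed per second of audio (>1 means slower than real time)."""
    return seconds_per_step * sample_rate

print(per_sample_budget_us())   # 62.5
print(real_time_factor(0.001))  # hypothetical 1 ms per step
```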
>>
>>56510341
>>56510360
You clearly have no idea how NNs work so go educate yourselves before spouting nonsense.
>inb4 a neural network has literal dots and lines like you see in the drawings
>>
>>56510321
This, honestly

I thought the same shit when nvidia started showing off their computer vision hardware not too long ago claiming it's 90000000 times more effective than standard algorithm approach (like opencv)

The catch is that their hardware costs several thousand dollars meanwhile the algorithm approach is literally free and can run on your $20 raspberry pi

Marketing bullshit is what this is, but hey you can put it in the cloud xDDDDD
>>
>>56510387
Costs will decrease as time goes by, as they always do
>>
>>56510378
this is a good point, if the network itself is too hueg it would indeed be slow
>>56510385
this is a clueless yuroshit /v/ermin moron who should drink bleach
>>
>>56510385
Why don't you enlighten me.
>>
Are there any other need deep learning things to have come out recently? I always enjoy reading about them.
>>
>>56510544
>Need

That should say neat. I can't even blame a phone as I'm on a PC and my brain just fucked up.
>>
>>56510485
>I'll pretend to know what he's talking about
>put an ad hominem in there, that'll teach him
Would you care to explain how convolution would be faster on dedicated hardware than it is now on gpgpus? Like, what exact operations do you feel are the bottleneck in convolution and SGD right now? Is it the addition or the multiplication? Or do you think that you could implement faster memory access than ddr5?
Go on, I'll wait while you think
>>
>>56510725
>>>/v/
>>
File: J2zNkC6.jpg (150KB, 1282x1901px)
>>56501659
imagine if you could idk sample Emma Watson's voice and then make a robot voice that sounds exactly like her? And then make her say shit. Imagine the collapse of Hollywood if that was possible? Fuck, the whole concept of identity would be in danger if you could simulate a person's voice and make a realistic model of that person inside some VR world. Why would anyone want real people after that?
>>
>>56503983
or have AI that is present in each character of the game and makes characters behave realistically like real people would, thus creating its own narrative that is unpredictable
>>
>>56510725
GPUs are made to render one image at a time. Fix your network parameters, skip the memory, directly pass output to the next calculation layer.

Drop the fully connected layers requirement, gain speed using a chip that specializes in the sparse matrix multiplication your network produces.

And I don't see how implementing it node for node instead of with matrix multiplications wouldn't make it faster. N:1 specialized calculation units, no memory to swap to.

So fuck off. Your "expertise" with shit counts for nothing because you can't see past what's in front of you.
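The "sparse matrix multiplication" mentioned above can be sketched with the compressed sparse row (CSR) layout. CSR stores only the non-zero weights, so the work scales with the number of non-zeros rather than the full matrix size. This is a minimal plain-Python sketch. It does not settle whether dedicated hardware would beat GPUs on such workloads, which is the point being argued.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays: values, column indices, row pointers."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr

def csr_matvec(values, cols, row_ptr, x):
    """y = A x, touching only the stored non-zero entries of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[cols[k]]
        y.append(acc)
    return y

weights = [
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, -1.0],
    [0.0, 0.0, 0.0, 0.0],
]
vals, cols, ptr = to_csr(weights)
print(len(vals))                                          # 3 non-zeros stored instead of 12
print(csr_matvec(vals, cols, ptr, [1.0, 1.0, 1.0, 1.0]))  # [2.0, 0.0, 0.0]
```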
>>
>>56510871
Is that emma watson? Holy shit she looks like a dude
>>
This just put voice actors and impersonators out of business.
>>
>>56510948
>small head
>wide mouth
>small nose
>big eyes

>looks like a dude

you wat m8, you must be a very feminine looking man to think she looks like a dude
>>
>this technology will literally be used for A) evil and B) advertising
who do I have to kill to get the good future back
>>
Does this page crash firefox for anyone else?
>>
File: 135 - UyXdK.gif (427KB, 200x198px)
>>56502900

>technology has advanced to the point that we have to apply robot sounds to a voice generated robotically because it sounds TOO human

Our memes stopped being dreams
>>
>>56511099
now you get to begin the long descent into what /v/ philosophers have always dreaded: when AIs/games are too similar to people/life, you realize you were only involved in AIs/games because people/life are actually shit
>>
>>56501542
Holy shit that's some fucking amazing shit right there.
>>
Ok so find a use for it
>>
File: deny_urself_my_lad.png (12KB, 246x200px)
>>56510912
>node for node
>skip the memory
>drop FC
0/10 apply yourself
>>
>>56511151
You know, that's a good point. If you're interacting with something as good as human, does that mean you're being social?
>>
It still has problems of timing, pitch and intonation. When you read a book aloud you grab the *MEANING* of a particular sentence or paragraph and adapt your voice to suit. For instance you might recognize that a character has a certain accent and whenever they speak you may adjust your voice accordingly. Or when a character is angry you may adjust your tone, pitch and volume.

Reading speech fluently is one thing. Understanding its meaning is another.

That is why most of these simulations never really work well. The musical pieces are just a jumble of notes with no real emotion to them. There is no melody and the tempo jumps around all over the place. I am sure some rules of what makes a good melodic piece could be put in place through analysis but it will never be as good as the human ear is at spotting this.
>>
>>56501542
>music

that's what i call random button mashing
>>
>>56510254
Symbolic reasoning, for now...
>>
>>56501542
>every rpg can now be grounded with written content and won't need to break the budget or disk space fitting in voice actor recordings

Fuck yes. Now the AAA shitter studios might go back to presenting long thoughtful dialogues.
>>
>>56507407
Hey me too!
>>
>>56502102
Some ASMR shit will be good.
>>
>>56503489
Imagine the games that could come out of this.

Machine learning video games like an endless skyrim because the AI would continually write new code for new levels and the AI would be able to synthesize the dialogue without the need of voice actors. Sounds dreamy.
>>
>>56504487
All that's needed is a brief pause with a simulated audio for inhaling air to give it a bit more realism
>>
SOON
https://www.youtube.com/watch?v=iNQKMh3JhFc
>>
>>56512072
Yeah, that's not happening. Not for a long, long, long time. If ever.
>>
Fuck off retard. It's fucking nothing, as 90% of what deepmind does is, but hey, it's related to google so it has to be shilled as NEW and INNOVATIVE every time they fucking fart.
>>
>>56512072
Yeah, infinite world but boring as fuck. Or have you forgotten No Man's Sky already?
>>
>>56512267
This has to be bait, right?
>>
>>56512288
t. inbred
>>
>>56502026
it even imitates breathing and lip smacking
>>
>>56512294
Typical popsci-subscribing retard, everybody!
I bet you actually think apple invented tablets, too.
>>
>>56511189
train on your favorite actress, input dirty talk
>>
>>56512288
no man's sky was developed by pleb human developers and half of its budget was spent on marketing
>>
File: 1466694966378.png (34KB, 201x160px)
>>56510272
>>
>>56504681
I want an OS like the movie Her but I'd keep it offline and in my basement.
>>
Today computer voice say
'Hi'
tomorrow
"Hi i'm skynet'
day after
'Bend over and forgot about the lube"
>>
>>56514095
Will it sound like a qt at least?
>>
>>56514391
>not wanting it to sound like Schwarzenegger
Are you gay or something?
>>
>>56511049
Literally every capitalist, politician and religious figure.

Good luck. We're all counting on you.
>>
>>56512366
AYYO

HOL UP HOL UP
>>
>>56514391
'Bend over and forgot about the lube desu"
>>
>>56511049

It concerns me that you think those are two separate things.
>>
>>56510206
>People will still notice
so what? The quality is already good enough, especially for short NPC chatter, all this "Hello. Welcome. You want to buy something? Thank you. Kill 15 boars and bring me their tusks. Collect some tigerfangs for a necklace."
Longer storyline dialogues where intonation and character are more important are still easier to fine-tune with a voice actor right now, but for short snippets it doesn't matter that much
>>
>>56510948
Richard Dawkins
>>
File: 1449284416702.jpg (150KB, 393x829px)
>>56504568
how does it feel that women that meet your requirements exist, (and somewhere out there there's the ideal girl AND she'll like you) but you'll never meet her



This is a 4chan archive - all of the content originated from that website. If you need information about a Poster - contact 4chan. This project is not affiliated in any way with 4chan.