Hi there. I am currently working on backpropagation with CUDA acceleration. Does anybody know whether I should launch one thread per dataset that does the backprop for the entire network, or whether I should then split each dataset up across further threads? In the latter case the number of threads would have to vary between layers, and the kernel would have to be called multiple times, which would be very costly.
>>55746473
Are you using DIGITS?
I don't have an answer for you, but it may help you find one if you give a bit more detail on your setup.
Wait, are you working on CNNs?
>>55748288
I am using CNNs, training with stochastic gradient descent backpropagation. I am not using DIGITS; we are developing our own solution. It works well at the moment, but it only runs on the CPU.
I have working GPU code, but I am unsure how I want this to work. I have looked at articles about parallelized backpropagation on GPUs, but found no specifics on how they are implemented. Wondering if one GPU thread per dataset is the solution that would scale best.
Since the number of neurons varies per layer, I can't really have a thread for each (neuron, dataset) pair, and they would also have to be synchronized, since each layer's output depends on the previous layer's.
I've never used CUDA, but there is a very simple way to tell if the communication overhead is too big:
do it without parallel processing, then do it with. If it takes longer in parallel, you fucked up.
>>55748665
Yes, but I would need a huge ass dataset for this to pay off. I have one at ~1.2M samples; maybe I should try that one.