I created 1D grids of floats, and I'm adding each element's value to the values of its direct neighbors. I then wrote a version in which the threads use shared memory. The CGMA (compute to global memory access) ratio increased, but the computation time also increased compared to the version without shared memory. Can you explain this? My occupancy is near 100%.
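A minimal sketch of the two variants being compared might look like the following. It is not the exact code from the question: the kernel names, the helper `run_stencil`, the block size of 256, and the choice to treat out-of-range neighbors as zero are all assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // assumed block size; kernels must be launched with it

// Variant 1 (assumed): every thread reads its three inputs from global memory.
__global__ void sum_neighbors_global(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float left  = (i > 0)     ? in[i - 1] : 0.0f;  // zero boundary is an assumption
    float right = (i < n - 1) ? in[i + 1] : 0.0f;
    out[i] = left + in[i] + right;
}

// Variant 2 (assumed): the block stages its tile plus one halo cell per side
// in shared memory, then every thread reads its three inputs from there.
__global__ void sum_neighbors_shared(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int t = threadIdx.x + 1;  // shift by one to leave room for the left halo

    tile[t] = (i < n) ? in[i] : 0.0f;
    if (threadIdx.x == 0)               // left halo cell
        tile[0] = (i > 0 && i - 1 < n) ? in[i - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)  // right halo cell
        tile[t + 1] = (i + 1 < n) ? in[i + 1] : 0.0f;

    __syncthreads();  // all loads must finish before any thread reads the tile

    if (i < n) out[i] = tile[t - 1] + tile[t] + tile[t + 1];
}

// Hypothetical host helper: copies the input to the device, runs one variant,
// and returns the result. Error checking is omitted for brevity.
std::vector<float> run_stencil(const std::vector<float>& h_in, bool use_shared) {
    int n = static_cast<int>(h_in.size());
    std::vector<float> h_out(n);
    if (n == 0) return h_out;

    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int grid = (n + BLOCK - 1) / BLOCK;
    if (use_shared)
        sum_neighbors_shared<<<grid, BLOCK>>>(d_in, d_out, n);
    else
        sum_neighbors_global<<<grid, BLOCK>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_stencil(data, false)` and `run_stencil(data, true)` on the same input should give identical results, which makes the two timings directly comparable.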