Why have Intel's IPC gains stagnated so much since Sandy?

Thread replies: 16
Thread images: 2

Anonymous
2017-01-31 10:31:16 Post No. 58740484

File: 1484704723906.jpg (353KB, 788x576px)

Why have Intel's IPC gains stagnated so much since Sandy? Wasn't the jump from Westmere to Sandy 50?

Anonymous 2017-01-31 10:35:25 Post No.58740536

How can IPC ever be more than 1? Genuinely asking. With pipelining and X stages, the first instruction only finishes after X cycles, right? And starting at cycle X+1 you finish 1 instruction per cycle, all assuming there are no data dependencies.
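A minimal sketch of that timing, assuming an idealized single-issue pipeline with no stalls (the function names are made up for illustration): N instructions through X stages take X + (N - 1) cycles, so the average IPC creeps toward 1 but never exceeds it.

```python
def scalar_pipeline_cycles(n_instructions: int, stages: int) -> int:
    """Cycles to finish n instructions on an ideal single-issue pipeline.

    Toy model (assumption): no stalls, no dependencies. The first instruction
    needs `stages` cycles; after that, exactly one instruction finishes per cycle.
    """
    if n_instructions <= 0:
        return 0
    return stages + (n_instructions - 1)


def ipc(n_instructions: int, cycles: int) -> float:
    """Average instructions completed per cycle."""
    return n_instructions / cycles


if __name__ == "__main__":
    for n in (1, 10, 1000):
        c = scalar_pipeline_cycles(n, stages=5)
        print(f"{n} instructions, 5 stages: {c} cycles, IPC = {ipc(n, c):.3f}")
```

With 5 stages, 1000 instructions take 1004 cycles (IPC about 0.996): close to 1, never above it. Getting past 1 needs more than one instruction finishing per cycle.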

Anonymous 2017-01-31 10:37:06 Post No.58740556

>>58740536
I'm assuming OP meant 50%

Anonymous 2017-01-31 10:38:17 Post No.58740576

>>58740556
What? No, I'm talking about the raw number of instructions that are completed at the end of every cycle.

Anonymous 2017-01-31 10:42:05 Post No.58740640

People will blame Jews, or AMD for not being competitive, or the materials "reaching their limit", but the real reason is that the desktop market is shrinking: most consumers don't care about performance because what they have is good enough. Economically, it makes more sense to dump your R&D money into making your current performance tier use less energy, because with the market going more and more mobile, battery life becomes the most important factor for most consumers.

Anonymous 2017-01-31 10:46:02 Post No.58740696

File: 1456681584195.png (223KB, 644x580px)

>>58740484
We can't go any wider without SEVERE diminishing returns. Even accounting for scaling issues MOAR CORES is a better use of the die space.
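For the "scaling issues" caveat, a common back-of-the-envelope model is Amdahl's law: if a fraction p of the work can be spread over n cores, the best-case speedup is 1 / ((1 - p) + p / n). This sketch assumes that simple model (no memory-bandwidth or synchronization costs) and uses made-up values of p.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be split across cores (0..1).
    cores: number of cores working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Made-up workloads: mostly serial, half parallel, highly parallel.
    for p in (0.25, 0.5, 0.95):
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8))
        print(f"p = {p:.2f} -> {row}")
```

Even at p = 0.95, 8 cores give roughly 5.9x rather than 8x.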

Anonymous 2017-01-31 10:46:41 Post No.58740708

>>58740536

You run several calculations in parallel and keep the results in a buffer; if no conditional (e.g. a mispredicted branch) invalidates them, you retire them.
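The buffer described here is usually called a reorder buffer. Below is a toy model, a minimal sketch assuming simplified entries (a destination register and a value) and branches that are simply marked mispredicted or not; the class and method names are hypothetical. Results may finish in any order, but they only reach the visible registers when they retire from the head of the buffer in program order, and everything younger than a mispredicted branch is thrown away.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One in-flight instruction in a toy reorder buffer (hypothetical model)."""
    dest: Optional[str]           # register written at retirement (None for a branch)
    is_branch: bool = False
    value: Optional[int] = None   # result, filled in when execution finishes
    done: bool = False
    mispredicted: bool = False


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: deque[Entry] = deque()  # oldest instruction at the left
        self.registers: dict[str, int] = {}   # architectural (software-visible) registers

    def allocate(self, entry: Entry) -> Entry:
        """Instructions enter the buffer in program order."""
        self.entries.append(entry)
        return entry

    @staticmethod
    def finish(entry: Entry, value: Optional[int] = None, mispredicted: bool = False) -> None:
        """Execution can finish in any order; the result just waits in its entry."""
        entry.value = value
        entry.mispredicted = mispredicted
        entry.done = True

    def retire(self) -> int:
        """Commit finished entries from the head, strictly in program order.

        Stops at the first unfinished entry. A mispredicted branch discards
        every younger (speculative) entry. Returns the number of entries retired.
        """
        retired = 0
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            retired += 1
            if head.is_branch:
                if head.mispredicted:
                    self.entries.clear()  # throw away speculative work
                    break
            elif head.dest is not None:
                self.registers[head.dest] = head.value
        return retired


if __name__ == "__main__":
    rob = ReorderBuffer()
    a = rob.allocate(Entry(dest="r1"))
    b = rob.allocate(Entry(dest="r2"))
    br = rob.allocate(Entry(dest=None, is_branch=True))
    c = rob.allocate(Entry(dest="r3"))  # issued speculatively past the branch

    rob.finish(c, 30)  # finishes first, out of order
    rob.finish(b, 20)
    print(rob.retire(), rob.registers)  # 0 {}  (oldest entry not done yet)
    rob.finish(a, 10)
    rob.finish(br, mispredicted=True)
    print(rob.retire(), rob.registers)  # 3 {'r1': 10, 'r2': 20}  (r3 squashed)
```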

Anonymous 2017-01-31 10:48:10 Post No.58740727

>>58740708
But doesn't that require compilers to be good with register allocation? Seems like a stupid idea to put trust in software engineers.

Anonymous 2017-01-31 10:49:12 Post No.58740739

>>58740536
Pipelining: every instruction consists of several steps, and the steps of different instructions can be executed simultaneously. Also out-of-order execution: if instructions don't depend on each other, they can be executed in the same cycle, on one core.
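To see how independent instructions push IPC past 1 on a single core, here is a minimal sketch assuming an idealized out-of-order core: unlimited execution units, one-cycle latency, and register renaming, so an instruction can run as soon as the values it reads exist. The (destination, sources) instruction format is made up for illustration.

```python
from typing import Dict, List, Tuple

# Toy instruction: (destination register, source registers). Hypothetical format.
Instr = Tuple[str, Tuple[str, ...]]


def dataflow_schedule(program: List[Instr]) -> List[int]:
    """Cycle (from 0) in which each instruction can execute.

    Idealized model (assumption): unlimited execution units, 1-cycle latency,
    and register renaming, so only true read-after-write dependencies delay
    an instruction.
    """
    ready_at: Dict[str, int] = {}  # register -> first cycle its newest value exists
    schedule = []
    for dest, srcs in program:
        start = max((ready_at.get(s, 0) for s in srcs), default=0)
        schedule.append(start)
        ready_at[dest] = start + 1
    return schedule


def average_ipc(program: List[Instr]) -> float:
    """Instructions divided by the number of cycles the schedule spans."""
    if not program:
        return 0.0
    total_cycles = max(dataflow_schedule(program)) + 1
    return len(program) / total_cycles


if __name__ == "__main__":
    independent_mix = [
        ("r1", ()),            # e.g. a load with no register inputs
        ("r2", ()),
        ("r3", ("r1", "r2")),  # needs r1 and r2
        ("r4", ()),
        ("r5", ("r3", "r4")),
        ("r6", ()),
    ]
    chain = [("r1", ()), ("r2", ("r1",)), ("r3", ("r2",))]  # each needs the previous one
    print(dataflow_schedule(independent_mix), average_ipc(independent_mix))
    print(dataflow_schedule(chain), average_ipc(chain))
```

The mixed program finishes 6 instructions in 3 cycles (IPC 2.0), while the dependency chain is stuck at 1.0.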

Anonymous 2017-01-31 10:51:41 Post No.58740776

>>58740484
Making the front end more efficient is the hiccup. It's the most involved part of a core, and the most critical. Engineers have picked all the low-hanging fruit for performance, so each percent of uplift requires more manpower. It's no easy task.

Anonymous 2017-01-31 10:53:42 Post No.58740801

>>58740727

No. The results sitting in the separate pipelines are not kept in the actual main (architectural) registers; those are only overwritten when the corresponding result in the buffer is retired.

None of the tricks that go on in the CPU depend on the compiler knowing that the CPU uses them. From the compiler's point of view there is only a single pipeline executing instructions in order, and the hardware designers guarantee that the results will be correct according to that model. There is a boundary between what goes on at the hardware level and the software level that shouldn't be crossed.
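A way to see that guarantee in action: the sketch below (a hypothetical `mov`/`add` toy ISA, not any real one) executes instructions in a random order that only respects true dependencies, keeps each result in a private slot via register renaming (so reusing `r1` in the program doesn't force the hardware to wait), and writes results to the visible registers strictly in program order. The final state comes out identical to a plain one-at-a-time interpreter, which is all the compiler ever assumes.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ISA: ("mov", dest, immediate) or ("add", dest, src1, src2).


def run_sequential(program: List[tuple], regs: Dict[str, int]) -> Dict[str, int]:
    """The compiler's view: one instruction at a time, in program order."""
    regs = dict(regs)
    for op in program:
        if op[0] == "mov":
            regs[op[1]] = op[2]
        else:
            _, dest, a, b = op
            regs[dest] = regs[a] + regs[b]
    return regs


def run_out_of_order(program: List[tuple], regs: Dict[str, int], seed: int = 0) -> Dict[str, int]:
    """Execute in a random dependency-respecting order, then commit in program order."""
    rng = random.Random(seed)
    # Renaming: each source becomes ("reg", name) for an initial value
    # or ("tag", i) for the result of an older instruction i.
    latest: Dict[str, int] = {}
    sources: List[Optional[List[Tuple[str, object]]]] = []
    for i, op in enumerate(program):
        if op[0] == "mov":
            sources.append(None)
        else:
            sources.append([("tag", latest[s]) if s in latest else ("reg", s) for s in op[2:]])
        latest[op[1]] = i

    results: Dict[int, int] = {}  # private per-instruction results (the "buffer")
    pending = list(range(len(program)))
    while pending:
        ready = [i for i in pending
                 if sources[i] is None or all(k == "reg" or ref in results for k, ref in sources[i])]
        i = rng.choice(ready)  # any ready instruction, not necessarily the oldest
        pending.remove(i)
        if sources[i] is None:
            results[i] = program[i][2]
        else:
            results[i] = sum(regs[ref] if k == "reg" else results[ref] for k, ref in sources[i])

    committed = dict(regs)  # architectural registers are untouched until now
    for i, op in enumerate(program):  # retire strictly in program order
        committed[op[1]] = results[i]
    return committed


if __name__ == "__main__":
    program = [
        ("mov", "r1", 5),
        ("add", "r2", "r1", "r0"),  # r2 = 5 + 1
        ("mov", "r1", 7),           # reuses r1 while older work may still be in flight
        ("add", "r3", "r1", "r2"),  # r3 = 7 + 6
        ("add", "r1", "r1", "r3"),  # r1 = 7 + 13
    ]
    start = {"r0": 1}
    print(run_sequential(program, start))
    print(all(run_out_of_order(program, start, seed) == run_sequential(program, start)
              for seed in range(20)))
```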

Anonymous 2017-01-31 11:00:47 Post No.58740883

>>58740801
Oh OK. I understand the whole buffered pipeline thing; I just don't understand how Intel can get IPC improvements that aren't whole numbers unless their IPC was already in the high double digits or the hundreds. So, like >>58740739 says, would IPC just depend on the number of cores? For example, would a quad core using all its cores get 4 instructions per cycle? Sorry if I sound dumb, I'm just trying to wrap my head around it.
Currently reading https://en.m.wikipedia.org/wiki/Instructions_per_cycle

Anonymous 2017-01-31 11:05:07 Post No.58740953

>>58740536
>How can IPC ever be more than 1?

By dispatching more than one instruction per clock.

Let's say you have 3 functional units in your CPU: load/store, arithmetic, and floating-point.

Now, let's say that these 3 instructions are next:
load $2000, r0 ; load data from ram address 2000
add r1, r2 ; add ordinary 64-bit registers
fadd fr1, fr2 ; add 80-bit floating-point registers
The instruction dispatcher can dispatch all 3 of these instructions for execution at the same time, because they're all going to different functional units, which can operate in parallel.

Obviously the dispatcher can't always fill every slot on every clock. But even with only some parallelism, the average IPC can become greater than 1.

Note that the fetcher/dispatcher must be capable of working on multiple instructions at the same time. But it's not hard to design them to operate on blocks of, say, 4 instructions at a time, scanning ahead for opportunities for parallel dispatch.
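A toy version of such a dispatcher, as a minimal sketch assuming one unit per kind (load/store, integer, floating-point), one-cycle latency, a window of 4, and strictly in-order dispatch that ends a clock's group at the first conflict. The tuple format is made up, and it reads `add r1, r2` as writing the last operand, like `load $2000, r0`.

```python
from typing import List, Tuple

# (functional unit, destination register, source registers) -- hypothetical toy format,
# destination taken as the last operand, as in "load $2000, r0"
Instr = Tuple[str, str, Tuple[str, ...]]

WINDOW = 4  # instructions the dispatcher looks at per clock (assumed)


def dispatch_groups(program: List[Instr]) -> List[List[Instr]]:
    """Split the program into per-clock dispatch groups, keeping program order.

    A clock's group ends when the next instruction needs a unit that is already
    busy this clock, touches a register written earlier in the same group, or
    the window is full. Assumes one unit per kind and 1-cycle latency.
    """
    groups: List[List[Instr]] = []
    i = 0
    while i < len(program):
        group: List[Instr] = []
        busy, written = set(), set()
        while i < len(program) and len(group) < WINDOW:
            unit, dest, srcs = program[i]
            if unit in busy or dest in written or any(s in written for s in srcs):
                break
            group.append(program[i])
            busy.add(unit)
            written.add(dest)
            i += 1
        groups.append(group)
    return groups


if __name__ == "__main__":
    program = [
        ("load", "r0", ()),             # load $2000, r0
        ("int", "r2", ("r1", "r2")),    # add r1, r2
        ("fp", "fr2", ("fr1", "fr2")),  # fadd fr1, fr2
        ("int", "r1", ("r0", "r1")),    # add r0, r1 (needs r0 and the int unit)
    ]
    groups = dispatch_groups(program)
    for clock, group in enumerate(groups, start=1):
        print(clock, group)
    print("average IPC:", len(program) / len(groups))
```

The first three instructions from the post go out together; the dependent `add r0, r1` waits a clock, so 4 instructions take 2 clocks (average IPC 2.0).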

For better results, the compiler should reorder the instructions to make maximum use of the parallelism. For example, a data dependency like this can prevent parallel dispatch:
load $2000, r0
add r0, r1
In this case, these two can't be dispatched in parallel because the second one depends on the result of the first one. A smart compiler would find other instructions that can be moved up or down to fill the vacant dispatch slot(s) between them.
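The effect of that reordering can be measured on a toy in-order dispatcher, as a minimal sketch assuming one unit per kind, one-cycle latency, and that a clock's group ends at the first unit or register conflict. The `scheduled` list is a hand reordering standing in for what a compiler's instruction scheduler might do; it is not output from any real compiler.

```python
from typing import List, Tuple

# (functional unit, destination register, source registers) -- hypothetical toy format
Instr = Tuple[str, str, Tuple[str, ...]]


def count_cycles(program: List[Instr], window: int = 4) -> int:
    """Clocks needed by a toy in-order dispatcher with one unit per kind.

    Each clock dispatches instructions in order until one needs a busy unit,
    touches a register written earlier in the same clock, or the window fills.
    """
    cycles, i = 0, 0
    while i < len(program):
        busy, written, issued = set(), set(), 0
        while i < len(program) and issued < window:
            unit, dest, srcs = program[i]
            if unit in busy or dest in written or any(s in written for s in srcs):
                break
            busy.add(unit)
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Naive order: each load is immediately followed by the add that needs it.
naive = [
    ("load", "r0", ()),             # load $2000, r0
    ("int", "r1", ("r0", "r1")),    # add r0, r1   (waits for r0)
    ("load", "r3", ()),             # a second, independent load
    ("int", "r4", ("r3", "r4")),    # add r3, r4   (waits for r3)
    ("fp", "fr2", ("fr1", "fr2")),  # fadd fr1, fr2
    ("fp", "fr4", ("fr3", "fr4")),  # fadd fr3, fr4
]

# Hand-scheduled order: independent fadds moved up to fill the empty slots.
scheduled = [naive[0], naive[4], naive[1], naive[2], naive[5], naive[3]]

if __name__ == "__main__":
    for name, prog in (("naive", naive), ("scheduled", scheduled)):
        c = count_cycles(prog)
        print(f"{name}: {c} cycles, IPC = {len(prog) / c:.2f}")
```

Moving the independent `fadd`s up fills the vacant slots next to the loads: 4 clocks drop to 3, so IPC goes from 1.5 to 2.0 with exactly the same instructions.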

Anonymous 2017-01-31 11:05:34 Post No.58740956

>>58740883

No, the non-whole number is the average number of instructions per cycle. Individual cycles will only have whole numbers, as you correctly assumed, but if you have a sequence (pulled from my ass) of 5, 5, 5, 4 instructions per cycle, you end up with an average of 19/4 = 4.75 IPC. So you can't say exactly how many instructions will be carried out in a specific clock, but you can estimate the average for a well-written program.
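The averaging is plain arithmetic. A minimal sketch using the made-up per-cycle counts from the post (real measurements would come from hardware performance counters, not a hand-written list; the function name is illustrative):

```python
from fractions import Fraction
from typing import List


def average_ipc(per_cycle: List[int]) -> Fraction:
    """Average instructions per cycle, given how many retired in each cycle."""
    if not per_cycle:
        raise ValueError("need at least one cycle")
    return Fraction(sum(per_cycle), len(per_cycle))


if __name__ == "__main__":
    counts = [5, 5, 5, 4]   # made-up numbers from the post
    ipc = average_ipc(counts)
    print(ipc, float(ipc))  # 19/4 4.75
```

The same arithmetic explains the fractional improvements asked about earlier: a 5% IPC gain on 4.75 gives about 4.99.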

Anonymous 2017-01-31 11:10:49 Post No.58741021

>>58740953
Ahhh, I see now. Damn, that's ingenious, I really wouldn't have even thought of having separate units for different kinds of operations. Pretty cool stuff, thanks anon.

>>58740956
Thanks for the help dude, you've cleared up some things for me.

>go into a thread expecting to get ignored
>get good answers instead
Feels good man.

Anonymous 2017-01-31 11:13:35 Post No.58741054

The core of their engineering department has been focused on making Bitcoin mining ASICs for the NSA to thwart the rise of the Chinese miners.

Basically the CPU division is now being staffed by interns and junior level engineers. They know most people just use their computers as entertainment devices so there's no need to make them any faster.
