Why have Intel's IPC gains stagnated so much since Sandy?

Thread replies: 16
Thread images: 2

File: 1484704723906.jpg (353KB, 788x576px)
Why have Intel's IPC gains stagnated so much since Sandy? Wasn't the jump from Westmere to Sandy 50?
>>
How can IPC ever be more than 1? Genuinely asking. Utilizing pipelining, with X stages, you could only get 1 instruction done for every X cycles, right? And starting at the X+1th cycle, you can get 1 instruction per cycle, all assuming there are no data dependencies.
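A quick worked version of the reasoning in that question, as a Python sketch. It assumes an idealized scalar pipeline with `stages` stages that accepts one instruction per cycle and never stalls (no dependencies, no branch mispredictions); the function names are made up for illustration. The first instruction finishes after `stages` cycles and every later one finishes one cycle after the previous, so N instructions take stages + N - 1 cycles and the IPC approaches 1 without ever reaching it. A single pipeline tops out at 1; getting past that requires issuing more than one instruction per cycle.

```python
def pipeline_cycles(num_instructions: int, stages: int) -> int:
    """Cycles for an ideal scalar pipeline (one issue per cycle, no stalls)
    to finish num_instructions: the first one needs `stages` cycles, and
    each later one completes one cycle after the previous."""
    if num_instructions == 0:
        return 0
    return stages + num_instructions - 1


def scalar_ipc(num_instructions: int, stages: int) -> float:
    """Average instructions completed per cycle for that ideal pipeline."""
    return num_instructions / pipeline_cycles(num_instructions, stages)


for n in (1, 10, 1000):
    print(n, pipeline_cycles(n, 5), round(scalar_ipc(n, 5), 3))
# 1 -> 5 cycles (IPC 0.2), 10 -> 14 (0.714), 1000 -> 1004 (0.996)
```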
>>
>>58740536
I'm assuming OP meant 50%
>>
>>58740556
What? No, I'm talking about the raw number of instructions that are completed at the end of every cycle.
>>
People will blame Jews or AMD for not being competitive, or the materials "reaching their limit", but the real reason is that the desktop market is shrinking. Most consumers don't care about performance because what they have is good enough. Economically, it makes more sense to dump your R&D money into making your current performance tier use less energy, because as the market goes more and more mobile, battery life becomes the most important factor for most consumers.
>>
File: 1456681584195.png (223KB, 644x580px)
>>58740484
We can't go any wider without SEVERE diminishing returns. Even accounting for scaling issues MOAR CORES is a better use of the die space.
>>
>>58740536

You run several calculations in parallel and keep the results in a buffer; if no conditional branch invalidates them, you retire them.
>>
>>58740708
But doesn't that require compilers to be good with register allocation? Seems like a stupid idea to put trust in software engineers.
>>
>>58740536
Pipelining: every instruction consists of several steps, and steps from different instructions can be executed simultaneously. There's also out-of-order execution. If instructions don't depend on each other, they can be executed in the same cycle on one core.
>>
>>58740484
Trying to make a more efficient front end is the hiccup. It's the most involved part of a core, and the most critical. Engineers have already picked all the low-hanging fruit for performance, so each percent of uplift requires more manpower. It's no easy task.
>>
>>58740727

No. The data in the separate pipelines isn't kept in the actual (architectural) registers; those are only overwritten when the data in the buffer is retired.

None of the tricks that go on inside the CPU depend on the compiler knowing which tricks the CPU uses. As far as correctness goes, the compiler sees a single pipeline executing instructions in order, and the hardware designers guarantee that the results will match that model. There is a boundary between what goes on at the hardware level and the software level that shouldn't be crossed.
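The buffer-and-retire idea from the last two posts can be sketched as a toy Python model. It is heavily simplified and hypothetical: `ToyCore` and `RobEntry` are invented names, only one branch is ever in flight, and register renaming, exceptions and timing are ignored. The point it shows is that execution only writes into the buffer, and the architectural registers the program sees change only at retirement, so a wrong-path result can be thrown away without software ever noticing.

```python
from dataclasses import dataclass, field


@dataclass
class RobEntry:
    dest: str          # architectural register the instruction writes
    value: int         # result computed ahead of time
    speculative: bool  # issued past a branch that hasn't resolved yet?


@dataclass
class ToyCore:
    # Architectural registers: the only state the program can observe.
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})
    # Finished-but-not-retired results, oldest first.
    rob: list = field(default_factory=list)

    def execute(self, dest: str, value: int, speculative: bool = False) -> None:
        # Execution writes only into the buffer, never into the registers.
        self.rob.append(RobEntry(dest, value, speculative))

    def resolve_branch(self, predicted_correctly: bool) -> None:
        # Simplification: assumes a single branch is in flight.
        if predicted_correctly:
            for entry in self.rob:
                entry.speculative = False  # results are now safe to keep
        else:
            # Wrong guess: discard everything computed past the branch.
            self.rob = [e for e in self.rob if not e.speculative]

    def retire(self) -> None:
        # Commit in program order; stop at the first still-speculative entry.
        while self.rob and not self.rob[0].speculative:
            entry = self.rob.pop(0)
            self.regs[entry.dest] = entry.value


core = ToyCore()
core.execute("r0", 42)                   # before the branch
core.execute("r1", 7, speculative=True)  # after a predicted branch
core.retire()
print(core.regs)  # {'r0': 42, 'r1': 0}: r1's result is still only buffered
core.resolve_branch(predicted_correctly=False)
core.retire()
print(core.regs)  # {'r0': 42, 'r1': 0}: the wrong-path result never lands
```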
>>
>>58740801
Oh OK. I understand the whole buffered pipeline thing, I just don't understand how Intel can get IPC improvements that aren't whole numbers unless their IPC was already in the high double digits or the hundreds. So like >>58740739 says, would IPC just be dependent on the number of cores? For example, would a quad core using all cores get 4 instructions per cycle? I'm sorry if I sound dumb, I'm just trying to wrap my head around it.
Currently reading https://en.m.wikipedia.org/wiki/Instructions_per_cycle
>>
>>58740536
>How can IPC ever be more than 1?

By dispatching more than one instruction per clock.

Let's say you have 3 functional units in your CPU: Load/store, arithmetic, and floating-point.

Now, let's say these 3 instructions come next:

load $2000, r0 ; load data from ram address 2000
add r1, r2 ; add ordinary 64-bit registers
fadd fr1, fr2 ; add 80-bit floating-point registers


The instruction dispatcher can dispatch all 3 of these instructions for execution at the same time, because they're all going to different functional units, which can operate in parallel.

Obviously the dispatcher can't always fill every slot on every clock. But even with only some parallelism, the average IPC can become greater than 1.

Note that the fetcher/dispatcher must be able to work on multiple instructions at the same time. But it's not hard to design them to operate on blocks of, say, 4 instructions at a time, scanning ahead for opportunities for parallel dispatch.
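The three-instruction example can be sketched as a check a dispatcher might run on a candidate group. This is a toy Python model, not how any real Intel front end is built: `Instr`, `can_dispatch_together` and the unit names are hypothetical, operands are read as source, destination (as `load $2000, r0` suggests), and only two hazards are checked: two instructions wanting the same functional unit, and an instruction reading a register written earlier in the same group.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    unit: str     # which functional unit executes it
    reads: tuple  # registers it reads
    writes: tuple # registers it writes


def can_dispatch_together(group: list) -> bool:
    """True if every instruction needs a different unit and none reads a
    register written by an earlier instruction in the same group."""
    units = [i.unit for i in group]
    if len(units) != len(set(units)):
        return False  # two instructions want the same unit
    written = set()
    for instr in group:
        if written & set(instr.reads):
            return False  # read-after-write inside the group
        written |= set(instr.writes)
    return True


example = [
    Instr("load $2000, r0", "loadstore", (), ("r0",)),
    Instr("add r1, r2", "alu", ("r1", "r2"), ("r2",)),
    Instr("fadd fr1, fr2", "fpu", ("fr1", "fr2"), ("fr2",)),
]
dependent = [
    Instr("load $2000, r0", "loadstore", (), ("r0",)),
    Instr("add r0, r1", "alu", ("r0", "r1"), ("r1",)),
]
print(can_dispatch_together(example))    # True: three different units
print(can_dispatch_together(dependent))  # False: add needs the loaded r0
```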

For better results, the compiler should reorder the instructions to make maximum use of the parallelism. For example, a data dependency like this can prevent parallel dispatch:

load $2000, r0
add r0, r1


In this case, these two can't be dispatched in parallel because the second one depends on the result of the first one. A smart compiler would find other instructions that can be moved up or down to fill the vacant dispatch slot(s) between them.
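To see why that reordering pays off, here is a toy in-order superscalar model in Python that dispatches greedily from the head of the program each cycle, stopping at the first instruction that can't join the current group. Everything beyond the two-line example is an assumption for illustration: one load/store unit, one ALU and one FPU, a dispatch width of 3, every result ready the cycle after it is dispatched, and the second load address `$2008` plus registers `r3`, `r4`, `fr3`, `fr4` are invented. The "scheduled" list holds the same six instructions, reordered so independent work fills the slot behind each load.

```python
from typing import NamedTuple

LS, ALU, FPU = "loadstore", "alu", "fpu"


class Instr(NamedTuple):
    text: str
    unit: str     # functional unit that executes it
    reads: tuple  # source registers
    writes: tuple # destination registers


def fits(group: list, instr: Instr) -> bool:
    """Can `instr` join this cycle's dispatch group?"""
    if any(g.unit == instr.unit for g in group):
        return False  # its functional unit is already taken this cycle
    written = {r for g in group for r in g.writes}
    return not (written & set(instr.reads))  # no read-after-write in-cycle


def dispatch_cycles(program: list, width: int = 3) -> int:
    """Cycles a toy in-order superscalar core needs for `program`."""
    cycles, i = 0, 0
    while i < len(program):
        group = []  # the first instruction always fits an empty group
        while i < len(program) and len(group) < width and fits(group, program[i]):
            group.append(program[i])
            i += 1
        cycles += 1
    return cycles


naive = [
    Instr("load $2000, r0", LS, (), ("r0",)),
    Instr("add r0, r1", ALU, ("r0", "r1"), ("r1",)),
    Instr("load $2008, r3", LS, (), ("r3",)),
    Instr("add r3, r4", ALU, ("r3", "r4"), ("r4",)),
    Instr("fadd fr1, fr2", FPU, ("fr1", "fr2"), ("fr2",)),
    Instr("fadd fr3, fr4", FPU, ("fr3", "fr4"), ("fr4",)),
]
# Same work, reordered so independent instructions fill the gaps.
scheduled = [naive[0], naive[4], naive[1], naive[2], naive[5], naive[3]]

for name, prog in (("naive", naive), ("scheduled", scheduled)):
    c = dispatch_cycles(prog)
    print(f"{name}: {c} cycles, IPC = {len(prog) / c:.2f}")
```

Under these assumptions the naive order needs 4 cycles (IPC 1.50) and the reordered one 3 cycles (IPC 2.00).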
>>
>>58740883

No, the non-whole number is the average number of instructions per cycle. Individual cycles will only have whole numbers, as you correctly assumed, but if you have a sequence (pulled out of my ass) of 5, 5, 5, 4 instructions per cycle, you end up with an average of 19/4 = 4.75 IPC. So you can't say exactly how many instructions will be carried out in a specific clock, but you can estimate the average for a well-written program.
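The averaging in that post, as a tiny Python sketch. The per-cycle counts are the post's made-up numbers; the 5.0 "newer core" figure is also hypothetical, included only to show how a fractional average turns into the kind of percentage IPC gain OP was asking about.

```python
from fractions import Fraction

# Instructions retired in each of four consecutive cycles (made-up numbers).
per_cycle = [5, 5, 5, 4]

average_ipc = Fraction(sum(per_cycle), len(per_cycle))
print(average_ipc, float(average_ipc))  # 19/4 4.75

# Hypothetical: a newer core averaging 5.0 IPC on the same code.
gain_percent = (Fraction(5) / average_ipc - 1) * 100
print(f"{float(gain_percent):.2f}% IPC gain")  # 5.26% IPC gain
```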
>>
>>58740953
Ahhh, I see now. Damn, that's ingenious, I really wouldn't have even thought of having separate units for different kinds of operations. Pretty cool stuff, thanks anon.

>>58740956
Thanks for the help dude, you've cleared up some things for me.

>go into a thread expecting to get ignored
>get good answers instead
Feels good man.
>>
The core of their engineering department has been focused on making Bitcoin mining ASICs for the NSA to thwart the rise of the Chinese miners.

Basically the CPU division is now being staffed by interns and junior level engineers. They know most people just use their computers as entertainment devices so there's no need to make them any faster.
This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.