GPUs & CPUs & Enthusiast hardware: Questions, Discussion and fanboy slap-fights - Nvidia & AMD & Intel - Separate but Equal. Intel rides in the back of the bus.


Smaug's Smokey Hole

Sweeney did nothing wrong.
kiwifarms.net
There are shades of old Furmark when Amazon's MMO is allegedly frying 3090s. "players are theorizing that there's not an FPS cap on the menu screens, causing GPUs to render 9000+ FPS"
amazonmmo.JPG

amazonmmo1.JPG

 

The Mass Shooter Ron Soye

How you gonna explain fucking a man? 🤔
kiwifarms.net
There are shades of old Furmark when Amazon's MMO is allegedly frying 3090s. "players are theorizing that there's not an FPS cap on the menu screens, causing GPUs to render 9000+ FPS"
View attachment 2368296

View attachment 2368297

Obviously, Amazon should fix that, but Nvidia or its partners are ultimately responsible for the 3090 killing itself. They shouldn't have gone with that smoking hot GDDR6X VRAM and absurd TDPs.
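On Amazon's side the fix is dead simple: cap the menu render loop so it idles instead of spinning flat out. Completely generic sketch below, nothing to do with their actual engine, just the idea:

```cpp
// Toy frame limiter for a menu loop: sleep out the remainder of each frame
// instead of letting the render loop spin at whatever the GPU can push.
// render_menu() is a stand-in for a real draw call, not a real API.
#include <chrono>
#include <thread>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;
    constexpr auto frame_budget = std::chrono::milliseconds(16); // ~60 FPS cap
    auto next_frame = clock::now();
    for (int frame = 0; frame < 120; ++frame) {    // stand-in for "while the menu is open"
        // render_menu();                          // hypothetical, trivially cheap draw
        std::printf("menu frame %d\n", frame);
        next_frame += frame_budget;
        std::this_thread::sleep_until(next_frame); // idle instead of rendering 9000+ FPS
    }
}
```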
 

Allakazam223

We wuz Orkz n shit
kiwifarms.net
Obviously, Amazon should fix that, but Nvidia or its partners are ultimately responsible for the 3090 killing itself. They shouldn't have gone with that smoking hot GDDR6X VRAM and absurd TDPs.
Kind of seems like NV went full Intel and just cranked power instead of actually going for IPC improvements, or whatever the GPU equivalent is. I'm pretty certain that the multi-chip solution will be the way forward for AMD. Wasn't the issue with Crossfire and SLI that the separate cards couldn't communicate fast enough? If Infinity Fabric can be used for CPUs, why not GPUs?

With GPU speeds fast approaching CPU levels, why is AMD able to keep TDP at a somewhat reasonable level? Why does Nvidia's µArch need so much more power than AMD's? Is the AI core that power-hungry? Can someone point me in the right direction for a comparison not of GPUs, but of µArch differences?
 

The Mass Shooter Ron Soye

How you gonna explain fucking a man? 🤔
kiwifarms.net
Kind of seems like NV went full Intel and just cranked power instead of actually going for IPC improvements, or whatever the GPU equivalent is. I'm pretty certain that the multi-chip solution will be the way forward for AMD. Wasn't the issue with Crossfire and SLI that the separate cards couldn't communicate fast enough? If Infinity Fabric can be used for CPUs, why not GPUs?

With GPU speeds fast approaching CPU levels, why is AMD able to keep TDP at a somewhat reasonable level? Why does Nvidia's µArch need so much more power than AMD's? Is the AI core that power-hungry? Can someone point me in the right direction for a comparison not of GPUs, but of µArch differences?
AMD's Infinity Cache was probably its best weapon this time around. It allowed AMD to use a smaller memory bus width and slower GDDR6 memory, but still reach Nvidia 3080/3090 rasterization performance level while consuming less energy. The amount of cache on the GPU is tailored to the targeted resolution.

Nvidia might have been able to use GDDR6 and more memory chips and gotten a better result than with GDDR6X. I'm not sure. They also could have used High Bandwidth Memory which is expensive but more efficient.
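Just to put rough numbers on the bus width point (the specs below are the commonly quoted 6900 XT and 3090 figures, so take them as assumptions):

```cpp
// Back-of-the-envelope memory bandwidth: bus width in bits / 8 * data rate in GT/s.
#include <cstdio>

int main() {
    auto bandwidth_gbps = [](int bus_bits, double gtps) { return bus_bits / 8.0 * gtps; };
    std::printf("256-bit @ 16.0 GT/s GDDR6  : ~%.0f GB/s (6900 XT class)\n", bandwidth_gbps(256, 16.0));
    std::printf("384-bit @ 19.5 GT/s GDDR6X : ~%.0f GB/s (3090 class)\n", bandwidth_gbps(384, 19.5));
}
```

Nvidia's raw bandwidth is way higher; the Infinity Cache is what lets AMD get away with the smaller, slower memory setup.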

You can probably attribute some of the difference to TSMC 7nm being more power efficient than Samsung 8nm.

Nvidia and Intel will also use MCM. Maybe not for their next GPU launches, but soon.
 

Smaug's Smokey Hole

Sweeney did nothing wrong.
kiwifarms.net
Kind of seems like NV went full Intel and just cranked power instead of actually going for IPC improvements, or whatever the GPU equivalent is. I'm pretty certain that the multi-chip solution will be the way forward for AMD. Wasn't the issue with Crossfire and SLI that the separate cards couldn't communicate fast enough? If Infinity Fabric can be used for CPUs, why not GPUs?
AMD is working on a chiplet-esque design for future GPUs, and I think the Infinity Cache is crucial to that.

Like you said, one problem with SLI was how fast they could share data and what data they had to share. In a raytracer, pathtracer or raycaster, splitting the frame into segments and rendering them on different systems isn't an issue, but the kind of rasterization that realtime graphics are built around is hacks upon hacks upon hacks. One easy example would be reflections: in raytracing it's no problem if the reflected object is off-screen or being rendered in a different tile, but with rasterization and screen-space reflections (a speed hack), something that isn't rendered can't be part of the reflection that you see. (I'm ignoring things like cube maps.)

Now imagine SLI splitting the image in two, say a top-bottom approach, and there's a puddle in an alley that stretches across both the top and bottom halves with a character standing up top. The character will be reflected in the upper half of the puddle but not in the lower half, because that render context can't "see" the character. Problems like that need workarounds and game-specific driver support, and speed will ultimately be sacrificed to make it look right.
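If it helps, here's the puddle example as a toy program. Two buffers stand in for the two halves of the frame, and the screen-space lookup can only see what's in its own buffer. Nothing like a real renderer, purely illustrative:

```cpp
// Toy version of split-frame rendering + screen-space reflections.
// A reflection "sample" can only come from the buffer its half of the frame
// was rendered into, so the bottom half never sees the character up top.
#include <cstdio>
#include <string>
#include <vector>

std::string sample(const std::vector<std::string>& buffer, int screen_row, int buffer_start) {
    int local = screen_row - buffer_start;          // index inside this half's buffer
    if (local < 0 || local >= (int)buffer.size())
        return "(miss: not in this half's buffer)";
    return buffer[local];
}

int main() {
    // Full frame, 8 rows: character at row 1, puddle covering rows 3..6.
    std::vector<std::string> frame(8, "empty");
    frame[1] = "character";
    const int character_row = 1;

    // Single GPU: one buffer holds the whole frame, so the reflection resolves everywhere.
    std::printf("single GPU, puddle row 5 sees:     %s\n", sample(frame, character_row, 0).c_str());

    // SLI-style split: top half owns rows 0..3, bottom half owns rows 4..7.
    std::vector<std::string> top(frame.begin(), frame.begin() + 4);
    std::vector<std::string> bottom(frame.begin() + 4, frame.end());
    std::printf("split, puddle row 3 (top) sees:    %s\n", sample(top, character_row, 0).c_str());
    std::printf("split, puddle row 5 (bottom) sees: %s\n", sample(bottom, character_row, 4).c_str());
}
```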

Internally, GPUs already render in tiles, so the rendering is split up in an SLI-like fashion and it works, but the GPU has ultra-fast internal communication and shared memory, unlike two discrete cards or a 2xGPU card with two sets of VRAM. A chiplet approach with a large Infinity Cache seems like it could solve the problems that plagued modern SLI.
 

Allakazam223

We wuz Orkz n shit
kiwifarms.net
AMD's Infinity Cache was probably its best weapon this time around. It allowed AMD to use a smaller memory bus width and slower GDDR6 memory, but still reach Nvidia 3080/3090 rasterization performance level while consuming less energy.

Internally, GPUs already render in tiles, so the rendering is split up in an SLI-like fashion and it works, but the GPU has ultra-fast internal communication and shared memory, unlike two discrete cards or a 2xGPU card with two sets of VRAM. A chiplet approach with a large Infinity Cache seems like it could solve the problems that plagued modern SLI.
I would love to see a Threadripper size MCGPU with HBM thrown as close to the dies as physically possible.

Inb4 someone links me a datacenter GPU that does this.
 

The Mass Shooter Ron Soye

How you gonna explain fucking a man? 🤔
kiwifarms.net
I would love to see a Threadripper size MCGPU with HBM thrown as close to the dies as physically possible.

Inb4 someone links me a datacenter GPU that does this.
 

Smaug's Smokey Hole

Sweeney did nothing wrong.
kiwifarms.net

Fake 16-core beating real 16-core in multi-threaded? Big if true. But it will be hot, and the performance will depend on whether it's paired with DDR5 or DDR4. It supports both, but the cheaper systems are going to be on DDR4.

AMD will probably respond within a couple months by using 3D V-Cache.
That is pretty nuts if true, so it's probably not true.

I don't remember if I've read anything about this, or if there's even any real information about how Intel's big-little design works, but I was always under the assumption that it was running either the big cores or the small cores, not both at the same time. Running both the high-performance cores and the low-performance cores at the same time would result in very different execution times on a shared workload and that seems like it would be an unpredictable nightmare for any scheduler to balance, but what do I know, it looks like they figured it out.

Knowing Intel the CPU will cost a billion bucks, have a different pinout than the rest of the lineup, be locked to a very expensive chipset that only supports like two different CPUs and it won't support the 13000 refresh. It would also be beneath Intel to do what AMD does and sell a 10% better CPU at the same price as the CPU it competes against.
 

The Mass Shooter Ron Soye

How you gonna explain fucking a man? 🤔
kiwifarms.net
I was always under the assumption that it was running either the big cores or the small cores, not both at the same time. Running both the high-performance cores and the low-performance cores at the same time would result in very different execution times on a shared workload and that seems like it would be an unpredictable nightmare for any scheduler to balance, but what do I know, it looks like they figured it out.
The TDP-constrained Lakefield did that in the early reviews. I don't know if they ever fixed its performance, but Alder Lake-M5 will be almost exactly the same: 1+4 cores, 48 or 64 graphics execution units, and an even lower 5 W TDP that still overlaps Lakefield's range. Probably no stacked DRAM this time. According to the chart, they want to put it in tablets.

As for the desktop chips with no TDP issues, we'll just have to see how the scheduling works out. It might be difficult to measure Alder Lake with traditional benchmarks. Ask yourself: what is the purpose of the big and small cores?

1. To run different tasks on the big or small cores. For example, a game on the big cores, streaming software on the small cores (see the affinity sketch after this list).
2. To maximize the multi-threaded performance without compromising single-threaded performance. Putting in 8 small cores instead of 2 more big cores maximizes performance per area and watt.
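For case 1 you don't even have to wait for the scheduler to be smart about it, an application can pin its threads itself. Rough Linux sketch below; the core IDs are pure assumptions (on a real hybrid chip you'd first query which logical CPUs are P-cores vs E-cores), and normally you'd just let the OS plus Thread Director sort it out:

```cpp
// Manually steering threads to "big" vs "small" cores with CPU affinity (Linux/glibc).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // pthread_setaffinity_np is a GNU extension
#endif
#include <pthread.h>
#include <sched.h>
#include <cstdio>

static void pin_current_thread(const int* cpus, int count) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < count; ++i)
        CPU_SET(cpus[i], &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void* game_thread(void*) {
    const int p_cores[] = {0, 1, 2, 3};      // assumed P-core IDs
    pin_current_thread(p_cores, 4);
    std::puts("latency-sensitive work on the big cores");
    return nullptr;
}

static void* encode_thread(void*) {
    const int e_cores[] = {8, 9, 10, 11};    // assumed E-core IDs
    pin_current_thread(e_cores, 4);
    std::puts("throughput work on the small cores");
    return nullptr;
}

int main() {
    pthread_t a, b;
    pthread_create(&a, nullptr, game_thread, nullptr);
    pthread_create(&b, nullptr, encode_thread, nullptr);
    pthread_join(a, nullptr);
    pthread_join(b, nullptr);
}
```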

Knowing Intel the CPU will cost a billion bucks, have a different pinout than the rest of the lineup, be locked to a very expensive chipset that only supports like two different CPUs and it won't support the 13000 refresh. It would also be beneath Intel to do what AMD does and sell a 10% better CPU at the same price as the CPU it competes against.
My guess is $600 for the 12900K. The 11900K MSRP was $539. $600 puts it well under the Ryzen 9 5950X.

The 600-series chipset should support both Alder Lake and Raptor Lake. Raptor Lake is rumored to have better big cores and 16 of the small cores. If that's true, it's more than a refresh.
 

Smaug's Smokey Hole

Sweeney did nothing wrong.
kiwifarms.net
The EVGA cards burning themselves out in New World apparently tried to set the fan speed to 200,000RPM no matter what fan profile the user had set. EVGA is replacing every card and "New World" is a selectable choice in the RMA section of their site.

Other stuff, Intel is changing the naming scheme of their cursed process nodes.
intelnodes.jpg
 

The Mass Shooter Ron Soye

How you gonna explain fucking a man? 🤔
kiwifarms.net
They are making their marketing lies match TSMC and Samsung's marketing lies. Now they just have to get it out on time so that "Intel 4" is actually competing with "TSMC 4" and not "TSMC 3".

Some outlets think that Meteor Lake will also be on LGA 1700, after Alder Lake and Raptor Lake. Even if that is true, you will probably have to get a DDR5 motherboard if you want an upgrade path.
 

PuffyGroundCloud

Steak connoisseur (on diet)
kiwifarms.net
The EVGA cards burning themselves out in New World apparently tried to set the fan speed to 200,000RPM no matter what fan profile the user had set. EVGA is replacing every card and "New World" is a selectable choice in the RMA section of their site.

Other stuff, Intel is changing the naming scheme of their cursed process nodes.
View attachment 2383375
I won't be surprised if the reason is that Intel can get away with whatever chip sizes they want without actually saying it.
 

Smaug's Smokey Hole

Sweeney did nothing wrong.
kiwifarms.net
Kind of seems like NV went full Intel and just cranked power instead of actually going for IPC improvements, or whatever the GPU equivalent is. I'm pretty certain that the multi-chip solution will be the way forward for AMD. Wasn't the issue with Crossfire and SLI that the separate cards couldn't communicate fast enough? If Infinity Fabric can be used for CPUs, why not GPUs?
Look at what's on the current rumor mill: a two-die GPU (RDNA3, 7900 XT).
AMD-Radeon-RX-7900-XT-Big-Navi-31-GPU-With-RDNA-3-Architecture-Block-Diagram.png
On the upper left the blocks are marked GCD0 and GCD1, connected by a 512MB Infinity Cache/Infinity Fabric (yellow in the middle).

 

Allakazam223

We wuz Orkz n shit
kiwifarms.net
Look at what's on the current rumor mill: a two-die GPU (RDNA3, 7900 XT).
View attachment 2388431
On the upper left the blocks are marked GCD0 and GCD1, connected by a 512MB Infinity Cache/Infinity Fabric (yellow in the middle).

So correct me if I'm wrong, but this resembles L3 cache on CPUs? Could they scale this up to 4 or more? What stops them from making a massive GPU MCM?
 

Smaug's Smokey Hole

Sweeney did nothing wrong.
kiwifarms.net
So correct me if I'm wrong, but this resembles L3 cache on CPUs? Could they scale this up to 4 or more? What stops them from making a massive GPU MCM?
Just speculation, but they might have a certain amount of reserved space for each GCD and some amount shared between them, so adding more GCDs requires more cache.

The rumored 512MB is already nuts, and if they double the number of GCDs to 4, would they have to double the Infinity Cache to 1024MB? The article also mentions that it could go down to 256MB in some other configuration (perhaps a 7800), so maybe it's just 2x the individual cache and the 7900 XT uses 256MB per GCD. It will be very interesting to see.

Improving on RDNA2 and doubling up on the GPU will last them a while. There's probably someone who entered puberty at the start of Corona and crypto-mania, sitting on a GTX 970 because nothing better could be bought for years. Then one day a Radeon 7800 is actually available for purchase and he gets blasted into space playing 8K Fortnite at 300fps.
 

The Mass Shooter Ron Soye

How you gonna explain fucking a man? 🤔
kiwifarms.net
My uneducated guess is that the larger-than-expected Infinity Cache is there to compensate for inefficiencies or latency introduced by the MCM design. It could also make it a true and honest 8K GPU, as if anybody cares.

Doubling it again on the same node would result in excessive power consumption and cost for a consumer card. They need to move to 3nm or 2nm first. Intel's first quad-tile GPUs for high performance computing could use up to 500 Watts. There will be 8-tile designs later.
 

Allakazam223

We wuz Orkz n shit
kiwifarms.net
AFAIK, this is what killed SLI/Crossfire. The GPUs couldn't communicate fast enough to actually work together. Workarounds included, but weren't limited to, alternating frame output (GPU 1 draws odd frames, GPU 2 draws even frames) or each card processing half of the screen vertically.

So, definitely something to do with overcoming that. Wonder if AMD is planning on using that to synchronize schedulers. If it works more like Infinity Fabric, but is based off of VRAM speeds, wouldn't that segment theoretically be able to run at half the VRAM speed? Zen seems to be able to run IF dependably up to 1800 for most people easily. Would 512MB @ 1800MHz be fast enough and able to process enough info for two Navi dies? Or would it be synchronized to the GPU clocks? Oh shit, IF already runs as fast as gen 1 Radeon RDNA cores. Just make a Threadripper-size GPU die package.
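Rough napkin math on that question (the 32 bytes/cycle read width is the usual Zen-era figure for one IF link and 512 GB/s is Navi 21's VRAM bandwidth, both assumptions on my part):

```cpp
// Comparing one CPU-style Infinity Fabric link at 1800 MHz against the VRAM
// bandwidth a single Navi 21 already consumes. Numbers are assumed/commonly cited.
#include <cstdio>

int main() {
    double fclk_ghz = 1.8;           // the 1800 FCLK people run on Zen
    double bytes_per_cycle = 32.0;   // assumed read width of one CPU-style IF link
    double if_link_gbps = fclk_ghz * bytes_per_cycle;   // ~57.6 GB/s
    double navi21_vram_gbps = 512.0; // 256-bit GDDR6 @ 16 GT/s

    std::printf("one CPU-style IF link: ~%.1f GB/s\n", if_link_gbps);
    std::printf("Navi 21 VRAM:          ~%.0f GB/s (~%.0fx more)\n",
                navi21_vram_gbps, navi21_vram_gbps / if_link_gbps);
}
```

So a CPU-style link by itself is an order of magnitude short of what even one Navi die chews through, which is probably why the giant shared cache would be doing the heavy lifting instead of the fabric.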

I am kinda pumped about this, if you couldn't tell, even though I am an exceptional individual who doesn't really understand a lot about computing. The hardware is fascinating.
 

The Mass Shooter Ron Soye

How you gonna explain fucking a man? 🤔
kiwifarms.net
I am kinda pumped about this, if you couldn't tell, even though I am an exceptional individual who doesn't really understand a lot about computing. The hardware is fascinating.
I think it's great just from a standpoint of getting higher yields and potentially lower cost than the giant dies normally seen in high-end GPUs. RTX 3080 and up use a 628.4 mm² die for example. Except that it sounds like the RX 7900 XT will put 2 relatively massive dies together and will cost a fortune.
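The yield side is easy to eyeball with the simple Poisson model, yield = exp(-defect density × area). The 0.1 defects/cm² below is just an assumed, illustrative number:

```cpp
// Toy yield comparison: one big die vs two dies of half the area.
#include <cmath>
#include <cstdio>

int main() {
    const double defects_per_cm2 = 0.1;                        // assumed defect density
    auto yield = [&](double area_mm2) {
        return std::exp(-defects_per_cm2 * area_mm2 / 100.0);  // mm^2 -> cm^2
    };

    std::printf("628 mm^2 monolithic die: ~%.0f%% yield\n", 100.0 * yield(628.0));
    std::printf("two 314 mm^2 dies:       ~%.0f%% yield each\n", 100.0 * yield(314.0));
}
```

Which is the whole appeal: even two "relatively massive" dies come out ahead of one 628 mm² monster in good chips per wafer.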

You should check out this channel, you might like it.
 