PS4Pro Can Run At 8.4tf (Eurogamer)

GribbleGrunger · Oct 20, 2016, 10:35 PM

Quote"One of the features appearing for the first time is the handling of 16-bit variables - it's possible to perform two 16-bit operations at a time instead of one 32-bit operation," he says, confirming what we learned during our visit to VooFoo Studios to check out Mantis Burn Racing. "In other words, at full floats, we have 4.2 teraflops. With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance."

Much more here:

Inside PlayStation 4 Pro: How Sony made the first 4K games console • Eurogamer.net

Raven · Oct 20, 2016, 10:37 PM

I don't even know what's real anymore.

Legend · Oct 20, 2016, 10:38 PM

This is apparently above my skill grade.

Half floats are pretty common so I would have just assumed most computers could do this.

Aura7541 · Oct 20, 2016, 10:39 PM

First time I have heard of this.

Raven · Oct 20, 2016, 10:41 PM

Someone want to explain to me what this means? Is this saying that in a situation where 32-bit operation isn't necessary, the system can pump double the performance? How often are 16-bit operations done today? I don't mean to be a buzzkill, but this kinda sounds like something a fanboy would tout as the reason why his system is actually superior.

the-pi-guy · Oct 20, 2016, 10:45 PM

Quote from: Legend on Oct 20, 2016, 10:38 PMThis is apparently above my skill grade.
Half floats are pretty common so I would have just assumed most computers could do this.

It sounds like there's better support.

Quote"To date, with the AMD architectures, a half-float would take the same internal space as a full 32-bit float. There hasn't been much advantage to using them. With Polaris though, it's possible to place two half-floats side by side in a register, which means if you're willing to mark which variables in a shader program are fine with 16-bits of storage, you can use twice as many. Annotate your shader program, say which variables are 16-bit, then you'll use fewer vector registers."
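
To make the "two half-floats side by side in a register" idea concrete, here is a minimal Python sketch. It only imitates the storage layout with the standard `struct` module. The helper names `pack_two_halves` and `unpack_two_halves` are made up for illustration, and nothing here reflects how the actual shader compiler or hardware does it.

```python
import struct

def pack_two_halves(a, b):
    """Store two half-precision floats in one 32-bit word (illustration only)."""
    raw = struct.pack("<2e", a, b)       # two 16-bit halves = 4 bytes
    (word,) = struct.unpack("<I", raw)   # view the same 4 bytes as one 32-bit integer
    return word

def unpack_two_halves(word):
    """Recover the two half-precision values from a 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))

word = pack_two_halves(1.5, -2.0)
print(hex(word))                  # both values live in a single 32-bit slot
print(unpack_two_halves(word))    # (1.5, -2.0)

# One full float and two half floats occupy the same number of bytes.
print(struct.calcsize("<f"), struct.calcsize("<2e"))
```

The point is only the bookkeeping: a slot that used to hold one value now holds two, which is where the "fewer vector registers" claim comes from.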

Legend · Oct 20, 2016, 10:46 PM

Quote from: Raven on Oct 20, 2016, 10:41 PMSomeone want to explain to me what this means? Is this saying that in a situation where 32-bit operation isn't necessary, the system can pump double the performance? How often are 16-bit operations done today? I don't mean to be a buzzkill, but this kinda sounds like something a fanboy would tout as the reason why his system is actually superior.

Flops is shorthand for floating point operations per second.

Basically Neo can do 4.2 million million floating point operations per second OR it can do 8.4 million million half-precision floating point operations per second. Theoretically, if an engine wasn't optimized for this beforehand and then was, yeah, you'd be getting a significant increase.
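
A rough sketch of where those two numbers come from, in Python. The hardware figures below (36 compute units, 64 lanes each, 911 MHz, a fused multiply-add counted as 2 operations) are assumptions brought in for the arithmetic and are not stated in this thread. The function name `peak_tflops` is hypothetical.

```python
# Assumed figures, not taken from this thread.
COMPUTE_UNITS = 36
LANES_PER_CU = 64
CLOCK_HZ = 911e6
OPS_PER_FMA = 2   # one multiply plus one add per lane per clock

def peak_tflops(values_per_lane):
    """Theoretical peak in teraflops for a given number of values packed per lane."""
    ops_per_second = (COMPUTE_UNITS * LANES_PER_CU * OPS_PER_FMA
                      * values_per_lane * CLOCK_HZ)
    return ops_per_second / 1e12

fp32 = peak_tflops(1)   # one 32-bit float per lane
fp16 = peak_tflops(2)   # two 16-bit floats packed per lane
print(f"FP32: {fp32:.1f} TFLOPS, FP16: {fp16:.1f} TFLOPS")
```

These are theoretical peaks only. A real game would only approach the higher figure for the parts of its shaders that actually run on 16-bit values.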

The drawback is that half-floats have half the bits, which means a lot less precision and range, so you can't use them for everything. Otherwise it would be a full 2X performance change.
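
To see the precision drawback in practice, here is a small Python sketch that rounds ordinary numbers to half precision and back using the standard `struct` module. The helper name `to_half` is made up for this example. A half float keeps 11 significant bits (roughly 3 decimal digits) and tops out at 65504.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest half-precision value and return it."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_half(1.0))      # small simple values survive exactly
print(to_half(0.1))      # close to 0.1, but noticeably off compared with a 32-bit float
print(to_half(2048.0))   # still exact
print(to_half(2049.0))   # 2048.0: odd integers above 2048 can no longer be represented
```

Colors and normalized directions tend to tolerate this kind of rounding, while things like large world-space positions usually do not, which is why only some variables in a shader are good candidates.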

the-pi-guy · Oct 20, 2016, 10:51 PM

Quote from: Raven on Oct 20, 2016, 10:41 PMSomeone want to explain to me what this means? Is this saying that in a situation where 32-bit operation isn't necessary, the system can pump double the performance? How often are 16-bit operations done today? I don't mean to be a buzzkill, but this kinda sounds like something a fanboy would tout as the reason why his system is actually superior.

In a really dumb way:

Basically, say you wanted to add 2 sets of numbers together, a+b and c+d. It sounds like the older systems were set up to use the entire 32 bit space for each number.
Instead of having to do that, you can basically add them up in half the time if you only need half the precision.
You can do this:
00110 01110
+00100 +10000

Instead of this:
0000000110 0000001110
+0000000100 +0000010000

Those numbers aren't exactly right, but take them as illustration.
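
Here is the same illustration as a runnable Python sketch, using the values from the bit patterns above (6+4 and 14+16). It emulates a "packed" add where each 32-bit word carries two half floats. The function names are hypothetical, and Python obviously does the two additions one after the other, whereas the claim in the article is that the hardware handles both halves in a single operation.

```python
import struct

def pack_pair(lo, hi):
    """Put two half-precision floats into one 32-bit word."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_pair(word):
    """Split a 32-bit word back into its two half-precision floats."""
    return struct.unpack("<2e", struct.pack("<I", word))

def packed_add_f16(x_word, y_word):
    """Add two packed words lane by lane: (a, c) + (b, d) -> (a+b, c+d)."""
    a, c = unpack_pair(x_word)
    b, d = unpack_pair(y_word)
    return pack_pair(a + b, c + d)

x = pack_pair(6.0, 14.0)   # a and c share one 32-bit word
y = pack_pair(4.0, 16.0)   # b and d share another
print(unpack_pair(packed_add_f16(x, y)))   # (10.0, 30.0)
```

With full floats the same work would need four 32-bit slots and two separate adds instead of two slots and one packed add.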

darkknightkryta · Oct 21, 2016, 12:47 AM

Quote from: Raven on Oct 20, 2016, 10:37 PMI don't even know what's real anymore.

You said a pun and you didn't even realize it.

kitler53 · Oct 21, 2016, 12:49 AM

all i hear is "secret sauce".

the-pi-guy · Oct 21, 2016, 05:45 PM

Quote from: kitler53 on Oct 21, 2016, 12:49 AMall i hear is "secret sauce".

It's less secret saucy than most secret sauces.
It works, just not always helpful.

ethomaz · Oct 21, 2016, 05:47 PM

Quote from: Legend on Oct 20, 2016, 10:38 PMThis is apparently above my skill grade.

Half floats are pretty common so I would have just assumed most computers could do this.

They do, but at the same speed as FP32 or slower.

Actually, only Pascal P100 runs FP16 twice as fast as FP32, while it looks like Vega will bring that to the AMD side.

A picture always explains it better.

Pascal's Architecture: What Follows Maxwell - The NVIDIA GeForce GTX 1080 & GTX 1070 Founders Editions Review: Kicking Off the FinFET Generation

GP104 runs FP16 at 1/64 FP32 and FP64 at 1/32 FP32... it is fudgy slow.
GP100 runs FP16 at 2x FP32 and FP64 at 1/2 FP32... it is fudgy fast.

ethomaz · Oct 21, 2016, 06:10 PM

This part explains FP16 on Pascal better.

FP16 Throughput on GP104: Good for Compatibility (and Not Much Else) - The NVIDIA GeForce GTX 1080 & GTX 1070 Founders Editions Review: Kicking Off the FinFET Generation