World's Best Technology Blog


What are CUDA Cores? | Compute Unified Device Architecture Explained

If you are curious to know what CUDA cores are, then you are in the right place. CUDA stands for Compute Unified Device Architecture, and this article explains it in full.

NVIDIA is revolutionizing the world with its powerful graphics technology, and when it comes to new graphics innovations, this company is winning the game. CUDA cores are one of the great inventions of NVIDIA, the leading company in graphics technology.

Now, please pay special attention to this:

CUDA stands for compute unified device architecture.

"CUDA core" is NVIDIA's term for the processing units inside its GPUs, and it roughly mirrors what AMD calls stream processors. In essence, both names describe exactly what these units do: "Compute and stream graphical data."


Short History of CUDA Cores-

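Let's go back to where it all started.

NVIDIA held its first GPU Technology Conference (GTC) in 2009. By then it had already introduced its Tesla computing GPUs, and CUDA itself went back further still: it was announced in 2006 and publicly released in 2007.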

Many of you will probably still remember the GPU that started it all: the GeForce 8800 GTX.

The 8800 GTX is arguably one of the most important GPUs ever created, and it represented a sacrifice NVIDIA made early on.

NVIDIA put CUDA on every single GPU long before people found value in it.

Today CUDA runs everywhere: on desktops, laptops, supercomputers and data centers, and now in mobile devices and even cars.


By putting CUDA in every single GPU, NVIDIA made it as easy as possible for developers to build and deploy their software.

But there's much more to it than that as we'll discuss here shortly.


First off, let's address a common misconception about NVIDIA CUDA cores:

"An Nvidia graphics card with a higher number of CUDA cores than another must be more powerful."

This is only true for cards within the same family.

For instance, the GTX 960 packs 1,024 Maxwell CUDA cores, while the more powerful GTX 970 contains 1,664 of them.

The 970 is more powerful, and both are based on the same architecture, so this makes sense.

VRAM availability and clock speed also play a role here, but for the most part, core counts within the same family will indicate relative GPU strength.

The same generalization applies to stream processors in AMD GPUs, by the way.
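To make the same-family comparison concrete, here is a minimal Python sketch of a common back-of-the-envelope estimate: peak FP32 throughput = CUDA cores × 2 floating-point operations per clock (one fused multiply-add) × clock speed. The helper name and the shared 1,178 MHz clock are illustrative assumptions, not measured values; the point is that at equal clocks the estimate scales directly with core count.

```python
# Peak single-precision (FP32) throughput estimate for NVIDIA GPUs.
# Assumption: each CUDA core completes one fused multiply-add (2 FLOPs) per clock.

def peak_fp32_gflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 in GFLOPS (ignores memory and scheduling limits)."""
    flops_per_core_per_clock = 2  # one FMA = one multiply + one add
    return cuda_cores * flops_per_core_per_clock * clock_mhz / 1000.0

# Same Maxwell family, same hypothetical clock, so only the core count differs.
CLOCK_MHZ = 1178.0  # illustrative value applied to both cards
gtx_960 = peak_fp32_gflops(1024, CLOCK_MHZ)
gtx_970 = peak_fp32_gflops(1664, CLOCK_MHZ)

print(f"GTX 960: {gtx_960:.0f} GFLOPS")
print(f"GTX 970: {gtx_970:.0f} GFLOPS")
print(f"970 / 960 ratio: {gtx_970 / gtx_960:.3f}")  # equals 1664 / 1024
```

Because both cards use the same clock here, the ratio reduces to 1,664 / 1,024 = 1.625. Real cards also differ in clocks and VRAM, as noted above, and peak FLOPS is an upper bound rather than a benchmark.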

What about different architectures?-

So what about comparisons across different architectures, say from Maxwell to Pascal? Here things get dicey!

The 980 Ti, one of Maxwell's biggest and most powerful cards, contains 2,816 CUDA cores, while the Pascal-based GTX 1080 features only 2,560 of them.

However, thanks to the narrower fin arrays of Pascal's transistors and a much more compact profile overall, the 1080 is the clear winner when it comes to gaming performance.

Having more physical cores does bring advantages in some workloads, such as video rendering, but these can be offset with simple overclocks and driver optimizations.

When I mentioned a compact profile, I was referring to the fabrication node.

Pascal is built on a 16-nanometer process, meaning that individual features within each Pascal die can be precisely defined at a scale of 16 nanometers.

More on that in an upcoming article.

Maxwell, by contrast, is built on a 28-nanometer process.

So, in principle, more transistors can be packed into each CUDA core on Pascal GPUs.

This plays a substantial role in single-core performance, GPU die size and overall power consumption, all three of which go hand in hand.

You see, when it comes to single-core performance, it's undeniable that individual Pascal cores are more efficient and more powerful than their Maxwell counterparts.

Normally this would be thanks to an increased number of transistors per CUDA core, but that is actually not the case for the 1080 compared to the 980 Ti.

As the numbers below show, it comes down purely to clock speed and GPU die size.

Let's break down the 980 Ti and the 1080. The 980 Ti contains 8.1 billion transistors and 2,816 cores.

The 1080 contains 7.2 billion transistors with 2560 cores.


Assume a uniform distribution of transistors per core; assuming anything else gets us into some heavy technical topics.

This puts the Ti at roughly 2,876,420 transistors per core and the 1080 at 2,812,500 transistors per CUDA core.
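The per-core figures above are simple division. Here is a minimal Python sketch that reproduces them from the transistor and core counts quoted in this article, under the same uniform-distribution assumption. In reality much of each die goes to caches, memory controllers and other non-core logic, so treat the result as a rough average.

```python
# Average transistors per CUDA core, assuming transistors are spread
# uniformly across cores (the same simplification made in the text).

cards = {
    # name: (total transistors, CUDA cores), figures as quoted in this article
    "GTX 980 Ti (Maxwell)": (8_100_000_000, 2816),
    "GTX 1080 (Pascal)": (7_200_000_000, 2560),
}

def transistors_per_core(transistors: int, cores: int) -> int:
    """Integer average; any remainder is dropped."""
    return transistors // cores

for name, (transistors, cores) in cards.items():
    print(f"{name}: {transistors_per_core(transistors, cores):,} transistors per core")
# GTX 980 Ti (Maxwell): 2,876,420 transistors per core
# GTX 1080 (Pascal): 2,812,500 transistors per core
```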

Now, if at this point you blindly judged the two cards based on the numbers you just read, I wouldn't really blame you, but you would be incorrect. Cue GPU frequencies.

Frequency Matters in CUDA-

The 980 Ti can overclock to around 1,500 MHz with a non-reference cooler; an MSI Gold Edition card, for example, overclocks to a stable 1,531 MHz.

A typical GTX 1080, however, can easily reach a stable 2,000 MHz, roughly 500 MHz more than the Ti.

So while transistor counts per core and per GPU might not be far apart, Pascal transistors can attain much higher frequencies thanks to their reduced size.
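To see how much frequency matters, this minimal Python sketch plugs the overclocks quoted above into the same rough peak-FP32 estimate (cores × 2 FLOPs per clock × clock). The 2-FLOPs-per-clock figure assumes one fused multiply-add per core per cycle, and the estimate ignores memory bandwidth, drivers and architectural differences, so it is an illustration rather than a performance prediction.

```python
# Rough peak FP32 throughput at the overclocked frequencies quoted in the text.
# Assumption: each CUDA core completes one fused multiply-add (2 FLOPs) per clock.

def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak in TFLOPS: cores * 2 FLOPs * clock (MHz gives MFLOPS, /1e6 gives TFLOPS)."""
    return cuda_cores * 2 * clock_mhz / 1_000_000.0

gtx_980_ti = peak_fp32_tflops(2816, 1531)  # MSI Gold Edition overclock
gtx_1080 = peak_fp32_tflops(2560, 2000)    # typical stable GTX 1080 overclock

print(f"980 Ti @ 1531 MHz: {gtx_980_ti:.2f} TFLOPS")
print(f"1080   @ 2000 MHz: {gtx_1080:.2f} TFLOPS")
print(f"1080 advantage: {gtx_1080 / gtx_980_ti - 1:.1%}")
```

With these inputs the 1080 comes out roughly 19% ahead (about 10.24 versus 8.62 TFLOPS) despite having 256 fewer cores, purely because of its higher clock.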

Picture the transistors like pistons in a car. Smaller pistons are lighter and typically travel shorter distances per stroke, meaning that their RPM (the equivalent of frequency in this analogy) is generally higher.

Larger pistons are heavier and usually travel longer distances from top dead center to bottom dead center, resulting in fewer revolutions per minute under full load.

It's why a 1.6-liter Formula One engine revs to well over 12,000 rpm, while the 8.4-liter Viper engine is limited to around 6,000 rpm.

The same principle, in theory, applies for transistors.

In general, smaller transistors consume less power and demand lower voltages overall, resulting in more overclocking headroom, and this is exactly what NVIDIA has taken advantage of with its GP104 lineup.
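The link between voltage, frequency and power can be made concrete with the standard CMOS dynamic-power relation, P = a·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). The minimal Python sketch below works only with ratios; the 30% capacitance and 10% voltage reductions are hypothetical numbers chosen for illustration, not GP104 specifications.

```python
# CMOS dynamic (switching) power: P = a * C * V**2 * f
#   a = activity factor, C = switched capacitance,
#   V = supply voltage,  f = clock frequency.
# Only ratios are used, so the absolute values of a and C cancel out.

def relative_dynamic_power(cap_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """New dynamic power divided by old when C, V and f are scaled by these ratios."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio

# Hypothetical shrink: 30% less switched capacitance and 10% lower voltage.
cap_ratio, voltage_ratio = 0.7, 0.9

power_at_same_clock = relative_dynamic_power(cap_ratio, voltage_ratio, 1.0)
# With V held fixed, power grows linearly with f, so this is the clock
# ratio that brings dynamic power back to the original budget.
freq_headroom = 1.0 / power_at_same_clock

print(f"Dynamic power at the same clock: {power_at_same_clock:.3f}x")
print(f"Clock ratio at the original power budget: {freq_headroom:.2f}x")
```

Because voltage enters squared, even a modest voltage reduction frees up a lot of power budget. The sketch ignores leakage power and the fact that higher clocks usually require more voltage, so real headroom is smaller than the naive 1.76x figure it prints.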

On paper, clock for clock, the 980 Ti and the 1080 are neck and neck. If we reduce the 1080's core frequency to that of the 980 Ti, both cards perform similarly, setting aside the memory differences between GDDR5 and GDDR5X.

However, thanks to a much smaller fabrication node and a significantly smaller GPU die overall (314 square millimeters versus 601), the 1080 crushes its current competition while conserving power and venturing into uncharted GPU frequency territory.
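Die size and fabrication node come together in transistor density. This minimal Python sketch divides the transistor counts quoted in this article by the die areas (601 mm² for GM200, the chip in the 980 Ti, and 314 mm² for GP104); it is plain arithmetic on the article's figures, not an official density specification.

```python
# Transistor density = total transistors / die area, using this article's figures.

dies = {
    # chip (card, process node): (transistors, die area in mm^2)
    "GM200 (GTX 980 Ti, 28 nm)": (8_100_000_000, 601),
    "GP104 (GTX 1080, 16 nm)": (7_200_000_000, 314),
}

densities = {}
for name, (transistors, area_mm2) in dies.items():
    densities[name] = transistors / area_mm2 / 1e6  # millions of transistors per mm^2
    print(f"{name}: {densities[name]:.1f} million transistors/mm^2")

maxwell, pascal = densities.values()  # dicts keep insertion order (Python 3.7+)
print(f"Pascal density advantage: {pascal / maxwell:.2f}x")
```

With these figures Pascal fits roughly 1.7x as many transistors into each square millimeter, which is how the 1080 holds nearly as many transistors as the 980 Ti in about half the area.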

Architecturally speaking, CUDA cores are grouped into clusters called streaming multiprocessors (SMs).

These aim to maximize throughput by organizing work into streams for parallel processing.

Each SM splits its work evenly across its CUDA cores, and the number of cores per SM differs between generations: 128 in Maxwell and 64 in Pascal's GP100 compute chip. (The GeForce GTX 1080's GP104 chip keeps 128 cores per SM.)

NVIDIA decided to cut the number of CUDA cores in each streaming multiprocessor in half in an effort to improve the ability of Pascal GPUs to render and shade simultaneously.

For those of you who are especially familiar with this topic, this brings us awfully close to the issue of asynchronous compute, which we'll save for another article.
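Core counts map onto SMs by simple division. Here is a minimal Python sketch, with a hypothetical helper `sm_count`, showing how many SMs the 980 Ti and 1080 contain at 128 cores per SM, and how many the 1080's 2,560 cores would need if they were grouped into 64-core SMs the way GP100 does it (a hypothetical layout, for comparison only).

```python
# Number of streaming multiprocessors (SMs) implied by a core count.

def sm_count(total_cores: int, cores_per_sm: int) -> int:
    """Whole number of SMs; raises if the cores don't divide evenly."""
    if total_cores % cores_per_sm != 0:
        raise ValueError("core count is not a whole number of SMs")
    return total_cores // cores_per_sm

print("GTX 980 Ti, 128 cores/SM:", sm_count(2816, 128))  # 22
print("GTX 1080, 128 cores/SM:", sm_count(2560, 128))    # 20
# Hypothetical: the same 2,560 cores arranged GP100-style, 64 per SM.
print("2,560 cores, 64 cores/SM:", sm_count(2560, 64))   # 40
```

More, smaller SMs means more schedulers, register files and shared-memory blocks for the same number of cores, which is the trade-off behind halving the cores per SM.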

My Final Words(Conclusion)-

What you need to know for now is that CUDA has been, and likely will remain, the preferred architectural design for NVIDIA GPUs.

It has been since 2006, and based on the current trend, it's very unlikely that CUDA will ever fully support asynchronous compute at the hardware level in its current configuration.

It just doesn't add up from what I'm seeing so far, and several articles agree with me here (just search for them on Google).

It is clear that NVIDIA's goal in reducing the number of cores per SM was to increase throughput by giving each core access to more registers.
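The register argument is easiest to see per core. This minimal Python sketch assumes a register file of 65,536 32-bit registers (256 KB) per SM, the size NVIDIA uses in both Maxwell SMs and GP100 Pascal SMs, and divides it evenly among the cores in each SM. Registers are actually allocated per thread rather than per core, so this is a simplification, but the ratio is what matters.

```python
# Registers per CUDA core when the per-SM register file stays the same size.
# Assumption: 65,536 32-bit registers (256 KB) per SM, as in Maxwell and GP100 SMs.

REGISTERS_PER_SM = 65_536

def registers_per_core(cores_per_sm: int, registers_per_sm: int = REGISTERS_PER_SM) -> int:
    """Even split of the SM's register file across its cores (a simplification)."""
    return registers_per_sm // cores_per_sm

maxwell = registers_per_core(128)  # Maxwell SM: 128 cores
pascal = registers_per_core(64)    # GP100 Pascal SM: 64 cores

print(f"Maxwell: {maxwell} registers per core")                     # 512
print(f"GP100:   {pascal} registers per core ({pascal // maxwell}x)")  # 1024 (2x)
```

Halving the cores per SM while keeping the register file the same size doubles the registers available to each core.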

However, from the looks of things at this point, I doubt that this trend will align with that of the gaming industry, where AMD has a clear edge in Vulkan and DirectX 12 titles.

We will discuss them in some other article!

Thanks for learning with 



By Satya Gupta

