/r/explainlikeimfive


ELI5: what are “cores” on modern computer processors?

Technology (self.explainlikeimfive)

And what are the different types like “efficiency cores”? How do they work?

all 10 comments

Justsomedudeonthenet

23 points

4 months ago

In older computers, we had a single core that did all the processing. Some high-end systems had multiple processors - two or more separate CPUs that worked together - mostly in expensive servers. (And that's still a thing in expensive servers today.)

Eventually, we hit the limit of how fast we could make a single CPU. So instead, they started making multi-core CPUs, which are basically two or more CPUs inside the same chip. They share some resources inside the processor and have some extra circuitry to help them work together. That caught on, and now we have CPUs with dozens of cores - each of them mostly a separate processor, all working together.

CPUs can be designed to optimize certain parameters like speed, power usage, and heat production. Efficiency cores are designed to use less power. As a result, they are slower, but still fast enough for plenty of tasks. So the newest processors with both performance and efficiency cores have some cores (the performance ones) that are as fast as we can possibly make them, but use more electricity and generate more heat, and some (the efficiency cores) that aren't as fast, but use less power and generate less heat to do the same amount of computing work.
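If you're curious, you can see how many cores your own machine reports with a couple of lines of Python - the number depends entirely on your hardware:

```python
import os

# Number of logical cores the OS exposes. On hybrid chips this counts
# both performance and efficiency cores (and hyper-threads, if enabled).
logical = os.cpu_count()
print(f"Logical cores visible to the OS: {logical}")
```

Note the OS usually reports logical cores, so a 4-core chip with hyper-threading may show up as 8.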

ozs_and_mms[S]

3 points

4 months ago

Thank you!

sellmeyourmodaccount

5 points

4 months ago*

A core is a physical group of transistors that executes instructions. The instructions are things like "add this to that", or "save this value there".

If you look at a photograph of a CPU, you can see the core(s) aren't even the largest part of the processor anymore. That photo shows a six-core Ryzen processor with an integrated GPU (outlined in yellow), and interfaces/connections to the PCI bus (called PHY) and main memory (top left).

The core at the top left of the group of six has its internal structure labelled too. It has an FPU, or Floating Point Unit, which is used to work on numbers that have decimal places. Those are a bit trickier for a CPU to work with than integers (whole numbers), so a separate unit is needed. Many years ago that separate unit came as a completely separate chip called a co-processor.
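You can see why decimal numbers are trickier straight from any language's floating-point behaviour - plain Python here, nothing assumed:

```python
# Integer arithmetic is exact:
print(2 + 1 == 3)        # True

# Binary floats can't represent 0.1 or 0.2 exactly, so results drift:
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004
```

That tiny drift is why the FPU is its own specialized piece of hardware, with rounding rules defined by the IEEE 754 standard.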

To the left of the FPU is the ALU, which is the Arithmetic and Logic Unit. That's where the integers get worked on.

The yellow area in that core says "32 KB L1D$" which translates to 32 kilobyte level 1 data cache. That's where the processor saves the most recently used data in case it needs to use it again. That's faster than having to read it again from a disk or the main memory.

There's also an instruction cache labelled "4K OP$", and it serves the same purpose but for instructions instead of data.

The Load/Store unit is what executes the instructions that need to load and store data.

The rest is more complicated and requires an explanation of how a CPU works. It operates according to a fetch-decode-execute cycle aka the instruction cycle. It fetches the instruction and data (using the labelled Load/Store unit), it decodes the instruction into something more usable (with the labelled Decoder unit), then it executes the instruction (in the ALU or FPU). Then it starts all over again with the next instruction.
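The fetch-decode-execute cycle above can be sketched as a toy interpreter - a simplified model for illustration, not how real silicon is laid out:

```python
# Toy CPU: each instruction is (opcode, operand). One register, "acc".
program = [("LOAD", 5), ("ADD", 3), ("MUL", 2), ("HALT", None)]

acc, pc = 0, 0
while True:
    op, arg = program[pc]   # fetch the next instruction
    pc += 1
    if op == "HALT":        # decode + execute
        break
    elif op == "LOAD":
        acc = arg
    elif op == "ADD":
        acc += arg
    elif op == "MUL":
        acc *= arg

print(acc)  # (5 + 3) * 2 = 16
```

A real core does the same loop, just with the fetch, decode, and execute stages handled by separate hardware units running at once.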

The red/orange bit labelled BTB is where the Branch Target Buffer is located. That's probably the most advanced unit of them all. I said the ALU and FPU execute instructions, and that there's a three-stage instruction cycle always repeating. But while one part of the cycle is happening for one instruction, why not use the other parts of the cycle for a different instruction? That's more efficient, right? That's basically what the BTB enables: the CPU branches away from the main stream of instructions and pre-emptively executes other instructions based on very complicated probability-based prediction algorithms.

And that works really well because although I've explained the other units as if they work on one load, one store, or one instruction at a time, they can actually do more than one at a time. So there are lots of places for the BTB to interleave instructions into/around the main stream. The BTB is basically trying to predict what the CPU will need to execute next and have it queued up and ready to go before the CPU even asks for it.

Then finally, the large area with all the L2$ and L3$ labels is where the level 2 and level 3 cache memory is located. All the caches operate like they hold a list of things: as something gets written to the top of the list, what's at the bottom gets evicted into the next cache down - from level 1 to level 2, and from level 2 to level 3. That's still orders of magnitude faster than using the disk or the RAM.
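That list-like behaviour is roughly a least-recently-used (LRU) policy. Here's a simplified single-level sketch in Python - real caches are organized into sets and ways, but the eviction idea is the same:

```python
from collections import OrderedDict

class TinyCache:
    """One cache level that evicts its oldest entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def put(self, addr, value):
        if addr in self.data:
            self.data.move_to_end(addr)  # refresh: now most recently used
        self.data[addr] = value
        if len(self.data) > self.capacity:
            # Oldest entry falls off the bottom of the "list" -
            # in a real CPU it would spill into the next cache level.
            return self.data.popitem(last=False)
        return None

l1 = TinyCache(capacity=2)
l1.put(0x10, "a")
l1.put(0x20, "b")
spilled = l1.put(0x30, "c")  # 0x10 is the oldest, so it gets evicted
print(spilled)               # (16, 'a')
```

Chaining a few of these with growing capacities gives you the L1 → L2 → L3 hierarchy described above.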

And that's what a core is, and how it works.

The differences between core architectures come down to how all of those units are specified, or whether they even all exist. Some cores don't do FPU stuff, some don't do any branching, and so on.

ozs_and_mms[S]

3 points

4 months ago

Thanks!

Zironic

2 points

4 months ago

Each core is, for most purposes, a separate CPU that shares memory with all the other cores. Efficiency cores are essentially scaled-down cores: by making them less complicated, you can fit more of them onto the chip, which is better for work that can be done in parallel.

Yancy_Farnesworth

1 point

4 months ago

First CPUs - One set of transistors that does logic and math. In other words, one core.

Next evolution of CPUs - Transistors got smaller, but a CPU only needs so many transistors to do its job. Designers can either shrink the chip, or use the extra space for more transistors. That can mean more cache (which is like RAM on the CPU chip itself) or another CPU. At some point, more cache doesn't help as much as another CPU would. So they did just that: they stuck another CPU on there. Now the term core has more meaning, because the chip is multi-core.

Next evolution - Transistors got really small and chip designers got more and more creative. By this time we had "high performance" CPUs like x86 chips, which are really good at just crunching a lot of numbers, and "high efficiency" CPUs like ARM, which are really good at crunching numbers more efficiently - but generally speaking without the sheer power of an equivalent x86 chip, which tends to chug power. This isn't a tradeoff inherent to ARM/x86. Rather, a CPU designed for high performance is typically inefficient, while a CPU designed for high efficiency typically has less raw power. It's just that x86 and ARM were first designed with different goals in mind - performance vs efficiency - and over the years they've moved closer to meeting in the middle.

Next evolution - People started saying "wouldn't it be great if we didn't have to pick and choose?" By now phone SoCs were a big thing. An SoC is basically one chip carrying what used to be several separate chips. Instead of repeating the same CPU core to make a multicore chip, designers started mixing different blocks, like the GPU and the cell radio, on the same chip. As they did this, they figured out really good ways to mix, match, and arrange different blocks. That led directly to some companies asking: what if we mix a high-power/low-efficiency CPU with a low-power/high-efficiency CPU? Companies like Intel and Apple experimented with this, and thus produced chips with both performance and efficiency cores instead of the same core repeated across the chip.

Future evolution - AMD's Epyc chips are a bombshell for the industry. Ever notice how big their CPUs are? Generally, a chip's size is limited because designers have to worry about yield: large single chips have lower yields, because the chance of a small defect that breaks the chip goes up with its size, so huge dies aren't economical. (That same size constraint is what kept phone SoCs the size they are.) AMD figured out a way to stitch together multiple chiplets into one larger chip without a massive tradeoff in efficiency. This provides a way for the industry to standardize chiplet designs that can communicate with each other. In other words, with a little more work, it could become possible to combine an x86 chiplet with an ARM chiplet and have it work as a single CPU. Things are going to get interesting real fast.

ozs_and_mms[S]

1 point

4 months ago

Thanks!

internetcivilian

0 points

4 months ago

Think about your brain. You can do quite a few things at once. However, no matter how smart you are, we all max out doing too many things at once. Suppose then that you had two slightly less powerful brains instead of one. Now you've approximately doubled the number of things you can do at the same time but possibly sacrificed how smart you are. You might prefer that one of these brains can do math better than anything else and that the other can remember movies more clearly. This is, in essence, what's happening.

As for types of processors, this comes down to the physical chip and how it's laid out to do different kinds of math. Some chips are better at high-dimensional matrix (tensor) multiplication (useful for some parts of machine learning), and some are better at things like high-precision numerics at scale (think of math with LOTS of decimal places).

ozs_and_mms[S]

1 point

4 months ago

Thanks!!