So how does hyperthreading actually work? It seems like some people (presumably parroting information about the first implementation) claim that each core remains entirely single-threaded and it's solely a trick played on OS schedulers (and the pipelining/prefetching circuitry), while other people claim that the majority of each core is duplicated, with only the largest circuitry blocks shared between the logical cores?
@nytpu sometimes when a core is executing code, it has to wait around for something to finish (for example, a cache miss, a mispredicted branch, division, etc) and can't find anything else to do. without hyperthreading the core is forced to sit there doing nothing, but with hyperthreading the core can continue executing code from another thread, making better use of a core's resources
@unnick Okay I guess I worded my OP poorly because I know the premise of why it exists, I was just wondering the extent it affects the hardware itself. Like, I know the premise (especially originally) was solely to trick the OS into scheduling two "parallel" processes, so the CPU could instantly context switch anytime there's a pipeline stall instead of having to just wait.
And someone else confirmed that they duplicate a lot of the most common units like integer ALUs so both threads can use them in parallel; and that each thread can use any non-duplicated functional unit not used by the other (so e.g. one could be accessing memory while the other is using the FPU)
@nytpu oh
idk that much about the hardware part, but at least on zen, it seems like very little is statically split between the threads (just some queues like the micro op queue and retirement queue), and everything else is shared. all the functional units can be used by either thread
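On Linux you can at least see how logical CPUs pair up into physical cores, which is the visible side of this sharing (sketch, Linux-specific, reads sysfs):

```shell
# Each file lists the logical CPUs ("hardware threads") that share
# one physical core. On an SMT machine you'll see pairs like "0,8";
# without SMT each line is a single CPU number.
sort -u /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list
```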
> And someone else confirmed that they duplicate a lot of the most common units like integer ALUs so both can use it in parallel; and that each thread can use any non-duplicated functional unit not used by the other (so e.g. one could be accessing memory while the other is using the FPU)
Are you sure? I thought that was CMT (the thing Bulldozer had, where AMD got sued for misleading core count advertising) rather than SMT/hyperthreading.
Bulldozer had a separate integer ALU for each “core” (CMT thread), but a shared FPU per CMT cluster (pair of “cores”/CMT threads).
Although POWER8 (which has 8-way SMT) and POWER9 (which has either 4-way or 8-way SMT) apparently have varied amounts of duplicated units (and I don’t quite understand the difference between two SMT4 POWER9 cores and a single SMT8 POWER9 core, since they seem to have the same number of slices).
@nytpu Think of it like hardware-accelerated preemptive multitasking. I don’t know how much actual extra hardware there is, but microcode or hardware, the CPU task-switches for you without all the work of jumping into kernel mode and saving and restoring state by hand.
@penny Okay I guess I worded my OP wrong because I know the premise of why it exists, I was just wondering the extent it affects the hardware itself. Like, I know the premise (especially originally) was solely to trick the OS into scheduling two "parallel" processes, so the CPU could instantly context switch anytime there's a pipeline stall instead of having to just wait.
And someone else confirmed that they duplicate a lot of the most common units like integer ALUs so both threads can use them in parallel; and that each thread can use any functional unit not used by the other (so e.g. one could be accessing memory while the other is using the FPU)
@nytpu ah I see, I never thought about the actual hardware benefits because we save so much time just not trampolining through the kernel