18 comments

  • vxNsr 11 days ago

    Sooo... is intel like crying in a corner right now? On one side, AMD is eating their lunch in the consumer space while they still haven't launched a full gamut of 10nm CPUs. Apple just announced that they're dropping them in basically the next 5 years. And now ARM really is encroaching on their core server business.

    I feel like 20 years from now we're gonna be using intel as a cautionary tale of hubris and mismanagement. Or whatever it is that caused them to fail so spectacularly.

    • raxxorrax 11 days ago

      Honestly I think the proclaimed death of intel is vastly exaggerated. AMD came back from worse places, and Intel does still have the manufacturing edge. Intel desktop CPUs still use less power, which is a big plus. How many people do you know that bought the fastest CPU available recently? Glad AMD is back on track; they were in a rough place, far worse than Intel's current situation.

      • blattimwind 11 days ago

        For what it's worth, Intel is still faster in most applications, simply by virtue of having a clock speed advantage that by far exceeds any IPC difference, and also by having much lower memory latencies. AMD has basically a 20-30 ns extra latency over Intel; so with good memory you can do ~45 ns on current Intels, but that will give you ~65 ns on a Ryzen. That's significant for a lot of code (e.g. pointer chasing, complex logic etc.).
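
        If you want to see roughly where those numbers come from, the classic check is a dependent-load ("pointer chasing") loop over a buffer much larger than any cache. A quick, uncalibrated sketch in Java (the array size and step count are arbitrary, and Java adds its own overhead, so treat the printed figure as illustrative rather than a calibrated latency measurement):

           import java.util.Random;

           public class PointerChase {
               public static void main(String[] args) {
                   int n = 32 << 20;                 // 32M ints (~128 MB), well beyond any L3
                   int[] next = new int[n];
                   for (int i = 0; i < n; i++) next[i] = i;
                   // Sattolo's algorithm: random permutation that forms a single cycle
                   Random rnd = new Random(42);
                   for (int i = n - 1; i > 0; i--) {
                       int j = rnd.nextInt(i);
                       int tmp = next[i]; next[i] = next[j]; next[j] = tmp;
                   }
                   int steps = 50_000_000;
                   int p = 0;
                   long t0 = System.nanoTime();
                   for (int i = 0; i < steps; i++) p = next[p];   // each load depends on the previous one
                   long t1 = System.nanoTime();
                   System.out.printf("~%.1f ns per dependent load (sink=%d)%n",
                           (t1 - t0) / (double) steps, p);
               }
           }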

        On the other hand, few applications scale efficiently to more than just four cores. Yes, of course, AMD delivers more Cinebenchpoints-per-Dollar and usually more Cinebenchpoints overall, but that's not necessarily an interesting metric.

        Personally I find that when I'm waiting on something to complete, the application in question tends to use only a tiny number of cores for the task at hand. Usually one.

        Another significant weakness of AMD's current platform is idle power consumption.

        These factors leave me with a much more nuanced impression than "Intel is ded" or "HOW IS INTEL GOING TO CATCH UP TO THIS????"; CPU reviews these days are just pure clickbait.

        • jchw 11 days ago

          The problem is that a lot of the tasks people want their CPU to be fast at are exactly the stuff that parallelizes almost embarrassingly well. Compiling code, video rendering, compressing files. People buying CPUs for this are not as concerned about how many cycles it takes to jump through a vtable as long as it's not slow.

          Meanwhile, pointing at memory latency as the flaw in Ryzen has been a popular misdirection for a while now. People warned me about it being a performance pitfall since before I bought my first Ryzen processor. In practice it doesn't show up as a serious issue even in the most complexity-intensive workloads. For example, Zen 2 performs very well on hardware emulation. This is possibly because where it takes a hit in memory latency it makes up for it in caching and prefetching, but honestly I don't know and I am not sure how to measure. In any case it's certainly favorably comparable to Intel's best chips in single-core workloads even if not on top. Factor in price and multicore workloads and you now have the exact reasons why people like me have been singing the praises... Intel's single-core lead may exist in some form, but it is not what it once was; it is not an unconditional lead where an Intel core beats an AMD core. Not even close.

          None of this means Intel's dead of course, but IMO that's mostly because they have a lot more going on than just having the best CPU. They've got their dedicated GPU coming out, and plenty of ancillary technology as well. It does seem like having to take a backseat in CPUs for a while will be painful for a company like Intel; unlike AMD, this is a new position for Intel and maybe not one they will handle well.

          • lend000 11 days ago

            You can get an idea of how popular different processors are in the server space by looking at the AWS EC2 spot market. Top end Xeon server processors (C5 and Z1d) typically have much lower spot discounts than AMD EPYC based processors (r5ad), although ARM c6g instances have been pushed up in price significantly over the last few months, perhaps as people switch over to them for the per-computational-unit cost savings.

            Of course, this is all a factor of Amazon's supply of instances and their chosen on-demand pricing level, but the trends are certainly interesting, and show steady demand for fast Xeons and increasing demand for ARM. I have run some compute-heavy workloads on the best AMD instances I could find on AWS and the speed difference per core for my particular workload was nearly 50%, which got worse as it scaled up to bigger instances because my workload uses a lot of L3 cache. I hear about EPYCs with 256MB of L3 cache but I can't seem to find those on AWS -- only ones with 8MB of cache.

            • _msw_ 11 days ago

              Disclosure: I work at AWS on building cloud infrastructure

              C6g instances only launched on June 11. I'm not sure what information can be gleaned from the spot prices regarding Arm demand at this time.

              The C5a instances powered by AMD Rome processors have 192 MiB of L3 cache per socket total (16 MiB L3 slice per compute complex, 12 CCX per socket). You can observe this from the cpuid(1) output:

                 L3 cache information (0x80000006/edx):
                    line size (bytes)     = 0x40 (64)
                    lines per tag         = 0x1 (1)
                    associativity         = 0x9 (9)
                    size (in 512KB units) = 0x180 (384)
              
              384 * 512 KiB = 192 MiB

              (you can download cpuid from http://www.etallen.com/cpuid.html)
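
              For the curious, the 192 MiB figure falls straight out of the documented field layout of that leaf's EDX register. A small decoding sketch (the raw EDX value below is reconstructed from the fields printed above, not read from a live instance):

                 public class L3FromCpuid {
                     public static void main(String[] args) {
                         // EDX of CPUID leaf 0x80000006, reconstructed from the dump above
                         long edx = 0x0600_9140L;
                         long lineSize    = edx & 0xFF;           // bits 7:0,   line size in bytes      -> 64
                         long linesPerTag = (edx >> 8) & 0xF;     // bits 11:8                           -> 1
                         long assoc       = (edx >> 12) & 0xF;    // bits 15:12, encoded associativity   -> 0x9
                         long sizeIn512K  = (edx >> 18) & 0x3FFF; // bits 31:18, L3 size in 512 KiB units -> 384
                         System.out.printf("line=%dB assoc-code=0x%X L3=%d*512KiB=%dMiB%n",
                                 lineSize, assoc, sizeIn512K, sizeIn512K * 512 / 1024);
                     }
                 }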

              • lend000 11 days ago

                Thanks for the info -- I must have misinterpreted the spot pricing history chart for c6g. While you're here, does the AWS hypervisor have any means to dedicate a portion of the L3 cache to each virtualized core, or is it a free-for-all for all of the cache space (such that a noisy neighbor could potentially be evicting data held in your L2 cache or even L1 cache by thrashing the L3 cache)?

                • _msw_ 9 days ago

                  For instance families like C, M, and R, processor cores are dedicated to one instance, and the virtual processor is pinned 1:1 to the underlying logical processor. Therefore there is no neighbor that is able to use the L1 and L2 caches.

                  For L3 cache, we try to optimize for the best overall performance for the majority of the time. Smaller instance sizes share L3 cache with other instances. I wouldn't call it a "free for all" given some changes in how the cache hierarchy has been shifting over time (e.g., Skylake-SP L2 cache per core was increased, and the L3 cache is now 'non-inclusive')

            • user5994461 11 days ago

              I want my video games, email reader, word, youtube, IDE and general python code to run faster. None of those are parallelizing much of anything.

              • jchw 11 days ago

                1. It is unlikely the CPU is a serious bottleneck in many of those circumstances. Even if it takes a measurable amount of time, that does not mean a faster CPU will make a meaningful improvement, or even a measurable one. If you think it will, try overclocking and measuring your Gmail load times.

                2. Like I said, in my experience Ryzen also competes just fine in single core. It just also decimates in multicore. I'd rather have some tasks run significantly faster than have some run very slightly faster. But that is disregarding the fact that not all tasks are the same and it does in fact win some categories. These CPU architectures are more divergent than usual lately.

                3. Things you think aren’t parallel are. Video games using modern graphics APIs are in fact able to exploit multicore CPUs. Browsers absolutely exploit multicore CPUs. Your system in general will exploit multicore CPUs so during general usage when you are doing more things and have more software running, single core performance will be hurt less. And so on.

                • dageshi 11 days ago

                  Your email reader, Word, YouTube and IDE aren't likely to push the limits of any modern CPU; your video game is increasingly optimized around multiple cores because modern consoles ship with multi-core CPUs and need all the performance they can get out of them. The only thing that might benefit from single-core performance is probably your general Python code.

                  • user5994461 11 days ago

                    Gmail and the IDE take ten seconds to load, while YouTube destroys any CPU playing a 4K video (or 1080p on a battery-saving laptop).

                    YouTube is possibly the single largest root cause of users upgrading laptops over the past 10 years. They made a silent transition to 60 FPS videos last year, which cut hundreds of millions of users off from watching HD.

                    • StillBored 11 days ago

                      Destroying CPU in some configurations....

                      https://www.youtube.com/watch?v=ef1wAfrMg5I is ~10% of 1 cpu on my desktop using chrome.

                      OTOH, I know what you're talking about; my Linux machine hates YouTube, but that's because even with the chromium-freeworld fork with some codec acceleration it's still burning CPU like crazy.

                      So, a big part of this isn't a hardware problem so much as a software one, combined with the constant fights over whose codec is the one true choice. AKA it's a YouTube and !Windows/Android+Chrome problem.

                      • rumanator 11 days ago

                        > Gmail and the IDE take ten seconds to load,

                        Those tasks are IO-bound, not CPU-bound.

                        Your concerns have no basis whatsoever.

                        > Youtube is possibly the single largest root cause for users upgrading laptops over the past 10 years.

                        No one in the whole world feels the need to upgrade to a high-end workstation because of YouTube videos.

                  • gridlockd 11 days ago

                    > The problem is a lot of tasks that people want their CPU to be fast at is exactly stuff that parallelizes almost embarrassingly well. Compiling code, video rendering, compressing files.

                    Compiling code isn't embarrassingly parallel unless you're building some project with lots of files from scratch. Video rendering and compression also don't benefit as much as you may think:

                    https://www.phoronix.com/scan.php?page=article&item=3900x-39...

                    Meanwhile, single-threaded performance affects pretty much 100% of what you do.

                    In the end, I don't think there's a big difference either way.

                    • blattimwind 11 days ago

                      > Meanwhile, pointing at memory latency as the flaw in Ryzen has been a popular misdirection for a while now.

                      How is it a misdirection? The data is accurate and memory latency scaling is a well-known issue for simulations like e.g. games (which is a huge market for high end desktop CPUs and also the market 90 % of reviews address), where you can't really explain the performance differences just by higher clocks. It's considered the main reason why much older Intel CPUs can still outperform Ryzen CPUs in games.

                      On the other hand, if you take something like Cinebench you can literally turn XMP off (thus using JEDEC timings and bus speed) and still get almost the same score (within, say, 2 %). That's because Cinebench is benchmarking pretty much only ALU throughput. That's obviously an important factor for performance, but just as obviously not the only one.

                    • dralley 11 days ago

                      >Intel is still faster in most applications, simply by virtue of having a clock speed advantage that by far exceeds any IPC difference

                      This is already only marginally true; the difference is only about 5% depending on the application, and in some applications AMD comes out ahead anyway. Expect the remaining difference to disappear when Zen 3 releases in a few months.

                      >Another significant weakness of AMD's current platform is idle power consumption.

                      AMD seems to have caught up here almost entirely. They've done a lot of work to improve idle power consumption lately and the node advantage probably helps, too.

                      • highfrequency 11 days ago

                        Hi, this is very informative. To clarify a couple of points:
                        - By memory latency, do you mean the time to access an uncached portion of RAM?
                        - Re: clock speed advantage, are you referring to the fact that AMD turbo boost doesn't hit 5 GHz?

                        • blattimwind 11 days ago

                          IIRC L3 is slightly slower on Zen 2; main memory, as mentioned, is much slower.

                          Clock speed advantage -- Most Zen 2 CPUs don't overclock to 4.5 GHz on any core, let alone all-core. The boost numbers are reached with current firmware, but only for the tiniest fractions of a second and never under any real load. Sustained single-core boost frequencies are 200-400 MHz lower than the specified boost frequency. On the other hand, Intel CPUs consistently reach their boost frequencies under load, and most CPUs can do their single-core boost as an all-core overclock under load (with much greater power consumption of course).

                          In practice this means that for equivalently priced parts (e.g. 3900X vs 10900K) the AMD part will have about a GHz lower clock for lightly threaded workloads, which are most workloads. With Intel settings, the Intel and AMD parts have about the same sustained clocks (3.8-4 GHz) under all-core load, but with the defaults of many motherboards the Intel part will run at 4.8-5 GHz, depending on the cooling.

                          • gruez 11 days ago

                            >In practice this means that for equivalently priced parts (e.g. 3900X vs 10900K) the AMD part will have about a GHz lower clock for lightly threaded workloads

                            They're only "equivalently priced" when you're talking about MSRP. Right now the 3900X sells for $413 and is in stock, whereas the i9-10900k sells for $530 and is out of stock.

                    • klelatti 11 days ago

                      AMD had to take radical action to survive and it was never in Intel's interests for it to vanish.

                      Intel had the twin defensive 'moats' of x86 and the leading process technology.

                      But now Intel has stalled relative to TSMC on the process lead (and possibly lost it) and the last few days have shown that the x86 moat is crumbling. The world will not move to TSMC manufactured ARM overnight but a significant shift could happen quite quickly I think. Intel will / has defended with lower prices but that will ultimately mean a big shift in business model.

                      but ....

                      Intel is the last firm with leading edge technology manufacturing in the US. If it starts to falter I can see a concerted effort to maintain that from the US government.

                      • api 11 days ago

                        We still have somewhat of a lead in areas like composites, exotic materials, jet engines, turbines, and aerospace, but they are eroding.

                        We can make good cars, though the majority of the good cars made in the US are made under the management of Japanese car companies. Tesla has some great technology and product design but on the manufacturing front they are behind the majors.

                        • klelatti 11 days ago

                          My mistake - I meant just in chip manufacturing. Definitely leading in lots and lots of other areas.

                          • deelowe 11 days ago

                            Yeah. Silicon manufacturing has huge implications for national defense. Not sure how relevant Intel is in that particular discussion. Did the US Govt express concerns when IBM stopped making chips in-house?

                            • kevin_thibedeau 11 days ago

                              They're relevant to the need for computers free from (foreign) backdoors. Otherwise, defense doesn't rely on cutting edge ICs any more. Modern processes can't deliver the reliability most military systems need, particularly if radiation hardening is a requirement.

                          • rowanG077 11 days ago

                            What? US cars are widely seen as relatively shitty compared to Asian and EU brands.

                            • api 11 days ago

                              Honda, Nissan, and Toyota manufacture a lot of cars in the US, though primarily for the US market. My Nissan Leaf is made in Tennessee.

                              • dboreham 11 days ago

                                You must be thinking of Chrysler.

                                • rowanG077 11 days ago

                                  I'm thinking of all of them except Ford and Tesla.

                                  • tomatotomato37 11 days ago

                                    So the winner of being the worst ranked in JD Powers and "Found On Road Dead?" All American companies have had a generally poor record in build quality; doesn't matter if they were started in Detroit or Silicon Valley.

                                    Sucks too, because otherwise they still are the best bang for your buck when it comes to performance.

                                    • vxNsr 11 days ago

                                      I have a friend who bought a new Ford Focus: it’s been in the shop every two months since she got it. No one else I know has that problem, everyone else I know buys foreign.

                            • smolder 11 days ago

                               Intel desktop CPUs do not use less power. The only metric where Intel desktop CPUs win right now is maximum per-core performance, and not by much. They're worse at performance per dollar, performance per watt, maximum multithreaded performance, and overall power efficiency. (edit: perhaps not idle power efficiency...)

                              • blattimwind 11 days ago

                                As a desktop user, my CPU tends to be mostly idle. So overall power efficiency is impacted a lot by idle power consumption; my AMD Ryzen CPU alone draws significantly more power in idle than my previous several-years-old Intel system. In fact, just the IO die alone draws almost as much power as some office PCs.

                                • zrm 11 days ago

                                   Comparing idle power consumption for desktop parts is kind of a desperation move. For laptops it matters, but the Zen 2 laptops aren't using an IO die manufactured on the older process. For desktops the difference at idle is something like 10 watts, i.e. ~$10/year in electricity, and even that much only if the machine is both turned on and idle 24 hours a day the whole year, whereas anybody worried about power consumption would put it to sleep.
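
                                   (For scale: 10 W around the clock is 10 W x 8,760 h, roughly 88 kWh per year, which at an assumed ~$0.12/kWh comes to about $10; the rate is an assumption, plug in your own.)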

                                  Between Windows Update and crappy javascript, the theory that modern desktops are usually idle is also increasingly untrue, and under load the power consumption for the Intel parts is worse.

                                  • smolder 11 days ago

                                    Ah sorry, I thought I had seen zen2 measuring lower for idle consumption than comparable intel as well, but some searching says to me you're right.

                                  • bcrosby95 11 days ago

                                    Yes, AMD as a power hog or a space heater is just a weird assumption people make based upon some old chips. It hasn't really been true for any of the Zen architecture chips.

                                  • pedrocr 11 days ago

                                    All I've read says Intel has a manufacturing disadvantage to TSMC (where AMD, Apple and many others get their chips) and their CPUs are less power efficient, not more, compared to AMD and particularly these kinds of upcoming ARM parts. Is that not the case?

                                    • ksec 11 days ago

                                      > Is that not the case?

                                       If simply having a better and cheaper product equalled immediate success, then mid-to-high-end fashion, as well as gazillions of other products in many other industries, would not exist, as there are always competitors that offer something better at a cheaper price.

                                       It also comes down to marketing / discovery and channels / distribution. And that's excluding the software advantage Intel has.

                                       AWS only just made Zen 2 generally available, nearly a year after they first announced it; compare that to Intel. I don't have any insider information, but I guess AMD has a lot to learn with regard to dealing with hyperscalers.

                                       And you may have noticed that Intel has had way more leaks than usual in the past 12-18 months. That is part of the PR play to keep people from buying AMD while they try to catch up.

                                       Intel as of today is still operating at 100% capacity with backlog orders to fulfil, and posted record revenue in the last quarter. So yes, technically Intel is inferior, but until all of those disadvantages materialise into financial numbers it is far too early to call the death of Intel.

                                       I don't hold any Intel stock; I'm speaking as an AMD shareholder.

                                      • pedrocr 11 days ago

                                        > If simply having a better, and cheaper product will equal to immediate success then...

                                        I never said it did. I was just questioning if the facts were indeed those.

                                        • ksec 11 days ago

                                           Sorry, I was reading it in the context of the parent comments.

                                          To your original question, the simple answer is yes.

                                  • ianai 11 days ago

                                    From what I’ve heard Intel management was taken over by marketing “professionals.” It’s an awful place to work and probably devoid of tech leadership.

                                     AKA yes, it's a cautionary tale, and time to run from that ship.

                                  • jakeinspace 11 days ago

                                    I suspect that for national security reasons, the US federal government and DoD would not allow Intel to fail. Still, the military and government aren't big enough buyers of microprocessors to keep Intel competitive at its recent position, and I suppose that TSMC's planned fab in the US could be seen as an alternative.

                                    • jillesvangurp 11 days ago

                                       I think the writing has been on the wall for a few years. IMHO Arm on servers is something that's been an option for quite a while, and the only thing that's surprising is how long x86 has managed to stay popular/relevant there.

                                       Also, open-source instruction sets like RISC-V are going to be interesting.

                                      What went wrong at Intel is that they forgot to take the appropriate steps 10 years ago to avoid running out of options right now.

                                      Ten years ago it was already obvious that mobile CPUs were a thing and Intel's attempts to penetrate that market failed around that time. From that moment they were living on borrowed time.

                                      • klodolph 11 days ago

                                        It’s not quite that obvious. Server space is still huge. Growth in mobile usually corresponds to people using more servers. ARM servers are an option but I’d hesitate to lay bets on it—they have to beat Intel on TCO and it’s just not there yet. AMD is great but doesn’t have volume like Intel does, and while the I/O is better on AMD, dealing with NUMA on AMD is a bit more of a beast.

                                        I’m not saying that Intel’s not in trouble, just that the conclusions here are far from obvious. I have some skepticism for people who say that they saw this coming. AMD laid off a lot of top engineers before its recent resurgence. Intel’s failure to ship its 7nm node in volume was a surprise to a lot of people.

                                        Everyone knew that the new process nodes were more difficult, but outside a few experts, hardly anyone was in a position to predict when the move to smaller nodes would slow down.

                                        Not that long ago, people were praising Intel for their superior SSD controllers, or talking about how they would be making 5G modems.

                                        • cma 11 days ago

                                          AMD has 64 core processors without any NUMA issues now. They use a dedicated IO die to provide uniform memory access to all chiplets.

                                          ARM has a much weaker memory model with significant performance implications for multithreading as well.

                                          • jeffbee 11 days ago

                                             UMA on Zen2 is fake. NUMA-aware software is still significantly faster on Zen2/Rome if configured as NPS4, i.e. 4 NUMA nodes per socket.

                                            • fomine3 11 days ago

                                               That's true, but even at NPS1 it works well compared to Xeons for some workloads.

                                          • deelowe 11 days ago

                                            Generally, Intel is still best for TCO. AMD is better on specific high core count workloads. Arm isn't really competitive in the HPC/Cloud space, or at least hasn't been historically. Maybe that's starting to change?

                                        • api 11 days ago

                                          Ice Lake 10nm / 10th gen has a super weird crashing bug too:

                                          https://youtrack.jetbrains.com/issue/JBR-2310

                                          https://bugs.openjdk.java.net/browse/JDK-8248315

                                          No, no, no it is not an OS bug, a hypervisor bug, or a JVM bug. Read the whole thing if you have an hour to kill. It's confirmed to be a CPU bug, and Intel knows about it.

                                          I am really looking forward to the postmortem. The behavior reminds me of the old F00F bug.

                                          https://en.wikipedia.org/wiki/Pentium_F00F_bug

                                          • Google234 11 days ago

                                             Most of those reports seem to be on 14nm procs. The post mortem will probably be a microcode update.

                                            • api 11 days ago

                                              It's happening on Ice Lake cores. Process node probably doesn't matter.

                                          • justapassenger 11 days ago

                                            > Apple just announced that they’re dropping them in basically the next 5 years

                                             That's a big PR hit, but in terms of sales Apple isn't really a significant customer of x86. If it triggered Microsoft to double down on Windows on ARM, though, that could become a threat. But MS has been playing with ARM for years, and nothing significant has come out of it yet.

                                            • numpad0 11 days ago

                                              They still have 7nm and 14nm manufacturing businesses...

                                              • yjftsjthsd-h 11 days ago

                                                > They still have 7nm and 14nm manufacturing businesses...

                                                I'm pretty sure they have their 14nm business, and are working really hard to get a 7nm manufacturing business? A quick search gives me news articles about Intel hoping to have 7nm working by 2021.

                                                • PaulHoule 11 days ago

                                                   They have milked 14 nm for all it is worth, sacrificing the long term in the interest of the short term.

                                                   They bent over backwards for cloud providers and offered them special deals that helped finance those providers' transition to their own silicon. They fused off features to create false product "differentiation" like the IBM of old, and failed to deliver technology after technology in working form (SGX, TSX, 10nm, ...). They held the performance of the PC platform back by trying to capture all of the BoM for a PC (e.g. trying to kill off NVIDIA and ATI with integrated 'graphics').

                                                  Customers are angry now, that's their problem. Intel is like that Fatboy Slim album, "We're #1, why try harder?" They still think they are the #1 chipmaker in the world but now it is more like #2 or #3.

                                                  • sjwright 11 days ago

                                                    ...And who wants to take bets for how many years before Intel starts being a contract manufacturer of chips for Apple and others? Shall we open the bidding at five years?

                                                  • Aaronstotle 11 days ago

                                                     I think Intel has learned a painful lesson about resting on their laurels. That being said, Jim Keller was there for 2 years (he resigned June 12th), so I'd bet they have some big things on the horizon, namely GPUs.

                                                    • regularfry 11 days ago

                                                      They don't have a particularly good track record here. Have we got reason to think that there's going to be better news for them this time round?

                                                    • google234123 11 days ago

                                                      I find it sad that the top comment on a HN thread starts with "Sooo... is ${1} like crying in a corner right now". Yes, all 100,000 employees are collectively crying a corner.

                                                    • DCKing 11 days ago

                                                      It's worth noting that this is based on ARM's Neoverse N1 IP, which is also used in the AWS Graviton2. The Graviton2 benchmarks damn close to the best AMD and Intel stuff, so this chip looks very promising [1]. It's really looking to be a breakthrough year for ARM outside of the mobile market.

                                                      [1]: https://www.anandtech.com/show/15578/cloud-clash-amazon-grav...

                                                      • Refefer 11 days ago

                                                         Phoronix paints a very different picture, especially in non-synthetic workloads [1]. Graviton2 looks like a nice speedup over the first generation, but either the optimization isn't there yet or there are areas which need additional work to become more developer/HPC competitive. That said, I'm thrilled we have competition in the architecture space for general purpose compute again.

                                                        [1] https://www.phoronix.com/scan.php?page=article&item=epyc-vs-...

                                                        • _msw_ 11 days ago

                                                          Disclosure: I work for AWS on cloud infrastructure

                                                          My personal opinion is that the Phoronix way places quantity over quality. Measuring performance is an important part of shining a light on where we can improve the product, but I get little practical information from those numbers, even when they are reported as non-synthetic. There are HPC workloads that are showing significant cost advantages when run on C6g, like computational fluid dynamics simulations. See [1].

                                                          I expect the scalability of HPC clustering to improve on C6g in the future, like C5n improved cluster scalability compared to C5 with the introduction of the Elastic Fabric Adapter. The Phoronix and Openbenchmarking.org approach doesn't give much insight into workloads like this.

                                                           My advice for an audience like folks on HN is to test it for yourself. For me, being able to run my own experiments is how I come to understand infrastructure better. And the cloud lowers the barrier of running those experiments significantly by being available on-demand, just an API call away. I'd love to hear what you think, either in a thread here or you can contact me via the addresses in my user profile.

                                                          [1] https://aws.amazon.com/blogs/compute/c6g-openfoam-better-pri...

                                                          • DCKing 11 days ago

                                                            Interesting data. Curious whether there's a logical explanation for these discrepancies in their setups.

                                                            • karkisuni 11 days ago

                                                              Didn't go too deep into it, but the AMD cpus being compared are different. Anandtech has an AWS-only EPYC 7571 (2 socket, 32 cores each, 2.5ghz), Phoronix has EPYC 7742 (1 socket, 64 cores, 2.2ghz). On top of that, Anandtech is using another AWS ec2 instance and Phoronix is testing on a local machine on bare metal.

                                                              Still would be interesting to know what differences caused the gap in results, but their setups were pretty different.

                                                              • jeffbee 11 days ago

                                                                That doesn't seem like it could explain a 20x difference in PostgreSQL performance.

                                                          • embrassingstuff 11 days ago

                                                             How different are these ARM server implementations from each other?

                                                             Will we need to recompile? Will it be almost-100%-binary-equivalent-with-some-hidden-bugs?

                                                            • yjftsjthsd-h 11 days ago

                                                              Ugh, yes; one of the perks of an Intel monoculture was that at least you only had one target to worry about, and inter-generational quirks were mostly limited to minor things. Now we have to deal with "this was optimized for (Intel|AMD) and doesn't work on (AMD|Intel)" and "the devs tested this on their x86 laptops and then it got weird when we went to run it on ARM" and "ARM is less of a platform and more of a collection of kinda-similar-looking systems that are mostly compatible". Don't get me wrong, I'll take this over a monoculture, especially an Intel monoculture, but there are some bumps on the road to a more diverse ecosystem.

                                                        • jeffbee 11 days ago

                                                          Does anyone have an evaluation board for these things? Their marketing materials scream "scam" to me. For one thing they compare to competing x86 parts by arbitrarily downrating them to 85% of their actual SPECrate scores. Why? Then they switch baseline x86 chips when making claims about power efficiency ... for performance claims they use the AMD EPYC 7742 then for performance/TDP they use the 7702, which has the tendency to make the AMD look worse because it is spending the same amount of power driving its uncore but it's 11% slower than the 7742.

                                                          Also, without pricing, all these efficiency claims are totally meaningless.

                                                        • jzwinck 11 days ago

                                                           This reminds me of Tilera, who had a 64-core mesh-connected CPU about ten years ago. The problems seemed to be that it was harder to optimize due to the mesh connectivity (like NUMA but multidimensional), low clock speeds, and lack of improvement after an initially promising launch.

                                                          Will this be the same? It seems possible. Does it really get more work done per watt than x86?

                                                          And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 Ghz Turbo"?

                                                          • jillesvangurp 11 days ago

                                                            It depends a bit on how you utilize these CPUs. A lot of server software is optimized for just a few cores. Even products optimized for using more than 1 thread tend to be tested and used mostly with 4/8 core configurations. And then of course there are a few popular server-side languages that are effectively single threaded typically (e.g. python) and use multiple processes to leverage multiple cores. Launching 80 python processes on an 80 core machine may not be the best way to utilize available resources compared to e.g. a Java process with a few hundred threads.

                                                             With non-blocking IO and async processing that can be good enough, but to fully utilize dozens/hundreds of CPU cores from a single process, you basically want something that can do both threading and async. But assuming each core performs at a reasonable percentage of e.g. a Xeon core (let's say 40%) and doesn't slow down when all cores are fully loaded, you would expect a CPU with 80 cores to more than keep up with a 16 or even 32 core Xeon. Of course the picture gets murkier if you throw in specialized instructions for vector processing, GPUs, etc.
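
                                                             As a minimal sketch of that "one process, many threads" setup in Java (the pool-sizing heuristic and handleRequest are placeholders for illustration, nothing specific to these chips or AWS):

                                                                import java.util.concurrent.ExecutorService;
                                                                import java.util.concurrent.Executors;
                                                                import java.util.concurrent.TimeUnit;

                                                                public class CoreSizedPool {
                                                                    public static void main(String[] args) throws InterruptedException {
                                                                        int cores = Runtime.getRuntime().availableProcessors();   // e.g. 80 on an Altra box
                                                                        ExecutorService pool = Executors.newFixedThreadPool(cores);
                                                                        for (int req = 0; req < 10_000; req++) {
                                                                            final int id = req;
                                                                            pool.submit(() -> handleRequest(id));                  // CPU-bound work per request
                                                                        }
                                                                        pool.shutdown();
                                                                        pool.awaitTermination(1, TimeUnit.MINUTES);
                                                                    }

                                                                    static void handleRequest(int id) {
                                                                        // placeholder for the actual per-request work
                                                                    }
                                                                }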

                                                            • ddorian43 11 days ago

                                                               Yes, most software is limited in the number of cores it can use (example: encoding videos).

                                                              The best (efficient) way to utilize that many cores IS to have 1 pinned process/thread per-core: https://www.scylladb.com/ https://github.com/scylladb/seastar/

                                                               That would be the same in Python too. A problem is that you can't share the kernel pages for the code, and you need to have a shared cache. An example of zero-memory-copy with no deserialization would probably be lmdb + FlatBuffers.

                                                              Nicer is to have 1/2x cores, but each core being 2x faster ;)

                                                            • rbanffy 11 days ago

                                                              You need a lot of memory bandwidth and large caches, or else the cores will starve. That's also why IBM mainframes have up to 4.5 GB of L4 cache.

                                                              • O_H_E 11 days ago

                                                                 Ok, just wow. More L4 cache than my laptop's RAM. Thanks for that awesome titbit.

                                                                 PS: don't worry, my upgrade is on its way :p

                                                                • yjftsjthsd-h 11 days ago

                                                                  :D A bit like the moment when I realized that on-CPU cache could now hold a complete DOS system, with programs included...

                                                                • zozbot234 11 days ago

                                                                   That's true of all high-frequency/high-core-count hardware. Which is why running Java or Python code on this hardware makes very little sense. Rust is more like it. Golang in a pinch.

                                                                  • imtringued 11 days ago

                                                                    It's the opposite. Running lots of poorly optimized processes allows you to amortize memory latency. If your software suffers from cache misses then it's not going to run out of memory bandwidth any time soon. Adding more threads will increase memory bandwidth utilization. Meanwhile hyper optimized AVX512 code is going to max out memory bandwidth with a dozen cores or less.

                                                                    • rbanffy 11 days ago

                                                                      > it's not going to run out of memory bandwidth any time soon

                                                                      No, but the higher the memory bandwidth, the sooner those processes can get back to their inefficiency.

                                                                      • CyberDildonics 11 days ago

                                                                        That's really not true. Memory bandwidth, just like memory capacity becomes a bottleneck when it is exceeded, but more doesn't automatically speed anything up. Java and python programs will likely be hopping around in memory and waiting on memory to make it to the CPU as a result.

                                                                        Typically only multiple cores running optimized software that will run through memory making heavy use of the prefetcher will exceed memory bandwidth.

                                                                    • blackoil 11 days ago

                                                                       Noob question: is there any fundamental limitation in Java, or is it more that the JVM will need to evolve to optimally use such an architecture?

                                                                      • twic 11 days ago

                                                                        AIUI the relevant weakness of Java here is that it typically has worse memory density and locality than something like Rust.

                                                                        Consider code which linearly goes through a list of points in 2D space and does some calculation on the coordinates.

                                                                        In Rust, the list is a Vec<(f64, f64)>. The Vec is a small object containing a pointer to a large block of data which contains all the points packed tightly together. Once the program has dereferenced the pointer and loaded the first point, all the others are immediately after it in memory, in order, containing nothing but the coordinates, and so the processor's cacheing and prefetching will make them available very quickly.

                                                                        In Java, the list is an ArrayList<Point2D.Double>. The ArrayList is a small object containing a pointer to an array of pointers to more small objects, one for each point. Each of the small objects has a two-word object header on it. The pointer plus header means that for every two words of coordinate, there are three words of overhead, so the cache is used much less effectively. The small objects aren't necessarily anywhere near one another in memory, or in order, so prefetching won't help.

                                                                        There are a couple of ways the Java situation can be improved.

                                                                        Firstly, today, you can replace the naive ArrayList<Point2D.Double> with a more compact structure which keeps all the coordinates in a single big array. This gives you the same efficiency as Rust, but requires programming effort (unless you can find an existing library which does it!), and may give you an API that is less efficient (if it copies coordinates to objects on retrieval) or convenient (if it gives you some cursor/flyweight API).

                                                                         Secondly, in the future, the JVM could get smarter. In principle, it could do the above rewriting as an optimisation, although I wouldn't want to rely on that. A good garbage collector could bring the small objects together in memory, to improve locality a bit.

                                                                        Thirdly, in the near-ish future, Java will get value types [0] which behave a lot more like Rust's types. That would give you equally good density and locality without having to jump through hoops.

                                                                        [0] http://openjdk.java.net/projects/valhalla/
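
                                                                         To make the first option concrete, here is a small hedged sketch of both layouts (class and method names are invented for illustration; the point is the memory layout, not the API):

                                                                            import java.awt.geom.Point2D;
                                                                            import java.util.ArrayList;
                                                                            import java.util.List;

                                                                            public class PackedPoints {
                                                                                // naive layout: a pointer to a small header-carrying object per point
                                                                                static double sumBoxed(List<Point2D.Double> pts) {
                                                                                    double s = 0;
                                                                                    for (Point2D.Double p : pts) s += p.x + p.y;
                                                                                    return s;
                                                                                }

                                                                                // compact layout: coordinates packed contiguously as [x0, y0, x1, y1, ...]
                                                                                static double sumPacked(double[] coords) {
                                                                                    double s = 0;
                                                                                    for (int i = 0; i < coords.length; i += 2) s += coords[i] + coords[i + 1];
                                                                                    return s;
                                                                                }

                                                                                public static void main(String[] args) {
                                                                                    int n = 1_000_000;
                                                                                    List<Point2D.Double> boxed = new ArrayList<>(n);
                                                                                    double[] packed = new double[2 * n];
                                                                                    for (int i = 0; i < n; i++) {
                                                                                        boxed.add(new Point2D.Double(i, -i));
                                                                                        packed[2 * i] = i;
                                                                                        packed[2 * i + 1] = -i;
                                                                                    }
                                                                                    System.out.println(sumBoxed(boxed) + " == " + sumPacked(packed));
                                                                                }
                                                                            }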

                                                                      • rbanffy 11 days ago

                                                                        You will have to tune your code to need as little shared state across threads as you can. It's not fun, but tuning code at this level rarely is.

                                                                        • CyberDildonics 11 days ago

                                                                          The synchronization is what actually matters, shared memory being read is not a problem.

                                                                    • wtallis 11 days ago

                                                                      > And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 Ghz Turbo"?

                                                                      These chips obviously have variable clock speed, but apparently nothing like the complicated boost mechanisms on recent x86 processors. My guess is that Turbo speed here is simply full speed, and doesn't depend significantly on how many cores are active, and doesn't let the chip exceed its nominal TDP for short (or not so short) bursts the way x86 processors do.

                                                                      • rbanffy 11 days ago

                                                                        > and doesn't depend significantly on how many cores are active, and doesn't let the chip exceed its nominal TDP for short (or not so short) bursts the way x86 processors do

                                                                        Either that, or 3 GHz always exceeds the envelope and the chip is throttling clocks down all the time to keep itself inside the allowed power envelope.

                                                                        • zozbot234 11 days ago

                                                                          > doesn't let the chip exceed its nominal TDP for short (or not so short) bursts the way x86 processors do.

                                                                          That's more of an artifact of how TDP is defined than anything else. I doubt that this could peg even a single core at 3.0 GHz given a reasonable cooling setup, let alone run all cores @ 3.0 GHz.

                                                                          • greggyb 11 days ago

                                                                            Why do you think this won't keep a single core at 3.0GHz?

                                                                            You can get ~3.4GHz average sustained all-core speed on a 3990x (64-core, nominal 280W TDP). This is with an off-the-shelf AIO cooler.[0]

                                                                            Note: top-end air coolers are often competitive with AIOs, and can be had for $80-$100.[1]

                                                                            If you're buying a several thousand dollar CPU, dropping even $500 (much higher than you'd need for closed loop liquid or high-end air cooling) on cooling doesn't seem unreasonable.

                                                                            [0] https://www.anandtech.com/show/15483/amd-threadripper-3990x-...

                                                                            [1] https://www.youtube.com/watch?v=7VzXHUTqE7E

                                                                            • stan_rogers 11 days ago

                                                                               Neither the AIO nor the large tower (air) cooler is going to fly in a server rack.

                                                                              • greggyb 11 days ago

                                                                                 No. Servers will be utilizing jet-sounding fans, or air conditioned intake, or rack-level water cooling, or any number of other cooling solutions that can handle the high heat output.

                                                                                250W is not an absurd figure to shove in a server CPU. If you're buying an 80-core CPU, you're not going to be skimping on the cooling solution.

                                                                                Especially given the target market for Ampere is cloud providers, you can expect these to be racked in enclosures that provide sufficient cooling for their operational needs.

                                                                        • PaulHoule 11 days ago

                                                                          These chips are practical and can go into servers that are similar in performance to x86 servers.

                                                                          ARM has well-thought out NUMA support, probably a system this size or larger should be divided into logical partitions anyway. (e.g. out of 128 cores maybe you pick 4 to be management processors to begin with).

                                                                        • samcat116 11 days ago

                                                                           Products like this show that Apple could have an ARM-based Mac Pro in two years relatively easily. They already have PCIe Gen 4. TDP and memory capacity are already more than Intel provides in the Xeon workstation line that Apple uses.

                                                                          • jagger27 11 days ago

                                                                            It would be weird (and cool) if Apple ends up being the company to provide easy off the shelf access to a powerful Arm workstation.

                                                                            • klelatti 11 days ago

                                                                              More of a case of "skating to where the puck is going".

                                                                              I know it's a bit of a cliche but it feels to me like Apple might have got its timing right on this one.

                                                                              • ed25519FUUU 11 days ago

                                                                                 Timing is something Apple does really well. It's almost never the "first" to anything, but it waits until all of the stars align and then invests heavily.

                                                                                • nonesuchluck 11 days ago

                                                                                  It's not just heavy investment at their moment of need, it's also a habit of nurturing unlikely Plan B's for years and decades.

                                                                                  - When Apple founded ARM in 1990 with Acorn and VLSI, they didn't know that silly cacheless chip would become a world-beating juggernaut. But hey, as founders, they now have a license to mold the microarchitecture however they like.

                                                                                  - When Apple bought NeXT in 1997, they didn't know the sun was slowly setting on PowerPC. But they secretly nursed along the (already built) NeXTSTEP x86 port for years, until the time came to dust it off and start shipping product with Intel Inside.

                                                                                  - When Apple forked KHTML in 2001 and started building WebCore/WebKit, they didn't know that MS was about to leave Internet Explorer to wither on the vine, nor release the final Mac IE only 2 years later. But they quietly invested in building such a konquering (sorry!) product that (with Google/MS help) we're now at risk of an entirely different browser monoculture.

                                                                                  • fanf2 11 days ago

                                                                                    A small historical correction: the ARM3 (first ARM with a cache) predates the spin-off of ARM as a separate company. The Acorn Archimedes A540 (with ARM3) was released mid 1990 and ARM was founded later that year.

                                                                            • adrianmonk 11 days ago

                                                                               If they do that, I wonder whether it would make sense for Apple to get into the ARM server CPU business while they're at it.

                                                                              Currently, the Intel Xeon is used in both high-end workstations and servers. If one x86 design can be suitable for both of those, presumably one ARM design could do the same.

                                                                              If they could sell server CPUs at a profit, then Apple could get more return on its design investment by getting into two markets. And they'd get more volume. Though apparently they'd be facing competition from Ampere and Amazon's Graviton.

                                                                              • why_only_15 11 days ago

                                                                                I've wondered for a long time if it would make sense for Apple to sell the A13 etc. to smart home device makers, on the theory that Apple can offer great HomeKit integration as well as a superior chip to anything else on the market (for e.g. video processing).

                                                                              • ed25519FUUU 11 days ago

                                                                                 I think it's a good time to invest in a Mac Pro. While working from home I'm asking myself what the benefit of a laptop is when a desktop could give me so much more performance.

                                                                                • coder543 11 days ago

                                                                                  The other person who replied to you probably paid half or a third of what you would pay for an equivalent Mac Pro.

                                                                                  The profit margins on the Mac Pro are just incredible. (Yes, I'm sure that equivalent professional workstation brands also have huge profit margins... no, that doesn't make me want to pay those lofty prices more.)

                                                                                  The only real value the Mac Pro provides is that it's the most powerful computer you're allowed to run macOS on legitimately. If you can do your work from Windows (with WSL) or Linux, you can save upwards of tens of thousands of dollars by building your own workstation, and that workstation can be significantly more powerful than any current Mac Pro at the same time.

                                                                                  For video professionals who rely on FCPX or similar macOS-only software, they don't really have a choice, and they get the opportunity to essentially pay $10k to $20k just for a license of macOS, which is fun.

                                                                                  • systemvoltage 11 days ago

                                                                                    I have a hackintosh (i7-8700k) and it feels about 2x faster than the top-spec $4000 MacBook Pro, latest 2019 model (subjective opinion, of course). It is such a huge difference, especially when using PyCharm and Adobe apps.

                                                                                    It is pricey, but if it is something you plan to keep for 5 years, it works out to about $100/month. Some people might want to buy it.

                                                                                    • jeff_vader 11 days ago

                                                                                      "The other person who replied" here. While it definitely costs a lot less - you also need to factor in the time you spend on selecting components, building and tweaking thermals. It's almost a small side hobby for a month or two.

                                                                                    • jeff_vader 11 days ago

                                                                                      I just did that when the whole lockdown happened - built myself an AMD 3970x workstation. It's so good I do not want to go back to the work laptop in the office.

                                                                                      • ed25519FUUU 11 days ago

                                                                                        How are the thermals? I'm concerned about it making my small office hot.

                                                                                        • jeff_vader 11 days ago

                                                                                          Well.. It does have a [TDP][1] of 280 watts. So if you ran it at full capacity all the time, it'd be roughly equivalent to a 300-watt heater. But at that point you'd probably be more worried about the fan noise (which really depends on how you build it). Most of the time my machine is at very light load. Case exhaust is slightly warmer than ambient, but that's not a good metric. Unfortunately I do not have a power meter.

                                                                                          I'd say heat is not a concern, but noise can be. It takes some time to figure out good fan curves to balance cpu temp vs noise. There may be some companies who do pre-built and well configured machines, but I haven't researched that at all.

                                                                                          [1]: https://en.wikipedia.org/wiki/Thermal_design_power

                                                                                          • hajimemash 11 days ago

                                                                                            Thermals have a surprisingly large effect on my room temp. My MacBook Pro, LG UltraFine 5K, and BenQ LED desk light together seem to add an extra 5-10 degrees to my small enclosed room under load.

                                                                                            • ed25519FUUU 11 days ago

                                                                                              I notice that my macbook pro stays "warm" when plugged into my screen, even if it's closed. I think it might have something to do with powering the USB or displayport connected devices.

                                                                                              I now unplug my computer when I'm done, which is kind of annoying.

                                                                                    • emmanueloga_ 11 days ago

                                                                                      Is anybody else confused by the "Ampere" brand name? I was trying to figure out what Ampere is...

                                                                                      * There's one "Ampere Computing" [1], but I guess I'm not "in the know" since it is the first time I heard about it :-/

                                                                                      * There's one Ampere [2], "codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia".

                                                                                      Are both things related? Is "Nvidia's Ampere" developed by "Ampere" the company?

                                                                                      Also, I think Ampere is kind of a bad name for a processor line... it just makes me think of high current, power-hungry, low efficiency, etc. :-)

                                                                                      1: https://en.wikipedia.org/wiki/Ampere_Computing

                                                                                      2: https://en.wikipedia.org/wiki/Ampere_(microarchitecture)

                                                                                      • why_only_15 11 days ago

                                                                                        They are not related as far as I can tell other than being named "Ampere".

                                                                                      • shadykiller 11 days ago

                                                                                        Most logical naming of processors I've ever seen. E.g.:

                                                                                        Q80-33 - 80 cores, 3.3 GHz

                                                                                        Q32-17 - 32 cores, 1.7 GHz

                                                                                        • sradman 11 days ago

                                                                                          > Where Graviton2 is designed to suit Amazon’s needs for Arm-based instances, Ampere’s goal is essentially to supply a better-than-Graviton2 solution to the rest of the big cloud service providers (CSPs).

                                                                                          So the question is whether they can land Google, Microsoft, and/or Alibaba as customers for an alternative to AWS M6g instances.

                                                                                          • klelatti 11 days ago

                                                                                            Oracle is an investor ($40m), and TechCrunch reports that they have been working with Microsoft, so it sounds like they are making progress on getting into the major cloud providers.

                                                                                          • cesaref 11 days ago

                                                                                            I'm interested to know what applications really scale to these core counts. When I was working with large datasets (for finance), other bottlenecks tended to dominate rather than computation: memory pressure and throughput from the SAN were more important.

                                                                                            These high-density configurations were key when rack space was at a premium, but these days power is the limitation, so providing more low-power cores is interesting. I'm just not sure who is going to get the most benefit from them, though...

                                                                                            • regularfry 11 days ago

                                                                                              With 80 cores I can get 40 2-core VMs all pegging their CPUs on a single processor without any core contention. Multiply up by the number of sockets. That might be the more interesting application for cloud providers than going for a single use case for the entire box.

                                                                                              Where this might get interesting, depending on how the pricing stacks up, is that if you're in the cloud function business, this will increase the number of function instances you can afford to keep warmed up and ready to fire. In those situations you're not (usually) bottlenecked on the total bandwidth for the function itself; your constraint is getting from zero to having the executable in the VM it's going to run in, and from there getting it into the core past whatever it's contending with. If there's nothing to contend with and it's just waiting for a (probably fairly small) trigger signal, execution time from the point of view of whatever's downstream could easily be dominated by network transit times.

                                                                                              • tyingq 11 days ago

                                                                                                Plain old io-bound multiprocess work would be a good match. Like static content and php sites, for example. I imagine there's quite a lot of that out there.

                                                                                                • ed25519FUUU 11 days ago

                                                                                                  I'd wager to say the bulk of the web is CPU bound.

                                                                                                • ambicapter 11 days ago

                                                                                                  Insofar as webservers go, more cores equal more simultaneous connections, no? I doubt network links are saturated yet.

                                                                                                • rbanffy 11 days ago

                                                                                                  As cool as it is, these server announcements are somewhat disheartening.

                                                                                                  I want a workstation with one of these.

                                                                                                  • gpm 11 days ago

                                                                                                    It has PCIe lanes; what, other than price, stops you from buying a rack server, sticking a graphics card in it, and calling it a workstation?

                                                                                                    • rbanffy 11 days ago

                                                                                                      Two reasons, mostly.

                                                                                                      Aesthetics is a big thing - rackmount servers are ugly and, unless there are panels covering them, they are horrendous deskside workstations.

                                                                                                      Another one is noise. These boxes are designed for environments where sounding like a vacuum cleaner is not an issue. Because of that, they sound like vacuum cleaners, with tiny whiny fans running at high speeds instead of more sedated larger fans and bigger heat exchangers.

                                                                                                      HP sold, for some time, the ZX6000 workstation that was mostly a rack server with a nice-looking mount. If someone decided to sell that mount, it'd solve reason #1, at least.

                                                                                                      • greggyb 11 days ago

                                                                                                        Shove it in a full tower case. You can mount most server hardware easily in such a case. At that point, you can cool with big slow fans.

                                                                                                      • sjwright 11 days ago

                                                                                                        Probably a lack of time resources for unbounded experimentation with unsupported configurations of expensive, non-mainstream hardware. Not all of us have the luxury to be a recreational sysadmin in our spare time.

                                                                                                        • gpm 11 days ago

                                                                                                          I struggle to imagine what you expect from running a desktop OS on an 80 core ARM cpu if it doesn't involve becoming a recreational sysadmin. That's definitely bleeding edge territory no matter the form factor the hardware ships in.

                                                                                                      • ed25519FUUU 11 days ago

                                                                                                        Racks are horribly loud! Do they even have a fan speed other than "insane"?

                                                                                                      • asguy 11 days ago

                                                                                                        Older specs, but the eMAG is available as a workstation: https://www.avantek.co.uk/ampere-emag-arm-workstation/

                                                                                                        • rbanffy 11 days ago

                                                                                                          Yes, but it's a previous generation. And it only does 32 threads.

                                                                                                          • asguy 11 days ago

                                                                                                            Have you tried calling them to ask if they’ll build you the newer one?

                                                                                                            • IanCutress 11 days ago

                                                                                                              We reviewed the Avantek eMag workstation. We're working with them and Ampere to get an Altra version when it's ready.

                                                                                                              • rbanffy 11 days ago

                                                                                                                I'd have to budget it first. I already took over my partner's desk space in our home office and it wouldn't be fair to allocate too much physical space to my gadgets. There is already a quite massive x86 Lenovo tower server under my side of the desk that gets a pass because it's where her Mac makes Time Capsule backups.

                                                                                                          • zanny 11 days ago

                                                                                                            You can get a Threadripper 3990x with 64 cores in a "regular" workstation.

                                                                                                            • rbanffy 11 days ago

                                                                                                              If I wanted an x86 workstation, I'd get the EPYC counterpart for the extra memory bandwidth.

                                                                                                            • nine_k 11 days ago

                                                                                                              BTW I wonder why one might need a workstation with many less-beefy cores as opposed to several more powerful cores. What kind of interactive tasks require that?

                                                                                                              E.g. I suppose computer animation takes a GPU rather than 32-64 general-purpose cores, and compilers are still not so massively parallel.

                                                                                                            • spott 11 days ago

                                                                                                              I'm kind of curious: what is the selling point of an ARM server? Why would I use an ARM instance on AWS or similar instead of an x86?

                                                                                                              Are they significantly cheaper per GHz*core? If so, how hard is it to make use of that power, will a simple recompile work?

                                                                                                              • lowmemcpu 11 days ago

                                                                                                                Yes. Here's what AWS' page says

                                                                                                                > deliver significant cost savings over other general-purpose instances for scale-out applications such as web servers, containerized microservices, data/log processing, and other workloads that can run on smaller cores and fit within the available memory footprint.

                                                                                                                > provide up to 40% better price performance over comparable current generation x86-based instances1 for a wide variety of workloads,

                                                                                                                From what I read, it's not terribly hard to tell your compiler to target a particular instruction set; you just need to do it (a minimal sketch follows below). Cost savings and better performance are great incentives, and Apple moving the Mac platform to ARM will drive enough market share that developers take the time to recompile.

                                                                                                                Edit: Forgot to add the source of those quotes: https://aws.amazon.com/ec2/graviton/
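
                                                                                                                To make that concrete, here is a minimal sketch (my own illustration, not from AWS): a trivial C++ program with nothing architecture-specific in it, plus the kind of cross-compiler invocation you might use on a Debian/Ubuntu x86 box, assuming the common aarch64-linux-gnu GCC toolchain is installed. The caveats in the replies below - memory-ordering assumptions, inline asm, x86 intrinsics - are what turn a plain recompile into real porting work.

                                                                                                                    // hello.cpp - nothing here depends on the CPU architecture, so targeting
                                                                                                                    // ARM is just a recompile with a different toolchain, e.g. (assumed setup):
                                                                                                                    //
                                                                                                                    //   g++ -O2 hello.cpp -o hello-x86_64                    # native x86-64 build
                                                                                                                    //   aarch64-linux-gnu-g++ -O2 hello.cpp -o hello-arm64   # cross build for ARM64
                                                                                                                    #include <iostream>

                                                                                                                    int main() {
                                                                                                                        std::cout << "Hello from "
                                                                                                                    #if defined(__aarch64__)
                                                                                                                                  << "arm64"
                                                                                                                    #elif defined(__x86_64__)
                                                                                                                                  << "x86_64"
                                                                                                                    #else
                                                                                                                                  << "some other architecture"
                                                                                                                    #endif
                                                                                                                                  << "\n";
                                                                                                                    }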

                                                                                                                • bluGill 11 days ago

                                                                                                                  It might or might not be hard to compile for a different CPU. Intel's stronger memory model lets you play fast and loose with multi-threaded code and get away with it more often than you should. As a result, code that works fine on Intel often randomly gives wrong results on ARM. Fixing this can be very hard.

                                                                                                                  Once it is fixed you are fine. Most of the big programs you might use are already fixed. Some languages give you guarantees that make it just work.

                                                                                                                  • FnuGk 11 days ago

                                                                                                                    What is different on Intel that lets you play fast and loose with multithreading? Two threads reading and writing the same memory area without any locking would give problems regardless of the ISA, or am I missing something?

                                                                                                                    • prattmic 11 days ago

                                                                                                                      ARM has a weakly-ordered memory model, while x86 is much more strongly-ordered. See https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory....

                                                                                                                      So e.g., on x86 if you store to A then store to B, then if another core sees the store to B it is guaranteed to see the store to A as well. This guarantee does not exist on ARM.
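
                                                                                                                      A minimal C++ sketch of that store-A-then-store-B pattern (my own illustration, not from the comment): thread 1 writes some data and then sets a flag; thread 2 waits for the flag and then reads the data. With release/acquire on the flag this is guaranteed to work on both x86 and ARM; with memory_order_relaxed it would still usually appear to work on x86 (the hardware keeps stores in order), but it can fail intermittently on ARM.

                                                                                                                          #include <atomic>
                                                                                                                          #include <cassert>
                                                                                                                          #include <thread>

                                                                                                                          int data = 0;                    // store "A" (the payload)
                                                                                                                          std::atomic<bool> ready{false};  // store "B" (the flag)

                                                                                                                          void producer() {
                                                                                                                              data = 42;                                     // A: write the payload
                                                                                                                              ready.store(true, std::memory_order_release); // B: publish the flag
                                                                                                                          }

                                                                                                                          void consumer() {
                                                                                                                              while (!ready.load(std::memory_order_acquire)) { } // saw B...
                                                                                                                              assert(data == 42); // ...so A is guaranteed visible, on ARM as well as x86
                                                                                                                          }

                                                                                                                          int main() {
                                                                                                                              std::thread t1(producer), t2(consumer);
                                                                                                                              t1.join();
                                                                                                                              t2.join();
                                                                                                                          }

                                                                                                                      (On x86 those release/acquire operations typically compile down to plain moves, which is exactly why sloppy code often gets away with it there; on AArch64 they become dedicated ordered load/store instructions.)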

                                                                                                                      • petters 11 days ago

                                                                                                                        The C++ standard is famously complicated about atomics and memory order (for a good reason): https://en.cppreference.com/w/cpp/atomic/memory_order

                                                                                                                        But on x86, many of these things don't matter, if I understand correctly.

                                                                                                                        • jfkebwjsbx 11 days ago

                                                                                                                          Yes, you are missing something.

                                                                                                                          Two threads reading and writing the same memory area do not necessarily cause problems. In fact, a lot of software is built to exploit guarantees about how memory accesses are ordered with respect to each other.

                                                                                                                          ARM processors give very few such guarantees, so code has to work around that.

                                                                                                                      • jfkebwjsbx 11 days ago

                                                                                                                        Amazon marketing claims are not something you should trust.

                                                                                                                        • _msw_ 11 days ago

                                                                                                                            Disclosure: I work at AWS on building cloud infrastructure

                                                                                                                            It's good to be skeptical. I always encourage folks to do experiments using their own trusted methodology. I believe that the methodology that engineering used to support this overall benefit claim (40% price/performance improvement) is sound. It is not the "benchmarketing" that I personally find troubling in the industry.

                                                                                                                          • fomine3 11 days ago

                                                                                                                              For physical hardware we can measure power consumption, heat, and performance, and buy at retail price.

                                                                                                                              But for a cloud instance we can't measure power consumption or heat, noisy neighbors may exist while benchmarking, and we can't know the real underlying price. I'm not blaming anyone, but it's difficult to compare the hardware.

                                                                                                                            • jfkebwjsbx 11 days ago

                                                                                                                                I always frame it this way: if there were a 40% price/perf improvement, why isn't everyone (including AWS!) using ARM clouds?

                                                                                                                              • _msw_ 9 days ago

                                                                                                                                We /are/ using them. :-)

                                                                                                                        • ksec 11 days ago

                                                                                                                          > what is the selling point of an ARM server? ..... Are they significantly cheaper per GHz*core?

                                                                                                                          In the context of AWS.

                                                                                                                          They are cheaper for some specific workloads on AWS.

                                                                                                                          Especially since a vCPU on ARM Graviton2 instances is an actual CPU core, while a vCPU on Intel / AMD instances is a CPU thread.

                                                                                                                          And in general AWS offers the Graviton2 instances with the same vCPU count at a 20% discount compared to AMD / Intel instances.

                                                                                                                          • lsofzz 11 days ago

                                                                                                                            > Especially when ARM Graviton 2's vCPU on AWS are actual CPU core while Intel / AMD instances are CPU thread.

                                                                                                                            Thank you for that information. Is there a reference that documents this somewhere?

                                                                                                                            • ksec 11 days ago

                                                                                                                              It is clearly listed on AWS Instance Types [1]

                                                                                                                              > Each vCPU is a thread of either an Intel Xeon core or an AMD EPYC core, except for M6g instances, A1 instances, T2 instances, and m3.medium.

                                                                                                                              > Each vCPU on M6g instances is a core of the AWS Graviton2 processor.

                                                                                                                              > Each vCPU on A1 instances is a core of an AWS Graviton Processor.

                                                                                                                              [1] https://aws.amazon.com/ec2/instance-types/

                                                                                                                              • lsofzz 11 days ago

                                                                                                                                I looked around but didn't find it. Thanks a lot.

                                                                                                                          • bluGill 11 days ago

                                                                                                                            Less electricity used. Air conditioning is a big cost in large data centers. Lower-power CPUs mean less heat, which means less AC needed, which drives down total costs.

                                                                                                                            Of course different CPUs can do different amounts of work per amount of electricity used, but ARM generally works out better on a work-per-watt basis.

                                                                                                                            • bluedino 11 days ago

                                                                                                                              In the past, Google said they would switch to POWER if they could get a 10% energy savings by doing so.

                                                                                                                              • bluGill 10 days ago

                                                                                                                                Facebook has hinted (they won't give real numbers) that adding a new compiler optimization lowered their electric bill by a few hundred thousand dollars per year.

                                                                                                                          • nullifidian 11 days ago

                                                                                                                            How come there isn't a trademark issue with NVidia? I was very confused for a moment.

                                                                                                                            • dbancajas 11 days ago

                                                                                                                              "Ampere" can't be trade marked since it's a name of a scientist? Unless they are operating on the same market/segment and can prove there is willful intent to defraud customers? probably a hard sell.

                                                                                                                              • nullifidian 11 days ago

                                                                                                                                So is Tesla. And Ford is a name of an entrepreneur. Are these also not trademark protected?

                                                                                                                                >they are operating on the same market/segment

                                                                                                                                They are. Called computation.

                                                                                                                                >willful intent to defraud customers

                                                                                                                                Is it a requirement? I doubt it.

                                                                                                                                btw, I only clicked the link because I thought it was about Nvidia's product, so they are definitely getting eyeball traffic due to the name.

                                                                                                                                UPD: I recognize that I'm unlearned in trademark law, so I'm not insisting on anything.

                                                                                                                                • klelatti 11 days ago

                                                                                                                                  Ampere was founded (and presumably name registered) in 2017, Nvidia's Ampere announced in 2020?

                                                                                                                                  Ampere had products on sale in 2019.

                                                                                                                                  If there is a case I can't see Nvidia winning it.

                                                                                                                                  • nullifidian 11 days ago

                                                                                                                                    Nvidia's roadmap for microarchitecture names goes way back. I can google up NVidia Ampere mentions in 2017.

                                                                                                                                    • klelatti 11 days ago

                                                                                                                                      I think those were rumours in 2017, with an actual announcement later, but in any event I'm not sure using a name on a slide has the same weight as using it for a real product being bought by customers.

                                                                                                                                      How long had Ampere been planning to use the name before 2017, and does Nvidia using it on a slide in a presentation force them to change it? I still think Nvidia would lose on this one.

                                                                                                                                      • sitkack 11 days ago

                                                                                                                                        How about companies stop co-opting the names of famous scientists? Have a little more creativity.

                                                                                                                                        • yjftsjthsd-h 11 days ago

                                                                                                                                          To be fair, naming things is a pain. It's the same problem we have naming software/services (i.e. the neverending "Show HN"/launch posts with comments "this name conflicts with the following multiple other things").

                                                                                                                              • the_hoser 11 days ago

                                                                                                                                The name of the company is Ampere. The name of the product is Altra. Trademarks don't automatically apply to all usages of the word.

                                                                                                                              • fizixer 11 days ago

                                                                                                                                Am I the only one who is super-annoyed at having to figure out every time whether this is Ampere the company or Ampere the new Nvidia line?

                                                                                                                                I mean, it's probably not the fault of either, and a huge coincidence that we're getting a flurry of news articles about both in the summer of 2020, but come on (can we have some kind of edits in the titles of HN posts to make the distinction clear?).

                                                                                                                                • unexaminedlife 11 days ago

                                                                                                                                  The thing that has me bearish on CPU manufacturers in general... From what I understand, parallel architectures vastly simplify the overall design of CPUs while retaining the power-saving benefits.

                                                                                                                                  As we approach the critical velocity (supply / demand) for parallel architectures, the prospect of bootstrapping a CPU manufacturing company will become much more feasible. IMO it's mostly the specialized knowledge needed to design CPUs that keeps this out of reach today.

                                                                                                                                  I'm no expert, just have an interest in the space, so any dissenting opinions / facts welcome.

                                                                                                                                  • goerz 11 days ago

                                                                                                                                    Can anyone explain in a few sentences why the ARM architecture seems to outperform traditional CPUs so much? What fundamentally prevents Intel from building something comparable?

                                                                                                                                    • webaholic 11 days ago

                                                                                                                                      There is no inherent advantage to the ARM architecture other than it being designed recently (64-bit ARM is less than a decade old) whereas x86 has a lot of baggage it has to carry.

                                                                                                                                      There is no proof that these outperform traditional CPUs at all. That is the reason you don't see them being used anywhere other than niche use cases or for cost reasons.

                                                                                                                                      • lowbloodsugar 11 days ago

                                                                                                                                        >That is the reason you don't see them being used anywhere other than niche use cases or for cost reasons.

                                                                                                                                        By 2017 there were three times as many smartphones as PCs, all running ARM chips.

                                                                                                                                        The top four supercomputers all use RISC, and the fastest uses ARM.

                                                                                                                                        • jfkebwjsbx 11 days ago

                                                                                                                                          Phones are not built for performance, which is what was asked about.

                                                                                                                                          As for supercomputers, >90% of them are Intel/AMD.

                                                                                                                                          • lowbloodsugar 11 days ago

                                                                                                                                            Phones are built for performance per watt. Phones are benchmarked. In the context of a discussion on Apple introducing ARM chips into the Macbook line, performance per watt is far more meaningful. For most users, battery life is the issue once minimum performance criteria have been met.

                                                                                                                                            Will there be Razer laptops that last less than an hour on battery that can beat them? Sure.

                                                                                                                                            Will there be people who complain that the Mac isn't fast enough when plugged in? Already happening: the recent MacBook Pros have had complaints about thermal throttling that a slightly larger Dell with a decent fan doesn't have.

                                                                                                                                            But Apple will build performance laptops, using ARM chips, and they will be faster than the equivalent Intel Macbooks if only because they aren't throttled.

                                                                                                                                            • jfkebwjsbx 11 days ago

                                                                                                                                              The context of the discussion is literally ARM scaling up to desktop performance.

                                                                                                                                              The person you replied to said:

                                                                                                                                              > There is no proof that these outperform traditional CPUs at all.

                                                                                                                                              To which you replied by talking about embedded market share and supercomputers, which have nothing to do with that.

                                                                                                                                              Since you now mention Apple and MacBooks, which hadn't even been mentioned, I think you are replying to the wrong thread/post.

                                                                                                                                      • dahfizz 11 days ago

                                                                                                                                        It is a Reduced Instruction Set computer. It's a greatly simplified design.

                                                                                                                                        The x86_64 ISA is absolutely insane. The only way to implement it in hardware efficiently is to "compile" the super complicated instructions into micro-ops which can actually be decoded and executed on the CPU.

                                                                                                                                        Said another way, Intel has to implement a compiler in hardware which compiles the machine code before it gets executed. The extra complexity means more power and less performance.

                                                                                                                                        You can read more about how microcode and micro ops work here: https://en.m.wikipedia.org/wiki/Intel_Microcode

                                                                                                                                        • cesarb 11 days ago

                                                                                                                                          > The x86_64 ISA is absolutely insane. The only way to implement it in hardware efficiently is to "compile" the super complicated instructions into micro-ops which can actually be decoded and executed on the CPU.

                                                                                                                                          > Said another way, Intel has to implement a compiler in hardware which compiles the machine code before it gets executed. The extra complexity means more power and less performance.

                                                                                                                                          This is a sadly prevalent misconception.

                                                                                                                                          There is no "compiler in hardware". There are two kinds of instructions; the simpler ones (which are the most common) are expanded directly into a fixed sequence of micro-ops, while the more complicated ones act like a subroutine call to the microcode. The closest software analogue would be a macro assembler, not a compiler.

                                                                                                                                          AFAIK, the extra complexity for efficiently decoding x86 instructions comes mostly from their variable length without an explicit length indication, and from the variable number of prefixes which can change the interpretation of the following byte, which makes decoding an instruction a serial task. IIRC, both Intel and AMD have a couple of tricks to reduce the impact this has on both power and performance: caching the already decoded micro-ops, and storing the instruction boundaries in the L1 instruction cache.
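
                                                                                                                                          A toy sketch of that serial-decode point (my own illustration, not real x86 decode logic): with a fixed-width ISA, the start of instruction i is known immediately, so several decoders can be pointed at different instructions at once; with a variable-length ISA, you can't locate instruction i+1 until you have at least length-decoded instruction i.

                                                                                                                                              #include <cstddef>
                                                                                                                                              #include <cstdint>
                                                                                                                                              #include <vector>

                                                                                                                                              // Fixed-width ISA (e.g. 4-byte AArch64 instructions): boundaries are trivial
                                                                                                                                              // and independent, so they can be computed in parallel.
                                                                                                                                              std::size_t fixed_width_start(std::size_t i) { return i * 4; }

                                                                                                                                              // Variable-length ISA: an instruction's length is only known after examining
                                                                                                                                              // its own bytes (prefixes, opcode, ModRM, ...), so finding boundaries is a
                                                                                                                                              // serial dependency chain. toy_length_at is a made-up stand-in for that
                                                                                                                                              // per-instruction work, not a real x86 length decoder.
                                                                                                                                              std::size_t toy_length_at(const std::vector<std::uint8_t>& code, std::size_t pc) {
                                                                                                                                                  return 1 + (code[pc] & 0x7);  // pretend the first byte encodes the length
                                                                                                                                              }

                                                                                                                                              std::vector<std::size_t> variable_width_starts(const std::vector<std::uint8_t>& code) {
                                                                                                                                                  std::vector<std::size_t> starts;
                                                                                                                                                  for (std::size_t pc = 0; pc < code.size(); pc += toy_length_at(code, pc)) {
                                                                                                                                                      starts.push_back(pc);  // the next start is unknown until this one is decoded
                                                                                                                                                  }
                                                                                                                                                  return starts;
                                                                                                                                              }

                                                                                                                                              int main() {
                                                                                                                                                  std::vector<std::uint8_t> code{0x03, 0x90, 0x01, 0x47, 0x00};
                                                                                                                                                  auto starts = variable_width_starts(code);  // {0, 4} with this toy encoding
                                                                                                                                                  (void)starts;
                                                                                                                                              }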

                                                                                                                                          • jeffbee 11 days ago

                                                                                                                                            So, that's the freshman-year CS view of the topic, but back here in reality land the "complicated" x86 instruction format has pretty much destroyed all others and none of the supposed advantages of RISC actually exist. Remember that the whole point of RISC is that the CPUs would supposedly run faster. That hasn't happened. There are no RISC CPUs running faster than state-of-the-art x86 CPUs. POWER8 comes closest, but does not exceed.

                                                                                                                                            The whole RISC philosophy was a huge mistake. Yes, x86 instructions do not map well to transistors, and they have to be unpacked into uops to be executed. This is a form of compression. Having a compressed program image turns out to be a massive advantage. RISC proponents thought that x86 was so complicated they could beat Intel with their simple instruction decoders. That almost, but not really, made sense in 1990 but since then has made increasingly less sense, until today where the amount of sense this makes has hit zero. The x86 instruction decoder is a very small part of the floor plan of a modern CPU and every time they rev the microarchitecture it gets smaller. The number of transistors needed to decode the VEX prefix is like a speck of sand on the beach of a 512x512-bit multiplier.

                                                                                                                                            • acidbaseextract 11 days ago

                                                                                                                                              > The whole RISC philosophy was a huge mistake. Yes, x86 instructions do not map well to transistors, and they have to be unpacked into uops to be executed.

                                                                                                                                              The RISC philosophy wasn't a mistake. Our architectures have just become more sophisticated so that we don't have to make a binary choice. The hybrid is good. The internal uops get the pipelining advantages of RISC, while we get the encoding compression of a CISC instruction set.

                                                                                                                                            • umanwizard 11 days ago

                                                                                                                                              I don't think that's the entire reason. Most of the common x86 instructions occurring in a normal program can be decoded to a few uops in a straightforward way, and since Sandy Bridge the decoded uops are cached anyway.

                                                                                                                                              So this would only be a significant bottleneck for hot loops that are large enough that they don't fit in the uop cache.

                                                                                                                                              It's definitely a real issue but it seems wrong to pin all or even most of Intel's stagnation on that.

                                                                                                                                              • gandalfgeek 11 days ago

                                                                                                                                                If you look at the latest ARM instruction sets they are not really "RISC". Sure, they're much saner than the crazy legacy instructions that x86 carries, but still nowhere near a "real RISC" ISA as espoused by Hennessy and Patterson, the hallmarks of which are simple, orthogonal, atomic instructions, and a small number of them. Currently that is most closely embodied by RISC-V.

                                                                                                                                                If you look at how you can get computing gains going forward after the end of Moore's law, of course the glib answer is "parallelize across more cores!" but the more interesting path is to notice that behemoth single cores like x86 spend a ton of silicon area trying to optimize straight-line execution with things like speculative execution. If you saved all that silicon area, making each single core slower but smaller, but packed more cores on the die as a result, you would most likely come out faster. [1]

                                                                                                                                                [1]: https://science.sciencemag.org/content/368/6495/eaam9744/tab...

                                                                                                                                            • klelatti 11 days ago

                                                                                                                                              Two questions:

                                                                                                                                              Does TSMC have the capacity to support AMD / AWS / Ampere etc making a significant dent in the server market alongside longstanding commitments to Apple etc?

                                                                                                                                              Given how much they spend on Intel CPUs, to what extent is it worth AWS / Oracle etc. making low-hundred-million-dollar investments in their own silicon, or in startups like Ampere, just to keep Intel's pricing competitive?

                                                                                                                                              • ksec 11 days ago

                                                                                                                                                >TSMC....

                                                                                                                                                TSMC has never had a capacity problem, whatever stories the mainstream media likes to run. You don't go and ask whether TSMC has another spare 10K wafers of capacity sitting around; TSMC plans its capacity based on its clients' forecasting and projections many months in advance. They will happily expand capacity if you are willing to commit to it - like how Apple was willing to bet on TSMC, and TSMC basically built a fab specifically for Apple.

                                                                                                                                                This is much easier for AWS since they use the chips themselves in their own SaaS offering. It is harder for AMD since they don't know how much they could sell, and AMD being conservative means they don't order more than they can chew.

                                                                                                                                                >Given how much they spend on Intel CPUs to what extent is it worth AWS / Oracle etc making low hundred million dollar investments in their own silicon or startups like Ampere just to keep Intels pricing competitive?

                                                                                                                                                I am not sure I understand the question correctly. But AWS already invested hundreds of millions in their own ARM CPU called Graviton.

                                                                                                                                              • rurban 10 days ago

                                                                                                                                                The most interesting blurb I read was "superscalar aggressive out-of-order execution". But I read nothing about security mitigations or concerns with such "aggressive" optimizations.

                                                                                                                                                • paulsutter 11 days ago

                                                                                                                                                  Maybe Intel should become a fab like TSMC and leave the CPU market to more innovative folks

                                                                                                                                                  • ksec 11 days ago

                                                                                                                                                    They did with Intel Custom Foundry. They tried and they failed. And they currently have no intention to try that again. At least not until they admit defeat. Which is going to take at least another few years if not longer.

                                                                                                                                                    • dralley 11 days ago

                                                                                                                                                      >They did with Intel Custom Foundry. They tried and they failed.

                                                                                                                                                      From what I've heard, they didn't try very hard. Apparently they thought all they had to do was make chips, and that the sheer "technical superiority" of their process meant that they could treat their customers as second-class stakeholders, withhold information about their production timelines, etc.

                                                                                                                                                    • ArgyleSound 11 days ago

                                                                                                                                                      Isn't the fab part precisely where Intel has hit a giant stumbling block?