Sooo... is Intel like crying in a corner right now? On one side AMD is eating their lunch in the consumer space while they still haven't launched a full gamut of 10nm CPUs. Apple just announced that they're dropping them within basically the next 5 years. And now ARM really is encroaching on their core server business.
I feel like 20 years from now we're gonna be using Intel as a cautionary tale of hubris and mismanagement. Or whatever it is that caused them to fail so spectacularly.
Honestly I think the proclaimed death of Intel is vastly exaggerated. AMD came back from worse places, and Intel does still have the manufacturing edge. Intel desktop CPUs still use less power, which is a big plus. How many people do you know who bought the fastest CPU available recently? Glad AMD is back on track; they were in a rough place, far worse than Intel's current situation.
For what it's worth, Intel is still faster in most applications, simply by virtue of having a clock speed advantage that by far exceeds any IPC difference, and also by having much lower memory latencies. AMD has basically a 20-30 ns extra latency over Intel; so with good memory you can do ~45 ns on current Intels, but that will give you ~65 ns on a Ryzen. That's significant for a lot of code (e.g. pointer chasing, complex logic etc.).
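To make the pointer-chasing point concrete: the classic latency microbenchmark walks a randomly permuted cycle of indices, so every load depends on the result of the previous one and caching/prefetching can't run ahead. A rough sketch of the access pattern (illustrative only; in Python, interpreter overhead dilutes the effect, but a compiled version of this same pattern is how numbers like ~45 ns vs ~65 ns get measured):

```python
import random
import time

def make_chain(n, shuffled=True):
    # Build a single cycle visiting every slot exactly once: chain[i]
    # holds the index of the next element, so each load depends on the
    # previous one and the prefetcher can't help.
    order = list(range(n))
    if shuffled:
        random.shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    return chain

def chase(chain, steps):
    i = 0
    for _ in range(steps):
        i = chain[i]
    return i

if __name__ == "__main__":
    n = 1 << 20  # ~1M entries, much larger than L3 cache
    for label in ("sequential", "shuffled"):
        chain = make_chain(n, shuffled=(label == "shuffled"))
        t0 = time.perf_counter()
        chase(chain, n)
        print(f"{label}: {time.perf_counter() - t0:.3f} s")
```

The sequential variant is prefetcher-friendly; the shuffled one forces a dependent cache miss per step, which is exactly the kind of code where the extra 20-30 ns per miss shows up.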
On the other hand, few applications scale efficiently to more than just four cores. Yes, of course, AMD delivers more Cinebenchpoints-per-Dollar and usually more Cinebenchpoints overall, but that's not necessarily an interesting metric.
Personally I find that when I'm waiting on something to complete, the application in question tends to use only a tiny number of cores for the task at hand. Usually one.
Another significant weakness of AMD's current platform is idle power consumption.
These factors leave me with a much more nuanced impression than "Intel is ded" or "HOW IS INTEL GOING TO CATCH UP TO THIS????"; CPU reviews these days are just pure clickbait.
The problem is that a lot of the tasks people want their CPU to be fast at are exactly the kind of work that parallelizes almost embarrassingly well: compiling code, video rendering, compressing files. People buying CPUs for this are not as concerned about how many cycles it takes to jump through a vtable, as long as it's not slow.
Meanwhile, pointing at memory latency as the flaw in Ryzen has been a popular misdirection for a while now. People warned me about it being a performance pitfall since before I bought my first Ryzen processor. In practice it doesn't show up as a serious issue even in the most complexity-heavy workloads. For example, Zen 2 performs very well at hardware emulation. Possibly where it takes a hit in memory latency it makes up for it in caching and prefetching, but honestly I don't know and I'm not sure how to measure it. In any case it certainly compares favorably to Intel's best chips in single-core workloads even if it's not on top. Factor in price and multicore workloads and you have the exact reasons why people like me have been singing its praises. Intel's single-core lead may exist in some form, but it is not what it once was; it is not an unconditional lead where an Intel core beats an AMD core. Not even close.
None of this means Intel's dead, of course, but IMO that's mostly because they have a lot more going on than just being the best CPU. They've got their dedicated GPU coming out, and plenty of ancillary technology as well. It does seem like, for a company like Intel, having to take a back seat in CPUs for a while will be painful; unlike AMD, this is a new position for Intel, and maybe not one they will handle well.
You can get an idea of how popular different processors are in the server space by looking at the AWS EC2 spot market. Top end Xeon server processors (C5 and Z1d) typically have much lower spot discounts than AMD EPYC based processors (r5ad), although ARM c6g instances have been pushed up in price significantly over the last few months, perhaps as people switch over to them for the per-computational-unit cost savings.
Of course, this is all a factor of Amazon's supply of instances and their chosen on-demand pricing level, but the trends are certainly interesting, and show steady demand for fast Xeons and increasing demand for ARM. I have run some compute-heavy workloads on the best AMDs I could find on AWS, and the speed difference per core for my particular workload was nearly 50%, which got worse as it scaled up to bigger instances because my workload uses a lot of L3 cache. I hear about EPYCs with 256MB of L3 cache but I can't seem to find those on AWS -- only ones with 8MB of cache.
Thanks for the info -- I must have misinterpreted the spot pricing history chart for c6g. While you're here, does the AWS hypervisor have any means to dedicate a portion of the L3 cache to each virtualized core, or is it a free-for-all for all of the cache space (such that a noisy neighbor could potentially be evicting data held in your L2 cache or even L1 cache by thrashing the L3 cache)?
For instance families like C, M, and R, processor cores are dedicated to one instance, and the virtual processor is pinned 1:1 to the underlying logical processor. Therefore there is no neighbor that is able to use the L1 and L2 caches.
For L3 cache, we try to optimize for the best overall performance for the majority of the time. Smaller instance sizes share L3 cache with other instances. I wouldn't call it a "free for all" given some changes in how the cache hierarchy has been shifting over time (e.g., Skylake-SP L2 cache per core was increased, and the L3 cache is now 'non-inclusive').
1. It is unlikely the CPU is a serious bottleneck in many of those circumstances. Even if it takes a measurable amount of time, that does not mean a faster CPU will make a meaningful improvement, or even a measurable one. If you think it will, try overclocking and measuring your Gmail load times.
2. Like I said, in my experience Ryzen also competes just fine in single core. It just also decimates in multicore. I'd rather have some tasks run significantly faster than have some run very slightly faster. And that's disregarding the fact that not all tasks are the same, and it does in fact win some categories outright. These CPU architectures are more divergent than they have been in a long time.
3. Things you think aren’t parallel are. Video games using modern graphics APIs are in fact able to exploit multicore CPUs. Browsers absolutely exploit multicore CPUs. Your system in general will exploit multicore CPUs so during general usage when you are doing more things and have more software running, single core performance will be hurt less. And so on.
Your email reader, Word, YouTube, and IDE aren't likely to push the limits of any modern CPU, and your video game is increasingly optimized around multiple cores because modern consoles ship with multi-core CPUs and need all the performance they can get out of them. The only thing that might benefit from single-core performance is probably your general Python code.
Gmail and the IDE take ten seconds to load, while YouTube is destroying any CPU to watch a 4K video (or 1080p on a battery-saving laptop).
YouTube is possibly the single largest root cause of users upgrading laptops over the past 10 years. They made a silent transition to 60 FPS videos last year, which effectively cut hundreds of millions of users off from watching HD.
OTOH, I know what you're talking about; my Linux machine hates YouTube, but that's because even with the chromium-freeworld fork, with some codec acceleration, it's still burning CPU like crazy.
So a big part of this isn't a hardware problem so much as a software one, combined with the constant fights over whose codec is the one true choice. AKA it's a YouTube and !windows/android+chrome problem.
> Meanwhile, pointing at memory latency as the flaw in Ryzen has been a popular misdirection for a while now.
How is it a misdirection? The data is accurate, and memory latency scaling is a well-known issue for simulations such as games (which are a huge market for high-end desktop CPUs and also the market 90% of reviews address), where you can't really explain the performance differences just by higher clocks. It's considered the main reason why much older Intel CPUs can still outperform Ryzen CPUs in games.
On the other hand, if you take something like Cinebench you can literally turn XMP off (thus using JEDEC timings and bus speed) and still get almost the same score (within, say, 2%). That's because Cinebench benchmarks pretty much only ALU throughput. That's obviously an important factor for performance, but just as obviously not the only one.
>Intel is still faster in most applications, simply by virtue of having a clock speed advantage that by far exceeds any IPC difference
This is already only marginally true; the difference is only about 5% depending on the application, and in some applications AMD comes out ahead anyway. Expect the remaining difference to disappear when Zen 3 releases in a few months.
>Another significant weakness of AMD's current platform is idle power consumption.
AMD seems to have caught up here almost entirely. They've done a lot of work to improve idle power consumption lately and the node advantage probably helps, too.
Hi, this is very informative. To clarify a couple points:
- By memory latency, do you mean the time to access an uncached portion of RAM?
- RE clock speed advantage, are you referring to the fact that AMD turbo boost doesn't hit 5GHz?
IIRC L3 is slightly slower on Zen 2, main memory as mentioned much slower.
Clock speed advantage -- Most Zen 2 CPUs don't overclock to 4.5 GHz on any core, let alone all-core. The boost numbers are reached with current firmware, but only for the tiniest fractions of a second and never under any real load. Sustained single-core boost frequencies are 200-400 MHz lower than the specified boost frequency. On the other hand, Intel CPUs consistently reach their boost frequencies under load, and most CPUs can do their single-core boost as an all-core overclock under load (with much greater power consumption, of course).
In practice this means that for equivalently priced parts (e.g. 3900X vs 10900K) the AMD part will have about a GHz lower clock for lightly threaded workloads, which are most workloads. With Intel settings, the Intel and AMD parts have about the same sustained clocks (3.8-4 GHz) under all-core load, but with the defaults of many motherboards the Intel part will run at 4.8-5 GHz, depending on the cooling.
AMD had to take radical action to survive and it was never in Intel's interests for it to vanish.
Intel had the twin defensive 'moats' of x86 and the leading process technology.
But now Intel has stalled relative to TSMC on the process lead (and possibly lost it) and the last few days have shown that the x86 moat is crumbling. The world will not move to TSMC manufactured ARM overnight but a significant shift could happen quite quickly I think. Intel will / has defended with lower prices but that will ultimately mean a big shift in business model.
Intel is the last firm with leading edge technology manufacturing in the US. If it starts to falter I can see a concerted effort to maintain that from the US government.
We still have somewhat of a lead in areas like composites, exotic materials, jet engines, turbines, and aerospace, but they are eroding.
We can make good cars, though the majority of the good cars made in the US are made under the management of Japanese car companies. Tesla has some great technology and product design but on the manufacturing front they are behind the majors.
Yeah. Silicon manufacturing has huge implications for national defense. Not sure how relevant Intel is in that particular discussion. Did the US Govt express concerns when IBM stopped making chips in-house?
They're relevant to the need for computers free from (foreign) backdoors. Otherwise, defense doesn't rely on cutting edge ICs any more. Modern processes can't deliver the reliability most military systems need, particularly if radiation hardening is a requirement.
So the winner of being worst ranked by J.D. Power, and of "Found On Road Dead"? All American companies have had a generally poor record on build quality; it doesn't matter whether they were started in Detroit or Silicon Valley.
Sucks too, because otherwise they still are the best bang for your buck when it comes to performance.
Intel desktop cpus do not use less power. The only metric where intel desktop cpus win right now is maximum per-core performance, and not by much. They're worse at performance per dollar, performance per watt, maximum multithreaded performance, and overall power efficiency. (edit: perhaps not idle power efficiency...)
As a desktop user, my CPU tends to be mostly idle. So overall power efficiency is impacted a lot by idle power consumption; my AMD Ryzen CPU alone draws significantly more power in idle than my previous several-years-old Intel system. In fact, just the IO die alone draws almost as much power as some office PCs.
Comparing idle power consumption for desktop parts is kind of a desperation. For laptops it matters, but the Zen2 laptops aren't using an IO die manufactured on the older process. For desktops the difference at idle is something like 10 watts, i.e. ~$10/year in electricity, and even that much only if the machine is both turned on and idle 24 hours a day the whole year, whereas anybody worried about power consumption would put it to sleep.
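For reference, the arithmetic behind that ~$10/year figure, assuming an electricity rate of about $0.12/kWh (rates vary widely by region):

```python
idle_delta_w = 10            # extra idle draw of the AMD system, in watts
hours_per_year = 24 * 365    # worst case: powered on and idle all year
rate_usd_per_kwh = 0.12      # assumed rate; varies a lot by region

kwh = idle_delta_w * hours_per_year / 1000
cost = kwh * rate_usd_per_kwh
print(f"{kwh:.1f} kWh/year -> ${cost:.2f}/year")  # 87.6 kWh/year -> $10.51/year
```

Put the machine to sleep when idle and the real-world difference shrinks toward nothing.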
All I've read says Intel has a manufacturing disadvantage to TSMC (where AMD, Apple and many others get their chips) and their CPUs are less power efficient, not more, compared to AMD and particularly these kinds of upcoming ARM parts. Is that not the case?
If simply having a better and cheaper product equaled immediate success, then mid-to-high-end fashion, along with a gazillion other products in many other industries, would not exist, because there are always competitors offering something better at a cheaper price.
What matters is marketing/discovery and channels/distribution. And that's before counting the software advantage Intel has.
AWS only just reached GA on Zen 2, nearly a year after they first made the announcement; compare that to Intel's turnaround. I don't have any insider information, but I'd guess AMD has a lot to learn about dealing with hyperscalers.
And as you may have noticed, Intel has had way more leaks than usual in the past 12-18 months. That is part of the PR play to keep people from buying AMD while they try to catch up.
Intel as of today is still operating at 100% capacity with a backlog of orders to fulfil, and a new record revenue in the last quarter. So yes, technically Intel is inferior, but until all of those disadvantages materialise in the financial numbers it is far too early to call the death of Intel.
I don't hold any Intel stock, but I am speaking as an AMD shareholder.
I suspect that for national security reasons, the US federal government and DoD would not allow Intel to fail. Still, the military and government aren't big enough buyers of microprocessors to keep Intel competitive at its recent position, and I suppose that TSMC's planned fab in the US could be seen as an alternative.
I think the writing has been on the wall for a few years. IMHO ARM on servers is something that's been an option for quite a while, and the only surprising thing is how long x86 has managed to stay popular/relevant there.
Also, open-source instruction sets like RISC-V are going to be interesting.
What went wrong at Intel is that they forgot to take the appropriate steps 10 years ago to avoid running out of options right now.
Ten years ago it was already obvious that mobile CPUs were a thing and Intel's attempts to penetrate that market failed around that time. From that moment they were living on borrowed time.
It’s not quite that obvious. Server space is still huge. Growth in mobile usually corresponds to people using more servers. ARM servers are an option but I’d hesitate to lay bets on it—they have to beat Intel on TCO and it’s just not there yet. AMD is great but doesn’t have volume like Intel does, and while the I/O is better on AMD, dealing with NUMA on AMD is a bit more of a beast.
I’m not saying that Intel’s not in trouble, just that the conclusions here are far from obvious. I have some skepticism for people who say that they saw this coming. AMD laid off a lot of top engineers before its recent resurgence. Intel’s failure to ship its 7nm node in volume was a surprise to a lot of people.
Everyone knew that the new process nodes were more difficult, but outside a few experts, hardly anyone was in a position to predict when the move to smaller nodes would slow down.
Not that long ago, people were praising Intel for their superior SSD controllers, or talking about how they would be making 5G modems.
Generally, Intel is still best for TCO. AMD is better on specific high core count workloads. Arm isn't really competitive in the HPC/Cloud space, or at least hasn't been historically. Maybe that's starting to change?
> Apple just announced that they’re dropping them in basically the next 5 years
That's a big PR hit, but in terms of sales, Apple isn't really a significant customer of x86. If it triggers Microsoft to double down on Windows on ARM, though, that could become a threat. But MS has been playing with ARM for years, and nothing significant has come out of it yet.
They have milked 14nm for all it's worth, sacrificing the long term in the interest of the short term.
They bent over backwards for cloud providers and offered them special deals that helped finance the cloud providers transitioning to own silicon. They fused off features to create false product "differentiation" like the IBM of old and failed to deliver technology after technology in working form (SGX, TSX, 10nm, ...) They held the performance of the PC platform back by trying to capture all of the BoM for a PC. (e.g. tried to kill off NVIDIA and ATI with integrated 'graphics')
Customers are angry now, and that's Intel's problem. Intel is like that Fatboy Slim album: "We're #1, why try harder?" They still think they are the #1 chipmaker in the world, but now it is more like #2 or #3.
I think Intel has learned a painful lesson about resting on their laurels. That being said, Jim Keller was there for two years (he resigned June 12th), so I'd bet they have some big things on the horizon, namely GPUs.
It's worth noting that this is based on ARM's Neoverse N1 IP, which is also used in the AWS Graviton2. The Graviton2 benchmarks damn close to the best AMD and Intel stuff, so this chip looks very promising. It's really looking to be a breakthrough year for ARM outside of the mobile market.
Phoronix paints a very different picture, especially in non-synthetic workloads. Graviton2 looks like a nice speedup over the first generation, but either the optimization isn't there yet or there are areas which need additional work to become more developer/HPC competitive. That said, I'm thrilled we have competition in the architecture space for general-purpose compute again.
Disclosure: I work for AWS on cloud infrastructure
My personal opinion is that the Phoronix way places quantity over quality. Measuring performance is an important part of shining a light on where we can improve the product, but I get little practical information from those numbers, even when they are reported as non-synthetic. There are HPC workloads that are showing significant cost advantages when run on C6g, like computational fluid dynamics simulations. See .
I expect the scalability of HPC clustering to improve on C6g in the future, like C5n improved cluster scalability compared to C5 with the introduction of the Elastic Fabric Adapter. The Phoronix and Openbenchmarking.org approach doesn't give much insight into workloads like this.
My advice for an audience like folks on HN is to test it for yourself. For me, being able to run my own experiments is how I come to understand infrastructure better. And the cloud lowers the barrier to running those experiments significantly by being available on-demand, just an API call away. I'd love to hear what you think, either in a thread here or via the addresses in my user profile.
Didn't go too deep into it, but the AMD CPUs being compared are different. Anandtech has an AWS-only EPYC 7571 (2 sockets, 32 cores each, 2.5 GHz); Phoronix has an EPYC 7742 (1 socket, 64 cores, 2.2 GHz). On top of that, Anandtech is using another AWS EC2 instance while Phoronix is testing on a local machine on bare metal.
Still would be interesting to know what differences caused the gap in results, but their setups were pretty different.
Ugh, yes; one of the perks of an Intel monoculture was that at least you only had one target to worry about, and inter-generational quirks were mostly limited to minor things. Now we have to deal with "this was optimized for (Intel|AMD) and doesn't work on (AMD|Intel)" and "the devs tested this on their x86 laptops and then it got weird when we went to run it on ARM" and "ARM is less of a platform and more of a collection of kinda-similar-looking systems that are mostly compatible". Don't get me wrong, I'll take this over a monoculture, especially an Intel monoculture, but there are some bumps on the road to a more diverse ecosystem.
In my experience, the Arm ecosystem has an excellent track record regarding compatibility across conforming implementations of the architectures (e.g., Armv7-A, Armv8-A). I can draw a practical comparison to MIPS, where I had to deal with a lot of variability based on various vendor extensions. This is reflected in the "-march=" documentation for GCC:
Does anyone have an evaluation board for these things? Their marketing materials scream "scam" to me. For one thing they compare to competing x86 parts by arbitrarily downrating them to 85% of their actual SPECrate scores. Why? Then they switch baseline x86 chips when making claims about power efficiency ... for performance claims they use the AMD EPYC 7742 then for performance/TDP they use the 7702, which has the tendency to make the AMD look worse because it is spending the same amount of power driving its uncore but it's 11% slower than the 7742.
Also, without pricing, all these efficiency claims are totally meaningless.
This reminds me of Tilera, who had a 64-core mesh-connected CPU about ten years ago. The problems seemed to be that it was harder to optimize for due to the mesh connectivity (like NUMA but multidimensional), low clock speeds, and a lack of improvement after an initially promising launch.
Will this be the same? It seems possible. Does it really get more work done per watt than x86?
And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 Ghz Turbo"?
It depends a bit on how you utilize these CPUs. A lot of server software is optimized for just a few cores. Even products optimized for using more than 1 thread tend to be tested and used mostly with 4/8 core configurations. And then of course there are a few popular server-side languages that are effectively single threaded typically (e.g. python) and use multiple processes to leverage multiple cores. Launching 80 python processes on an 80 core machine may not be the best way to utilize available resources compared to e.g. a Java process with a few hundred threads.
With non-blocking IO and async processing that can be good enough, but to fully utilize dozens or hundreds of CPU cores from a single process you basically want something that can do both threading and async. But assuming each core performs at a reasonable percentage of e.g. a Xeon core (let's say 40%) and doesn't slow down when all cores are fully loaded, you would expect a CPU with 80 cores to more than keep up with a 16 or even 32 core Xeon. Of course the picture gets murkier if you throw in specialized instructions for vector processing, GPUs, etc.
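The one-process-per-core pattern described above can be sketched like this (hedged: `cpu_bound` is a stand-in workload, and a real service would sit behind a pre-forking server rather than a bare `Pool`):

```python
import os
from multiprocessing import Pool

def cpu_bound(n):
    # Stand-in for real work. Run via Pool, each call executes in its
    # own process, so the GIL doesn't serialize the computation.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    workers = os.cpu_count() or 1
    with Pool(processes=workers) as pool:
        results = pool.map(cpu_bound, [100_000] * workers)
    print(f"{len(results)} tasks across {workers} processes")
```

On an 80-core box that means 80 interpreter copies, each with its own heap and code pages — the overhead being contrasted with a single JVM process running a few hundred threads, which shares all of that.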
That would be the same in Python too. A problem is that you can't share the kernel pages for the code, and you need a shared cache. A probable zero-copy, no-deserialization example: LMDB + FlatBuffers.
Nicer is to have 1/2x cores, but each core being 2x faster ;)
It's the opposite. Running lots of poorly optimized processes allows you to amortize memory latency. If your software suffers from cache misses then it's not going to run out of memory bandwidth any time soon. Adding more threads will increase memory bandwidth utilization. Meanwhile hyper optimized AVX512 code is going to max out memory bandwidth with a dozen cores or less.
That's really not true. Memory bandwidth, just like memory capacity becomes a bottleneck when it is exceeded, but more doesn't automatically speed anything up. Java and python programs will likely be hopping around in memory and waiting on memory to make it to the CPU as a result.
Typically only multiple cores running optimized software that will run through memory making heavy use of the prefetcher will exceed memory bandwidth.
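Rough numbers behind that claim, using assumed figures for illustration (a 6-channel DDR4-3200 server socket, and a well-vectorized core streaming ~15 GB/s — both are assumptions, not measurements of any specific part):

```python
channels = 6                     # assumed: 6-channel server socket
per_channel_gb_s = 25.6          # DDR4-3200: 3200 MT/s * 8 bytes/transfer
stream_per_core_gb_s = 15.0      # assumed: one core streaming with wide vectors

socket_bw = channels * per_channel_gb_s
cores_to_saturate = socket_bw / stream_per_core_gb_s
print(f"~{socket_bw:.0f} GB/s per socket, saturated by ~{cores_to_saturate:.0f} streaming cores")
```

A cache-miss-bound interpreter core might pull only 1-2 GB/s, which is why dozens of "poorly optimized" processes still fit under the same bandwidth ceiling.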
AIUI the relevant weakness of Java here is that it typically has worse memory density and locality than something like Rust.
Consider code which linearly goes through a list of points in 2D space and does some calculation on the coordinates.
In Rust, the list is a Vec<(f64, f64)>. The Vec is a small object containing a pointer to a large block of data which contains all the points packed tightly together. Once the program has dereferenced the pointer and loaded the first point, all the others are immediately after it in memory, in order, containing nothing but the coordinates, and so the processor's cacheing and prefetching will make them available very quickly.
In Java, the list is an ArrayList<Point2D.Double>. The ArrayList is a small object containing a pointer to an array of pointers to more small objects, one for each point. Each of the small objects has a two-word object header on it. The pointer plus header means that for every two words of coordinate, there are three words of overhead, so the cache is used much less effectively. The small objects aren't necessarily anywhere near one another in memory, or in order, so prefetching won't help.
There are a couple of ways the Java situation can be improved.
Firstly, today, you can replace the naive ArrayList<Point2D.Double> with a more compact structure which keeps all the coordinates in a single big array. This gives you the same efficiency as Rust, but requires programming effort (unless you can find an existing library which does it!), and may give you an API that is less efficient (if it copies coordinates to objects on retrieval) or convenient (if it gives you some cursor/flyweight API).
Secondly, in the future, the JVM could get smarter. In principle, it could do the above rewriting as an optimisation, although i wouldn't want to rely on that. A good garbage collector could bring the small objects together in memory, to improve locality a bit.
Thirdly, in the near-ish future, Java will get value types (Project Valhalla), which behave a lot more like Rust's types. That would give you equally good density and locality without having to jump through hoops.
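The array-of-objects vs. column-arrays trade-off from the first option can be illustrated in Python (a hypothetical example: `array('d')` packs doubles contiguously the way a Rust `Vec` does, while a list of point objects is a list of pointers to separately allocated boxes):

```python
from array import array

class Point:
    # "Array of objects": each point is a separate heap allocation
    # reached through a pointer, with per-object overhead.
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

def centroid_objects(points):
    n = len(points)
    return (sum(p.x for p in points) / n, sum(p.y for p in points) / n)

def centroid_columns(xs, ys):
    # "Struct of arrays": coordinates packed contiguously with no
    # per-point header, friendly to caching and prefetching.
    return (sum(xs) / len(xs), sum(ys) / len(ys))

pts = [Point(float(i), float(2 * i)) for i in range(4)]
xs = array("d", (p.x for p in pts))
ys = array("d", (p.y for p in pts))
assert centroid_objects(pts) == centroid_columns(xs, ys)
```

The column version gives up the convenient per-point object API, which is exactly the cost the comment mentions.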
> And why does the article say "These Altra CPUs have no turbo mechanism" right below a graphic saying "3.0 Ghz Turbo"?
These chips obviously have variable clock speed, but apparently nothing like the complicated boost mechanisms on recent x86 processors. My guess is that Turbo speed here is simply full speed, and doesn't depend significantly on how many cores are active, and doesn't let the chip exceed its nominal TDP for short (or not so short) bursts the way x86 processors do.
These chips are practical and can go into servers that are similar in performance to x86 servers.
ARM has well-thought out NUMA support, probably a system this size or larger should be divided into logical partitions anyway. (e.g. out of 128 cores maybe you pick 4 to be management processors to begin with).
Products like this show that Apple could have an ARM-based Mac Pro in two years relatively easily. They already have PCIe Gen 4. TDP and memory capacity are already more than Intel provides in the Xeon workstation line that they use.
It's not just heavy investment at their moment of need, it's also a habit of nurturing unlikely Plan B's for years and decades.
- When Apple founded ARM in 1990 with Acorn and VLSI, they didn't know that silly cacheless chip would become a world-beating juggernaut. But hey, as founders, they now have a license to mold the microarchitecture however they like.
- When Apple bought NeXT in 1997, they didn't know the sun was slowly setting on PowerPC. But they secretly nursed along the (already built) NeXTSTEP x86 port for years, until the time came to dust it off and start shipping product with Intel Inside.
- When Apple forked KHTML in 2001 and started building WebCore/WebKit, they didn't know that MS was about to leave Internet Explorer to wither on the vine, nor release the final Mac IE only 2 years later. But they quietly invested in building such a konquering (sorry!) product that (with Google/MS help) we're now at risk of an entirely different browser monoculture.
A small historical correction: the ARM3 (first ARM with a cache) predates the spin-off of ARM as a separate company. The Acorn Archimedes A540 (with ARM3) was released mid 1990 and ARM was founded later that year.
If they do that, I wonder whether it would make sense for Apple to get into the ARM server CPU business while they're at it.
Currently, the Intel Xeon is used in both high-end workstations and servers. If one x86 design can be suitable for both of those, presumably one ARM design could do the same.
If they could sell server CPUs at a profit, then Apple could get more return on its design investment by getting into two markets. And they'd get more volume. Though apparently they'd be facing competition from Ampere and Amazon's Graviton.
I've wondered for a long time if it would make sense for Apple to sell the A13 etc. to smart home device makers, on the theory that Apple can offer great HomeKit integration as well as a superior chip to anything else on the market (for e.g. video processing).
The other person who replied to you probably paid half or a third of what you would pay for an equivalent Mac Pro.
The profit margins on the Mac Pro are just incredible. (Yes, I'm sure that equivalent professional workstation brands also have huge profit margins... no, that doesn't make me want to pay those lofty prices more.)
The only real value the Mac Pro provides is that it's the most powerful computer you're allowed to run macOS on legitimately. If you can do your work from Windows (with WSL) or Linux, you can save upwards of tens of thousands of dollars by building your own workstation, and that workstation can be significantly more powerful than any current Mac Pro at the same time.
For video professionals who rely on FCPX or similar macOS-only software, they don't really have a choice, and they get the opportunity to essentially pay $10k to $20k just for a license of macOS, which is fun.
I have a Hackintosh (i7-8700K) and it feels about 2x faster than the top-spec $4000 MacBook Pro, latest 2019 model (subjective opinion, of course). It is such a huge difference, especially when using PyCharm and Adobe apps.
It is pricey, but if it is something you plan to keep for 5 years, it works out to about $100/month. Some people might want to buy it at that price.
"The other person who replied" here. While it definitely costs a lot less - you also need to factor in the time you spend on selecting components, building and tweaking thermals. It's almost a small side hobby for a month or two.
Well... it does have a TDP of 280 watts, so if you ran it at full capacity all the time it'd be roughly equivalent to a 300-watt heater. But at that point you'd probably be more worried about the fan noise (which really depends on how you build it). Most of the time my machine is under very light load. The case exhaust is slightly warmer than ambient, but that's not a good metric. Unfortunately I do not have a power meter.
I'd say heat is not a concern, but noise can be. It takes some time to figure out good fan curves to balance cpu temp vs noise. There may be some companies who do pre-built and well configured machines, but I haven't researched that at all.
Thermals have a noticeable effect on my room temperature. My three devices (MacBook Pro, LG UltraFine 5K, and a BenQ LED desk light) together seem to add an extra 5-10 degrees to my small enclosed room under load.
> Where Graviton2 is designed to suit Amazon’s needs for Arm-based instances, Ampere’s goal is essentially to supply a better-than-Graviton2 solution to the rest of the big cloud service providers (CSPs).
So the question is whether they can land Google, Microsoft, and/or Alibaba as customers for an alternative to AWS M6g instances.
I'm interested to know what applications really scale to these core counts. When I was working with large datasets (for finance) other bottlenecks tended to dominate, not computation, so memory pressure, and throughput from the SAN were more important.
These high-density configurations were key when rack space was at a premium, but these days power is the limitation. So providing more low-power cores is interesting; I'm just not sure who is going to get the most benefit from them.
With 80 cores I can get 40 2-core VMs all pegging their CPUs on a single processor without any core contention. Multiply up by the number of sockets. That might be the more interesting application for cloud providers than going for a single use case for the entire box.
Where this might get interesting, depending on how the pricing stacks up, is the cloud function business: this will increase the number of function instances you can afford to keep warmed up and ready to fire. In those situations you're not (usually) bottlenecked on the total bandwidth for the function itself; your constraint is getting from zero to having the executable in the VM it's going to run in, and from there getting it into the core past whatever it's contending with. If there's nothing to contend with and it's just waiting for a (probably fairly small) trigger signal, execution time from the point of view of whatever's downstream could easily be dominated by network transit times.
Aesthetics is a big thing: rackmount servers are ugly, and unless panels cover them, they make horrendous deskside workstations.
Another one is noise. These boxes are designed for environments where sounding like a vacuum cleaner is not an issue. Because of that, they sound like vacuum cleaners, with tiny whiny fans running at high speeds instead of more sedate, larger fans and bigger heat exchangers.
HP sold, for some time, the ZX6000 workstation that was mostly a rack server with a nice-looking mount. If someone decided to sell that mount, it'd solve reason #1, at least.
Probably a lack of time resources for unbounded experimentation with unsupported configurations of expensive, non-mainstream hardware. Not all of us have the luxury to be a recreational sysadmin in our spare time.
I struggle to imagine what you expect from running a desktop OS on an 80 core ARM cpu if it doesn't involve becoming a recreational sysadmin. That's definitely bleeding edge territory no matter the form factor the hardware ships in.
I'd have to budget it first. I already took over my partner's desk space in our home office and it wouldn't be fair to allocate too much physical space to my gadgets. There is already a quite massive x86 Lenovo tower server under my side of the desk that gets a pass because it's where her Mac makes Time Capsule backups.
> deliver significant cost savings over other general-purpose instances for scale-out applications such as web servers, containerized microservices, data/log processing, and other workloads that can run on smaller cores and fit within the available memory footprint.
> provide up to 40% better price performance over comparable current generation x86-based instances for a wide variety of workloads,
From what I read, it's not terribly hard to tell your compiler to target a particular instruction set; you just need to do it. Cost savings and better performance are great incentives, and Apple moving their Mac platform to ARM will drive enough market share for developers to take the time to recompile.
It might or might not be hard to compile for a different CPU. Intel's stronger memory model lets you play fast and loose with multithreaded code without as many visible race conditions. As a result, code that works fine on Intel often randomly gives wrong results on ARM. Fixing this can be very hard.
Once it is fixed you are fine. Most of the big programs you might use are already fixed, and some languages give you guarantees that make it just work.
What is different on Intel, such that you can play fast and loose with multithreading?
Two threads reading and writing the same memory area without any locking would cause problems regardless of the ISA, or am I missing something?
Two threads reading and writing the same memory area do not necessarily cause problems. In fact, a lot of software is built to exploit specific guarantees about how memory accesses are ordered with respect to each other.
ARM processors give far fewer ordering guarantees, so code has to work around that.
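A minimal C++ sketch of what "working around it" looks like. On x86, whose total-store-order model keeps stores in program order, a plain-store "write data, then set flag" handoff often happens to work; ARM's weaker model makes no such promise. Stating the ordering explicitly with a release/acquire pair makes the handoff correct on either ISA:

```cpp
#include <atomic>
#include <thread>

// Publish a value from one thread to another without a lock.
// The release store pairs with the acquire load, so the consumer
// is guaranteed to see payload == 42 on any architecture.
int run_handoff() {
    int payload = 0;
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // publishes payload
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {}  // spin until flag
        // acquire synchronizes with release: payload is visible here
    });
    producer.join();
    consumer.join();
    return payload;
}
```

With a non-atomic flag (or relaxed ordering), the same code is the kind that "works fine on Intel" and fails intermittently on ARM.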
Disclosure: I work at AWS building cloud infrastructure.
It's good to be skeptical. I always encourage folks to do experiments using their own trusted methodology. I believe that the methodology engineering used to support this overall benefit claim (40% price/performance improvement) is sound. It is not the "benchmarketing" that I personally find troubling in the industry.
To be fair, naming things is a pain. It's the same problem we have naming software/services (i.e. the neverending "Show HN"/launch posts with comments "this name conflicts with the following multiple other things").
Am I the only one who is super-annoyed at having to figure out every time whether this is Ampere the company or Ampere the new nVidia line?
I mean, it's probably not the fault of either, and it's a huge coincidence that we're getting a flurry of news articles about both in the summer of 2020, but come on (can we have some kind of edit to the titles of HN posts to make the distinction clear?).
The thing that has me bearish on CPU manufacturers in general: from what I understand, parallel architectures vastly simplify the overall design of CPUs while retaining the power-saving benefits.
As demand for parallel architectures reaches critical mass, bootstrapping a CPU manufacturing company will become far more feasible. IMO it's mostly the specialized knowledge needed to design CPUs that keeps this out of reach today.
I'm no expert, just have an interest in the space, so any dissenting opinions / facts welcome.
Phones are built for performance per watt. Phones are benchmarked. In the context of a discussion on Apple introducing ARM chips into the Macbook line, performance per watt is far more meaningful. For most users, battery life is the issue once minimum performance criteria have been met.
Will there be Razer laptops that last less than an hour on battery but can beat them? Sure.
Will there be people who complain that the Mac isn't fast enough when plugged in? It's already happening: the recent MacBook Pros have drawn complaints about thermal throttling that a slightly larger Dell with a decent fan obviously doesn't have.
But Apple will build performance laptops, using ARM chips, and they will be faster than the equivalent Intel Macbooks if only because they aren't throttled.
It is a Reduced Instruction Set Computer: a greatly simplified design.
The x86_64 ISA is absolutely insane. The only way to implement it in hardware efficiently is to "compile" the super complicated instructions into micro-ops which can actually be decoded and executed on the CPU.
Said another way, Intel has to implement a compiler in hardware which compiles the machine code before it gets executed. The extra complexity means more power and less performance.
> The x86_64 ISA is absolutely insane. The only way to implement it in hardware efficiently is to "compile" the super complicated instructions into micro-ops which can actually be decoded and executed on the CPU.
> Said another way, Intel has to implement a compiler in hardware which compiles the machine code before it gets executed. The extra complexity means more power and less performance.
This is a sadly prevalent misconception.
There is no "compiler in hardware". There are two kinds of instructions; the simpler ones (which are the most common) are expanded directly into a fixed sequence of micro-ops, while the more complicated ones act like a subroutine call to the microcode. The closest software analogue would be a macro assembler, not a compiler.
AFAIK, the extra complexity for efficiently decoding x86 instructions comes mostly from their variable length without an explicit length indication, and from the variable number of prefixes which can change the interpretation of the following byte, which makes decoding an instruction a serial task. IIRC, both Intel and AMD have a couple of tricks to reduce the impact this has on both power and performance: caching the already decoded micro-ops, and storing the instruction boundaries in the L1 instruction cache.
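A toy illustration of why boundary-finding is serial. The encoding below is made up (real x86 length determination involves prefixes, opcode maps, and ModRM bytes, which is far messier), but the dependency structure is the same: you cannot locate instruction N+1 until you have examined instruction N. A fixed-length ISA, by contrast, lets a wide decoder start at every fourth byte in parallel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy variable-length encoding: the low 2 bits of the first byte of
// each instruction give the number of extra operand bytes (0-3).
// Finding each start depends on having decoded the previous length,
// so the loop below is inherently sequential.
std::vector<std::size_t> find_boundaries(const std::vector<std::uint8_t>& code) {
    std::vector<std::size_t> starts;
    std::size_t pos = 0;
    while (pos < code.size()) {
        starts.push_back(pos);
        std::size_t len = 1 + (code[pos] & 0x3);  // must read this byte first
        pos += len;
    }
    return starts;
}
```

For example, `find_boundaries({0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC})` yields starts at 0, 1, and 4. This is exactly the cost the uop cache and the marked boundaries in the L1 instruction cache are there to amortize.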
So, that's the freshman-year CS view of the topic, but back here in reality land the "complicated" x86 instruction format has pretty much destroyed all others and none of the supposed advantages of RISC actually exist. Remember that the whole point of RISC is that the CPUs would supposedly run faster. That hasn't happened. There are no RISC CPUs running faster than state-of-the-art x86 CPUs. POWER8 comes closest, but does not exceed.
The whole RISC philosophy was a huge mistake. Yes, x86 instructions do not map well to transistors, and they have to be unpacked into uops to be executed. This is a form of compression. Having a compressed program image turns out to be a massive advantage. RISC proponents thought that x86 was so complicated they could beat Intel with their simple instruction decoders. That almost, but not really, made sense in 1990 but since then has made increasingly less sense, until today where the amount of sense this makes has hit zero. The x86 instruction decoder is a very small part of the floor plan of a modern CPU and every time they rev the microarchitecture it gets smaller. The number of transistors needed to decode the VEX prefix is like a speck of sand on the beach of a 512x512-bit multiplier.
> The whole RISC philosophy was a huge mistake. Yes, x86 instructions do not map well to transistors, and they have to be unpacked into uops to be executed.
The RISC philosophy wasn't a mistake. Our architectures have just become more sophisticated so that we don't have to make a binary choice. The hybrid is good. The internal uops get the pipelining advantages of RISC, while we get the encoding compression of a CISC instruction set.
I don't think that's the entire reason. Most of the common x86 instructions occurring in a normal program can be decoded to a few uops in a straightforward way, and since Sandy Bridge the decoded uops are cached anyway.
So this would only be a significant bottleneck for hot loops that are large enough that they don't fit in the uop cache.
It's definitely a real issue but it seems wrong to pin all or even most of Intel's stagnation on that.
If you look at the latest ARM instruction sets, they are not really "RISC". Sure, they're much saner than the crazy legacy instructions that x86 carries, but still nowhere near a "real RISC" ISA as espoused by Hennessy and Patterson, the hallmarks of which are simple, orthogonal, atomic instructions, and a small number of them. Currently that is most closely embodied by RISC-V.
If you look at how you can get computing gains going forward after the end of Moore's law, of course the glib answer is "parallelize across more cores!" but the more interesting path is to notice that behemoth single cores like x86 spend a ton of silicon area trying to optimize straight-line execution with things like speculative execution. If you saved all that silicon area, making each single core slower but smaller, but packed more cores on the die as a result, you would most likely come out faster. 
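A sketch of the kind of workload the many-small-cores bet relies on: embarrassingly parallel work with no cross-thread communication until a final combine, which scales with core count rather than with per-core speculation machinery (the splitting scheme here is just one simple choice):

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Sum 0..n-1 by striding the range across `workers` threads.
// Each worker touches only its own slot, so there is no sharing
// until the sequential combine at the end.
std::uint64_t parallel_sum(std::uint64_t n, unsigned workers) {
    std::vector<std::thread> pool;
    std::vector<std::uint64_t> partial(workers, 0);
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, w, n, workers] {
            for (std::uint64_t i = w; i < n; i += workers)
                partial[w] += i;
        });
    }
    for (auto& t : pool) t.join();
    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}
```

For work like this, doubling the number of (slower, simpler) cores roughly doubles throughput; the pointer-chasing, branchy code mentioned upthread is the opposite case, where the big speculative core wins.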
Does TSMC have the capacity to support AMD / AWS / Ampere etc making a significant dent in the server market alongside longstanding commitments to Apple etc?
Given how much they spend on Intel CPUs to what extent is it worth AWS / Oracle etc making low hundred million dollar investments in their own silicon or startups like Ampere just to keep Intels pricing competitive?
TSMC has never had a capacity problem, despite the stories mainstream media likes to run. You don't just go and ask TSMC whether they have a spare 10K-wafer capacity sitting around. TSMC plans its capacity based on its clients' forecasts and projections many months in advance. They will happily expand capacity if you are willing to commit to it, like how Apple was willing to bet on TSMC, and TSMC basically built a fab specifically for Apple.
This is much easier for AWS since they consume the chips themselves in their own service offerings. It is harder for AMD, since they don't know how much they can sell, and AMD being conservative means they don't order more capacity than they can handle.
>Given how much they spend on Intel CPUs to what extent is it worth AWS / Oracle etc making low hundred million dollar investments in their own silicon or startups like Ampere just to keep Intels pricing competitive?
I am not sure I understand the question correctly, but AWS has already invested hundreds of millions in their own ARM CPU, Graviton.
They did, with Intel Custom Foundry. They tried and they failed. And they currently have no intention to try that again, at least not until they admit defeat, which is going to take at least another few years, if not longer.
>They did with Intel Custom Foundry. They tried and they failed.
From what I've heard, they didn't try very hard. Apparently they thought all they had to do was make chips, and that the sheer "technical superiority" of their process meant that they could treat their customers as second-class stakeholders, withhold information about their production timelines, etc.