Artificial intelligence is appearing everywhere, but normal CPUs are not efficient platforms for the technology. Efficiency will only come with a new breed of microprocessors that are engineered specifically for the AI workload of the 2020s.
Pioneering AI applications needed supercomputer-class compute resources to produce the desired outputs. The downside was that they often ran on massed banks of general-purpose central processing units (CPUs) working in parallel, which were not optimised for the specific processing that AI workloads require.
More recently, graphics processing units (GPUs) have become established as de facto AI accelerators. Conventional CPUs have a small number of complex cores (typically four to eight), each designed to work through calculations in sequential order. GPUs, by contrast, have many more simple cores – hundreds, even thousands – with dedicated video memory (VRAM), so are adept at handling the floating-point arithmetic behind statistical computation and the massively parallel processing needed for progressive machine learning applications.
These attributes have proved highly providential for GPU vendors, most notably Nvidia, which has leveraged demand for AI-optimised GPUs to establish market leadership. The company continues to develop additional connective capability for the kind of dataflow that benefits AI workloads. GPUs are also manufactured at volume, which helps make them more affordable. However, serviceable as they may be, GPU design did not start with AI’s purposes in mind.
What AI systems developers were looking for were new processors engineered and optimised specifically for AI jobs: many-cored processors with built-in parallelism, able to perform intelligent analysis of big datasets in real time. Ideally, all of this would sit on a highly localised architecture, closely networked with co-located processors so that data can be transferred between them with near-zero latency, which would also help keep energy consumption down. AI-specific chips are sometimes categorised as ‘AI accelerators’.
AI chips produced to date do not conform to an industry standard, but are usually manycore-based designs, and focus generally on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability.
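To give a flavour of what low-precision arithmetic means in practice, the Python sketch below squeezes a handful of floating-point weights into 8-bit integer codes and measures the rounding error introduced. It is a minimal illustration under stated assumptions: the function names and sample weights are hypothetical, and real AI chips use a variety of number formats and quantisation schemes.

```python
# Illustrative sketch only: symmetric 8-bit quantisation of a few weights.
# Function names and sample values are hypothetical, not from any vendor.

def quantise_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(v / scale) for v in values], scale

def dequantise(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

weights = [0.82, -0.41, 0.05, -1.27]
codes, scale = quantise_int8(weights)
approx = dequantise(codes, scale)

# The worst-case rounding error is bounded by half of one quantisation step.
worst_error = max(abs(w - a) for w, a in zip(weights, approx))
print(codes, scale, worst_error)
```

Each weight now fits in a single byte rather than four or eight, at the cost of a small, bounded error. That trade is what makes low-precision designs attractive where memory and power are tight.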
There is no standard definition of an AI chip. A broader view, attributed to Beijing Innovation Centre for Future Chips (ICFC), is that any build of microprocessor used for AI applications can be called an AI chip. Furthermore, the ICFC notes, some chips based on traditional computing architectures combined with hardware and software acceleration schemes have worked well with AI applications. However, many unique designs and methods for AI applications have emerged that cover all levels of microprocessor build, from materials and devices, to circuits and architectures.
Nvidia sells a lot of its Tesla-class GPUs to cloud services providers (CSPs). The big four CSPs – Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure and Alibaba Cloud – deployed Nvidia GPUs in 97.4 per cent of infrastructure-as-a-service offerings sold with dedicated accelerators, according to Liftr Insights.
However, the AI-driven escalation in demand for GPUs has alerted a mixed spectrum of technology providers to the potential for compute engines inspired by GPU concepts, but engineered for AI from the outset. Some contenders are aimed at servicing AI in cloud, others at AI in devices at the ‘edge’ of the hardware stack. These providers foresee an opportunity to win some of the $91.18bn (£70.7bn) that Allied Market Research thinks AI chip sales will be worth by 2025, as a result of a stonking 45.2 per cent compound annual growth rate.
McKinsey, meanwhile, proffers a more conservative valuation, and reckons that by 2025 AI-related processors could account for almost 20 per cent of all microprocessor demand, worth about $67bn (£52bn). Even sliced thin, that’s a pretty fruity cake, with no shortage of players hungry for a slice.
McKinsey also forecasts that GPUs will cede about 50 per cent of their market share to AI-specific processors by the middle of the 2020s. “Opportunities [for AI chips] will emerge at both data centres and the edge,” McKinsey adds. “If this growth materialises, semiconductor vendors will be positioned to capture more value from the AI technology stack than they have obtained with previous innovations – about 40-50 per cent of the total.”
AI creates an unprecedented opportunity for chip vendors due to “its applicability across virtually every industry vertical, the strong forecast for the sheer number of chips needed both in the cloud and at the ‘edge’, and the growing need for specialised computing requirements to accelerate new algorithms”, explains PwC’s ‘Global Semiconductor Market’ report. The consultancy predicts that AI will be the catalyst for another growth cycle for the semiconductor sector: “The need for instant computing, connectivity and sensing, will drive massive demand for AI-tailored processors for the next decade.”
Last year (2019) saw a succession of new AI chips from a variety of high-tech providers that ranged from incumbent microprocessor vendors (like AMD, Intel, IBM, Qualcomm) to CSPs (AWS, GCP, Alibaba) and investor-funded start-ups (like BrainChip, Cerebras, Graphcore, Groq and Gyrfalcon Technology). This upsurge has been described by some commentators as a new ‘chips arms race’, as multiple players jockey for a stake in the emergent market.
One of the few defining characteristics of the nascent AI chip sector at this stage is that barriers to entry seem low, provided an arriviste has sufficient investor confidence. According to Scott Runner, VP of technology at Altran, the level of investment and R&D being committed to AI chip development presents a rare opportunity for the start-ups to thrive in plain sight of entrenched market leaders.
“Some of the AI application needs are too niche for a large microprocessor player to target, or else require such domain-specific knowledge that a start-up company can clearly focus and differentiate,” Runner says. “AI is ideal – start-ups don’t have to spread themselves thin, can solve one application vertically or implement one architecture horizontally.”
“People have been using existing CPUs and GPUs to get more arithmetic compute to move the agenda forward, but what is needed is a completely new type of processor to support a fundamentally different workload,” says Nigel Toon, co-founder and CEO of AI chip company Graphcore. “[What we really need is] many processors running one task, rather than one processor running many tasks.”
For now, the AI chip sector is “not one of those races that the winner wins on Moore’s Law by transistor scaling where each new process node is much more expensive than the past”, Runner adds. “Architectures implemented in fairly standard, ‘affordable’ fabrication processes can deliver remarkable results, dependent on the architecture and application.”
Runner’s reference to Moore’s Law is apposite. The limits of physics dictate that the law’s famous axiom, that the number of transistors per integrated circuit doubles about every two years, will not apply indefinitely.
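As a rough illustration of what that doubling implies, the short Python sketch below projects a transistor count forward under a strict two-year doubling period. The function name and the starting figure of one billion transistors are illustrative assumptions, not data from any vendor.

```python
# Illustrative sketch: compound doubling under an idealised Moore's Law.
# The starting count is a made-up round number, not a real chip's figure.

def projected_transistors(start_count, years, doubling_period_years=2.0):
    """Return the transistor count after `years` of steady doubling."""
    return start_count * 2 ** (years / doubling_period_years)

start = 1_000_000_000  # hypothetical chip with one billion transistors
for years in (2, 10, 20):
    growth = projected_transistors(start, years) / start
    print(f"after {years} years: {growth:.0f}x the starting count")
```

Twenty years of uninterrupted doubling would mean roughly a thousandfold increase, which is why the physical limits on feature size matter so much.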
As chip features shrink towards the scale of 1nm (equivalent to around 10 atoms), it becomes tricky to regulate the flow of electrons that represent the 0s and 1s a microprocessor stores or processes. Even in the presence of a potential barrier, electrons continue to flow because of quantum tunnelling, the quantum-mechanical phenomenon in which a subatomic particle passes, or ‘leaks’, through a barrier. This leakage renders some conventional processor architectures less efficient.
This is one reason why chip architectures are being rethought by vendors like Intel and Graphcore along neuromorphic lines – architectures inspired by the interconnected structure of the biological brain. A neuromorphic chip can realise interconnections between arbitrary neurons: within a biomimetic neural network of a given scale, any neuron can transmit data to any other neuron.
To exploit this complex interconnection, the neuromorphic chip is designed hierarchically. It includes array cores, which have crossbars; an on-chip network; and a high-interconnection I/O link. The way data moves within the chip bears similarities to packetised datacoms networks: data to be transmitted must carry its target address, and packets travel over a shared link. Because transmissions at neuron level can be addressed arbitrarily, each neuron has its own distinct destination address.
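The addressing idea can be sketched in a few lines of Python. This is a toy model under our own assumptions, not a description of any real chip: the packet layout (core, neuron, payload) and the class names are hypothetical, and the ‘shared link’ is reduced to a simple queue.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet layout: destination core, destination neuron, payload.
Packet = namedtuple("Packet", ["core_id", "neuron_id", "payload"])

class Core:
    """Toy array core that stores whatever arrives for each of its neurons."""
    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = defaultdict(list)  # neuron_id -> list of payloads

    def deliver(self, packet):
        self.inbox[packet.neuron_id].append(packet.payload)

class SharedLink:
    """Stand-in for the on-chip network: one queue shared by all senders."""
    def __init__(self, cores):
        self.cores = {core.core_id: core for core in cores}
        self.queue = []

    def send(self, packet):
        self.queue.append(packet)

    def flush(self):
        """Route every queued packet by its address; return how many moved."""
        for packet in self.queue:
            self.cores[packet.core_id].deliver(packet)
        delivered = len(self.queue)
        self.queue.clear()
        return delivered

cores = [Core(0), Core(1)]
link = SharedLink(cores)
link.send(Packet(core_id=1, neuron_id=7, payload=1.0))  # to neuron 7 on core 1
link.send(Packet(core_id=0, neuron_id=3, payload=0.5))  # to neuron 3 on core 0
delivered = link.flush()
print(delivered, dict(cores[1].inbox), dict(cores[0].inbox))
```

Because every packet carries its own destination, any sender can reach any neuron over the same link without a dedicated wire for each pairing.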
“For 70 years we have told computers what to do, step by step, in a software program. We are now moving from programmed algorithmic to machine intelligence systems that can learn,” says Toon at Graphcore. “These machine learning workloads are completely new: structures that have many separate parameters and many compute tasks operating on those parameters. That creates massive parallelism – which means you need highly parallel processors that can talk to each other and share problems. That’s something that’s not supported by today’s CPUs and GPUs. A new kind of processor architecture is required.”
The differentiation between a data centre and an ‘edge’ device is another compelling dynamic of the AI chip sector, as some chip vendors see potential for AI processing to occur on an endpoint system itself – a smartphone, sensor unit or remotely located facial-recognition platform, say – rather than have to wait for outputs to be shunted to and from a cloud.
Despite – or because of – the opportunities AI chips promise, contestants in this market face formidable challenges as they endeavour to bring their solutions to market. They must consolidate technological credentials, win over AI solutions developers, and persuade AI practitioners that they should run their applications and workloads on platforms powered by their particular AI chipsets.
So far, contrary to general trends in the CPU market, AI chips have largely been developed along proprietary lines: vendors have created chips engineered to their own specific design with less concern for direct functional compatibility with rival products. Each product launch has included claims about achievable performance, some gauged in terms of the IPS (inferences per second) performance metric rather than FLOPS (floating-point operations per second).
Like-for-like comparisons carry less weight where solutions claim to optimise a specific AI workload, states Andrew McCullough, AI technology expert at PA Consulting.
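One way to see how the two metrics relate is a back-of-envelope conversion: if a chip sustains a given number of floating-point operations per second, and one inference on a given model costs a given number of operations, the ratio gives a rough IPS figure. The Python sketch below uses invented numbers and a hypothetical utilisation factor purely for illustration; real figures depend on the model, the precision used and how well the workload maps to the hardware.

```python
# Illustrative sketch: relating FLOPS to inferences per second (IPS).
# All figures are invented; they do not describe any real chip or model.

def inferences_per_second(peak_flops, flops_per_inference, utilisation=1.0):
    """Estimate IPS from peak FLOPS, per-inference cost and utilisation."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be greater than 0 and at most 1")
    return peak_flops * utilisation / flops_per_inference

peak = 100e12        # hypothetical peak of 100 teraFLOPS
small_model = 8e9    # hypothetical cost: 8 billion operations per inference
large_model = 80e9   # hypothetical cost: 80 billion operations per inference

small_ips = inferences_per_second(peak, small_model, utilisation=0.5)
large_ips = inferences_per_second(peak, large_model, utilisation=0.5)
print(small_ips, large_ips)
```

The same hypothetical chip shows a tenfold difference in IPS across the two models, which is one reason an IPS claim only means something when the model behind it is named.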
“The most important benchmark for AI chip performance depends on the application,” adds McCullough. “Overall, speed tends to be the critical quality but for some edge devices power efficiency is just as important. Mobile devices fall into this category when AI processing has to be implemented on the edge device itself, rather than away in the cloud.”
Established chipmakers might also find it too taxing to make a full-blown break with the past, McCullough adds. “They tend to have intellectual property back catalogues wedded to a particular programming paradigm. There comes a point where starting from scratch can produce a better solution due to a step-change in technology.”
This market realignment has resulted in some unusual industry shake-ups. Some of the innovative AI chip propositions have come from vendors with short track-records in the microprocessor sector – start-ups like BrainChip, Cerebras, Graphcore, and Gyrfalcon Technology – yet with the credibility to attract funding from proven technology players who are convinced by what they are buying into.
Graphcore, for instance, closed an additional $200m (£155m) funding round in 2019, bringing the total capital raised by the UK-based company to more than $300m (£233m). Its investors include Microsoft and BMW, and the company has been valued at $1.7bn (£1.3bn).
At the same time, AI chip solutions have been announced by large technology providers that also have negligible history as microprocessor specialists. Indeed, the polymathic Google announced that it would not only install its self-minted Tensor Processing Unit (TPU) AI chips in its own GCP data centres, but would also produce a version designed to perform AI tasks on ‘edge’ devices. Other chipmakers investigating the potential for ‘edge’ AI chips include Apple, ARM and Synopsys.
The division between chips designed to operate within data centre infrastructures and at the ‘edge’ of the hardware stack informs another aspect of the market differentiation that has started to shape the AI chip sector. For example, like Google’s TPU in GCP, Amazon’s Inferentia chip (December 2019) has been installed primarily in its AWS data centres to provide AI-enabled services for operational use cases and virtual environments for AI developers.
Inferentia is not a direct competitor to big AI chip incumbents or the start-ups because Amazon will not be selling the chips to other parties. It does, however, deprive Intel and Nvidia of a major customer for AI-purposed chips. AWS expects to sell services that run on the chips to its cloud customers starting this year.
Moving from central processing to intelligence processing
Graphcore’s Intelligence Processing Unit (IPU) is an example of a basic neuromorphic AI chip architecture. It was designed specifically for machine learning workloads, and so differs significantly from CPU and GPU architectures.
The design aim of the IPU is the efficient execution of ‘fine-grained’ operations across a relatively large number of parallel threads. It offers true multiple-instruction, multiple-data (MIMD) parallelism and, apart from the register file, has distributed local memory as its only form of memory on the device.
Each IPU has 1,216 processing elements called tiles; a tile consists of one computing core plus 256KiB [kibibytes; one kibibyte is 1,024 bytes] of local memory.
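Those two figures imply the total on-chip memory. A quick arithmetic check in Python, using only the tile count and per-tile memory quoted in the text (the variable names are ours):

```python
# Arithmetic check using the tile count and per-tile memory quoted in the text.
TILES_PER_IPU = 1216
KIB_PER_TILE = 256
BYTES_PER_KIB = 1024
KIB_PER_MIB = 1024

total_kib = TILES_PER_IPU * KIB_PER_TILE
total_mib = total_kib // KIB_PER_MIB
total_bytes = total_kib * BYTES_PER_KIB

print(total_kib, "KiB =", total_mib, "MiB =", total_bytes, "bytes")
```

That works out at 304MiB of local memory spread across the tiles of a single IPU.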
In addition to the tiles, the processor contains the exchange, an on-chip interconnect that allows high-bandwidth, low-latency communication among tiles.
Each IPU also contains ten IPU link interfaces. The IPU link is a proprietary interconnect that enables low-latency, high-throughput communication between IPU processors. These links make transfers between remote tiles as simple as between local tiles, so are essential for scalability.