Beyond Moore's Law: Leveraging Advanced Packaging Technology

Cătălin Ciobanu, Transilvania University of Brașov (UnitBV)







# Moore's Law

- Prof. Carver Mead popularized the phrase "Moore's Law"
- Moore's law is the observation that the number of transistors in an integrated circuit (IC) doubles about every two years
- Several factors contribute to this exponential behavior (1975 IEEE International Electron Devices Meeting)
  - Metal-oxide semiconductor (MOS) technology
  - The exponential rate of increase in die sizes coupled with a decrease in defect densities
  - Finer minimum dimensions

#### Moore's Law: The number of transistors on microchips doubles every two years

Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important for other aspects of technological progress in computing – such as processing speed or the price of computers.
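The doubling rule can be written as N(t) = N0 · 2^(t / T), with T = 2 years. The sketch below is only an illustration of that formula; the function name is hypothetical, and the starting point (the Intel 4004 of 1971, with about 2,300 transistors) is chosen here as an example rather than taken from the slide.

```python
def projected_transistors(n0: float, years: float, doubling_period: float = 2.0) -> float:
    """Transistor count after `years`, starting from n0 and doubling every `doubling_period` years."""
    return n0 * 2.0 ** (years / doubling_period)


# Illustrative starting point: Intel 4004 (1971), about 2,300 transistors.
START_YEAR = 1971
START_COUNT = 2300

for year in (1971, 1981, 1991, 2001, 2011, 2021):
    count = projected_transistors(START_COUNT, year - START_YEAR)
    print(f"{year}: {count:.3g} transistors")
```

Ten years correspond to five doublings, so each printed row is 32 times larger than the previous one.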



OurWorldinData.org – Research and data to make progress against the world's largest problems. Licensed under CC-BY by the authors Hannah Ritchie and Max Roser. Source: https://en.wikipedia.org/wiki/Moore%27s_law



## Beyond Moore's Law

### New challenges in the semiconductor industry – multiple constraints in advanced chip design

- **Power** limitations
- **Thermal** limitations
- Limited Instructions Per Clock (IPC) gains in new generations of processors

### Industry **trends**

- Multicore designs
- Wide use of accelerators

### Wafer costs for each new technology node are rising

### Monolithic designs are prohibitively expensive

- Lower yields inherent to large chips
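Why large dies yield worse can be sketched with the simple Poisson yield model, Y = exp(-A · D0), where A is the die area and D0 the defect density. This is a first-order textbook model, not the one a foundry would use, and the defect density below is an assumed value chosen only for illustration.

```python
import math


def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * d0_per_cm2)


D0 = 0.1  # assumed defect density in defects/cm^2 (illustrative only)

for area in (100, 200, 400, 800):
    print(f"{area:4d} mm^2 -> yield {poisson_yield(area, D0):.1%}")
```

Under this model, yield falls exponentially with area, so splitting one large die into several small ones raises the fraction of usable silicon.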

### Reticle limit for future High-NA (Numerical Aperture) Extreme Ultraviolet (EUV) lithography will be halved

- 858mm<sup>2</sup> -> 429mm<sup>2</sup> due to the use of an anamorphic lens array<sup>1</sup>
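A rough way to see the impact of a halved exposure field is to count how many dies a design of a given total area must be split into. The sketch assumes the commonly quoted 26 mm × 33 mm full field and a 26 mm × 16.5 mm half field, considers area only (not aspect ratio), and uses a hypothetical 800 mm<sup>2</sup> design.

```python
import math

FULL_FIELD_MM = (26.0, 33.0)   # assumed full exposure field (width, height)
HALF_FIELD_MM = (26.0, 16.5)   # assumed High-NA field: height halved


def field_area(field) -> float:
    """Area of an exposure field in mm^2."""
    width, height = field
    return width * height


def min_dies(total_area_mm2: float, field) -> int:
    """Smallest number of dies so that each die's area fits within one field."""
    return math.ceil(total_area_mm2 / field_area(field))


print(field_area(FULL_FIELD_MM), field_area(HALF_FIELD_MM))        # 858.0 429.0
print(min_dies(800, FULL_FIELD_MM), min_dies(800, HALF_FIELD_MM))  # 1 2
```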



## CELL Broadband Engine (2006) - Heterogeneous Multicore

- Power Processor Element (PPE) for control tasks
- Synergistic Processor Elements (SPE) for data-intensive processing







# Solution – Advanced packaging - Chiplet technology

"Chiplet technology is a microelectronics design and manufacturing approach where multiple smaller dies or chiplets are combined into a single package, with each of chiplets performing a specific function."

### Multiple chiplets – connected to form a SoC-like solution

- Disaggregate large designs
- Avoid the silicon reticle size limitation

### Multiple **advantages** when using chiplets

- Flexibility
- Scalability
- Cost savings from off-the-shelf chiplets
- Mixing functions **from different process nodes**



# **Chiplets Motivation**



### **Chiplets Motivation**



Source: https://www.tomshardware.com/news/amd-rdna-3-gpu-architecture-deep-dive-the-ryzen-moment-for-gpus



### Moore's Law Keeps Slowing

# **Barriers to Large Cache Size**



Source: IEEE ISSCC 2022






Source: www.techpowerup.com/287291/amd-announces-ambitious-goal-to-increase-energy-efficiency-of-processors-running-ai-training-and-high-performance-computing-applications-30x-by-2025 (Sep 29th 2021)



Application-Specific optimization, Modular design: chiplets + 3D Stacking



Source: www.techpowerup.com/287291/amd-announces-ambitious-goal-to-increase-energy-efficiency-of-processors-running-ai-training-and-high-performance-computing-applications-30x-by-2025 (Sep 29th 2021)




# **Chiplet Technology**



## Chiplet Technology and Heterogeneous Integration

- Transition from layout of laminate substrates to silicon substrates
- Electrical and thermal analysis challenges
- Multi-chip Modules (MCM) date back to the 1960s
- System in Package (SiP) began to replace the term MCM in the late 1990s
- 2.5D IC packaging: silicon substrates provide high density, using through-silicon vias (TSVs)



Figure 1: Heterogeneous integration



### **Chiplet Technology Challenges**



Figure 2: System-level electrothermal analysis



# Chiplets Heterogeneous Integration (HI)

- Similar to designing a small PCB
- Each chiplet built with a common/known communication interface
  - PCIe
  - HBM
  - AIB
- Lower development cost – modular integration
- Lower manufacturing costs – purchasing known-good die (KGD)
- Cost advantage – volume manufacturing when reusing the same chiplets in many designs
- Many vendors exploring this space
  - Intel Co-EMIB: EMIB + Foveros in the same package
  - Intel Omni-Directional Interconnect (ODI): horizontal communication (similar to EMIB) or vertical communication using TSVs (similar to Foveros)
  - TSMC's Chip-on-Wafer-on-Substrate (CoWoS)





### Advanced Packaging Technology Evolution



Figure 4: Evolution of advanced multi-chip(let) packaging technologies



# Industry Shift Towards Multi-Die SoCs

Shift fueled by several converging trends

- Some SoCs are too big to be manufacturable
- Some SoCs require different process nodes for optimal cost/performance
- Desire for enhanced product scalability and composability
- Lack of a design ecosystem causes multi-die projects to be paused or postponed
- Early adopters developed proprietary die-to-die interfaces
  - Limits the ability to assemble dies from different vendors

### Solution: standardized die-to-die interconnects

- **Optical Internetworking Forum (OIF)** – The XSR and USR physical layer specifications, optimized for die-to-die connectivity
- **CHIPS Alliance** – The AIB specification, which was originally introduced by Intel
- **Open Compute Project (OCP)** – The OpenHBI and Bunch-of-Wires (BOW) specifications, optimized for different use cases
- **Universal Chiplet Interconnect Express (UCIe)** – A comprehensive die-to-die interconnect specification covering multiple use cases and a complete protocol stack



### Die-to-die interconnect standards

| Alliance            | OIF                          | Open Compute Project             | Open Compute Project                               | CHIPS Alliance     | UCIe (Universal Chiplet Interconnect Express)        |
|---------------------|------------------------------|----------------------------------|----------------------------------------------------|--------------------|------------------------------------------------------|
| Standard            | XSR                          | BOW                              | OpenHBI                                            | AIB                | UCIe                                                 |
| Data Rate           | 112G / 224G                  | 8G / 16G                         | 8G / 16G                                           | 6G                 | 16G / 32G                                            |
| Protocol            | Not Defined                  | Not Defined                      | Not Defined                                        | Not Defined        | Streaming, PCIe, CXL                                 |
| Package Types       | 2D                           | 2D, 2.5D                         | 2D, 2.5D                                           | Bridge             | 2D, 2.5D, Bridge                                     |
| Target Applications | Optical Networking (CPO/NPO) | Cost sensitive aggregation cases | High density scale and split cases for data center | Mil-aero ecosystem | Scale & Split w/ streaming; Aggregation w/ PCIe/CXL  |

Source: https://www.synopsys.com/designware-ip/technical-bulletin/ucie-multi-die-socs.html
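One way to use the comparison above is to encode it as data and filter by the package type and protocol a design needs. The sketch below is only an illustration: the class and function names are hypothetical, and the values are transcribed from the table rather than from the specifications themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class D2DStandard:
    name: str
    alliance: str
    data_rates: tuple    # in "G", as listed in the table
    protocols: tuple     # empty tuple means the protocol layer is not defined
    packages: frozenset


STANDARDS = [
    D2DStandard("XSR", "OIF", (112, 224), (), frozenset({"2D"})),
    D2DStandard("BOW", "OCP", (8, 16), (), frozenset({"2D", "2.5D"})),
    D2DStandard("OpenHBI", "OCP", (8, 16), (), frozenset({"2D", "2.5D"})),
    D2DStandard("AIB", "CHIPS Alliance", (6,), (), frozenset({"Bridge"})),
    D2DStandard("UCIe", "UCIe", (16, 32), ("Streaming", "PCIe", "CXL"),
                frozenset({"2D", "2.5D", "Bridge"})),
]


def candidates(package: str, protocol: Optional[str] = None) -> list:
    """Names of standards that support `package` and, if given, `protocol`."""
    return [
        s.name for s in STANDARDS
        if package in s.packages and (protocol is None or protocol in s.protocols)
    ]


print(candidates("Bridge"))        # ['AIB', 'UCIe']
print(candidates("2.5D", "CXL"))   # ['UCIe']
```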



## UCIe: Companies Join Forces – standardized die-to-die interconnect



Source: Synopsys, "Multi-Die SoCs Gaining Strength with Introduction of UCIe", https://www.synopsys.com/designware-ip/technical-bulletin/ucie-multi-die-socs.html



## UCIe: a Complete Stack for Die-to-die interconnect

### Supports data rates from 8 Gbps/pin to 16 Gbps/pin

- Expected to support 32 Gbps/pin

### Supports all types of package technology

- UCIe for advanced packages (silicon interposer, silicon bridge or RDL fanout)
- UCIe for standard packages (organic substrate or laminate)
- Both options share the same architecture and protocols

### Very competitive performance advantages for multi-die SoC designers

- High energy efficiency (pJ/b)
- High edge usage efficiency (Tbps/mm)
- Low latency (ns)
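The edge usage metric (Tbps/mm) is the aggregate bandwidth divided by the length of die edge the interface occupies. The sketch below shows the arithmetic only; the lane count and edge length are hypothetical values, not figures from the UCIe specification.

```python
def edge_bandwidth_density(lanes: int, gbps_per_pin: float, edge_mm: float) -> float:
    """Bandwidth per unit of die edge, in Tbps/mm."""
    total_tbps = lanes * gbps_per_pin / 1000.0  # Gbps -> Tbps
    return total_tbps / edge_mm


# Hypothetical interface: 64 data lanes at 16 Gbps/pin along 1 mm of die edge.
print(edge_bandwidth_density(lanes=64, gbps_per_pin=16, edge_mm=1.0))  # 1.024
```

Doubling the per-pin data rate or halving the bump pitch along the edge (which doubles the lanes per mm) each doubles this figure.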

### UCIe – compelling roadmap

- Higher data rates
- New protocols
- 3D packaging
- Security
- Testability





# Brief History of Multi Chip Modules



## Intel Pentium Pro Multi-Chip Module (MCM) (1995)

Pentium Pro is packaged in Multi-Chip Module (MCM) – "on package cache"

- Separate L2 die in the package, same speed as CPU core
- The dies are connected to the package using conventional wire bonding
- Up to two 512KB cache dies
- 0.35 µm to 0.50 µm process technology







# Cray X1 (2003)

8-chip Multi-Chip Module (MCM)

- 4 processor chips
- 4 custom streaming cache chips

Source: CPU Galaxy, "CRAY X1 Multi Chip Module - Monster CPU with 8 Cores – Teardown", Feb 18, 2023, https://www.youtube.com/watch?v=QdtcJTIcqDE; https://www.craysupercomputers.com/downloads/CrayX1/CrayX1E_Datasheet.pdf



### Intel Pentium D Presler (2006)

Smithfield Pentium D 800 series had a single die with two cores

- Both cores fabricated on one monolithic die
- Reduced cache available to each core
- 90nm, 206 mm<sup>2</sup>



Presler Pentium D 900 series used a Multi-Chip Module (MCM)

- Two single-core dies placed on the same substrate
- Each die could be sold as a Pentium 4 CPU
- Reduced cost due to higher yields
- 65nm, 162 mm<sup>2</sup> for both cores
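Besides yield, smaller dies also lose less wafer area at the edge. The sketch below applies a common gross dies-per-wafer approximation to the die areas quoted above. The 300 mm wafer diameter is an assumption, and because Smithfield (90nm) and Presler (65nm) use different process nodes, the comparison is illustrative only.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(whole - edge_loss)


WAFER_MM = 300.0  # assumed wafer diameter

# Smithfield: one 206 mm^2 dual-core die per CPU.
smithfield_cpus = gross_dies_per_wafer(WAFER_MM, 206.0)

# Presler: two single-core dies per CPU, 162 mm^2 for both, so 81 mm^2 each.
presler_cpus = gross_dies_per_wafer(WAFER_MM, 162.0 / 2) // 2

print("Smithfield CPUs per wafer (gross):", smithfield_cpus)
print("Presler CPUs per wafer (gross):", presler_cpus)
```

These are gross counts before defects. Because each small die is tested on its own, a defect discards one core's worth of silicon rather than the whole dual-core die.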





# AMD Zen Embraces Chiplets, 3D V-Cache



## AMD EPYC Naples (2017)

- Four 14nm compute dies, each with 8 cores
- Each die connected directly to the others via Infinity Fabric

### **INFINITY FABRIC: DIE-TO-DIE INTERCONNECT**

- Fully connected Coherent Infinity Fabric within socket
- Optimized low power, low latency MCM links between die
- 42GB/sec bi-dir BW per link, ~2pJ/bit TDP
- Single-ended, low power zero transmission
- Infinity Control Fabric connected between dies

Purpose-built MCM links optimized for power, bandwidth, and latency
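The link figures above (42 GB/s bidirectional, about 2 pJ/bit) translate into link power by multiplying bits per second by energy per bit. The sketch assumes a fully utilized link and 1 GB = 10^9 bytes; the function name is hypothetical.

```python
def link_power_watts(bandwidth_gbytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power drawn by a fully utilized link: (bits per second) x (joules per bit)."""
    bits_per_s = bandwidth_gbytes_per_s * 1e9 * 8
    return bits_per_s * energy_pj_per_bit * 1e-12


print(f"{link_power_watts(42, 2):.3f} W")  # 0.672 W
```

At these figures a single die-to-die link stays well under one watt, which is what makes a fully connected four-die topology affordable within the socket power budget.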






Source: https://www.techpowerup.com/233381/amd-announces-high-performance-computing-platform-naples-is-epyc; https://en.wikipedia.org/wiki/Epyc; https://www.anandtech.com/show/11551/amds-future-in-servers-new-7000-series-cpus-launched-and-epyc-analysis/2



## AMD 2nd Generation EPYC Rome (2019)

2nd gen. EPYC with Zen2 architecture

- 9 dies
- 8 7nm Core Complex Die (CCD) chiplets, 8 cores each
- A 14nm IO die
- Connected via second-generation Infinity Fabric





### AMD Zen2 Rome Chiplets



Figure 2.2.1: Three heterogeneous technology chiplets leveraged to many products and markets.

Source: IEEE ISSCC 2020 AMD Chiplet Architecture for High-Performance Server and Desktop Products



## AMD Zen2 Cost-performance scalability

Cost vs. Performance Scalability



Figure 2.2.2: Cost-performance scalability with chiplet design.



### AMD Zen2 Infinity Fabric On-Package (IFOP) SerDes Architecture



Figure 2.2.3: Infinity Fabric On-Package (IFOP) SerDes architecture.



### AMD Zen2 Infinity Fabric On-Package (IFOP) SerDes Architecture



Figure 2.2.4: 'Rome' and 'Matisse' package design and IOD leverage.

Source: IEEE ISSCC 2020 AMD Chiplet Architecture for High-Performance Server and Desktop Products



### AMD Naples vs. Rome



Figure 2.2.6: 'Rome' central IOD reduces the number of NUMA domains and distances for much improved memory latency attributes relative to its predecessor.



### AMD Zen4 line-up

# **Product Configurations**





### AMD Ryzen Zen4 IO die





## 2.5D chiplets, 3D chiplets

# More than Moore





- 2.5D chiplets can provide product flexibility and reduce cost
- However, <u>3D can be even better</u>!
  - Improves effective memory latency
  - Reduces the dynamic power of long datapaths and I/O
  - Fits more transistors within a given package cavity size

\*Hypothetical processor with large cache

© 2022 IEEE International Solid-State Circuits Conference

26.4: 3D V-Cache: The Implementation of a Hybrid-Bonded 64MB Stacked Cache for a 7nm x86-64 CPU



## AMD Ryzen Zen3 3D V-cache

# AMD 3D V-Cache<sup>™</sup> Components: CCD



- "Zen 3" x86-64 CPU Core Complex Die (CCD)
- TSMC 7nm technology
- 8 cores per Core Complex (CCX)
- 32MB shared L3 Cache
- +19%<sup>1</sup> IPC (Ave) vs. "Zen 2"
- 81mm<sup>2</sup>
- <u>AMD 3D V-Cache<sup>™</sup> support</u> integrated from Day 1

SEE ENDNOTES: R5K-003


Source: IEEE ISSCC 2022



#### AMD Ryzen Zen3 3D V-cache

## AMD 3D V-Cache<sup>™</sup> Components: L3D



[Figure: L3D die photo with the signal interface regions labeled]

- AMD 3D V-Cache<sup>™</sup> extended <u>L3</u> <u>D</u>ie (L3D)
- TSMC 7nm FinFET Technology
- 13 layers Cu + 1 layer Al metal stack
- 64MB L3 Cache Extension
- 41mm<sup>2</sup>


Source: IEEE ISSCC 2022



#### AMD 3D V-cache Interface

## Micro Bump vs. Hybrid Bond



[Figure: interconnect illustrations comparing C4, Micro Bump 3D and Hybrid Bond 3D]



- Compared to Micro Bump 3D solutions, Hybrid Bond offers
  - >15x interconnect density
  - >3x interconnect energy efficiency
  - Superior thermal conductance
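The density gain follows from pitch: on a square grid, connections per unit area scale with 1/pitch<sup>2</sup>. The pitches in this sketch are assumed values chosen for illustration and are not taken from the slide.

```python
def density_gain(coarse_pitch_um: float, fine_pitch_um: float) -> float:
    """Ratio of connections per unit area for two square-grid pitches."""
    return (coarse_pitch_um / fine_pitch_um) ** 2


# Assumed pitches, for illustration only.
MICRO_BUMP_PITCH_UM = 36.0
HYBRID_BOND_PITCH_UM = 9.0

print(density_gain(MICRO_BUMP_PITCH_UM, HYBRID_BOND_PITCH_UM))  # 16.0
```

A 4x reduction in pitch therefore gives a 16x increase in connection density, consistent with the ">15x" order of magnitude quoted above.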

SEE ENDNOTES: EPYC-027. C4 and Micro Bump 3D illustrations are hypothetical.

[1] Swaminathan, Hot Chips Tutorial, 2021




#### AMD 3D V-cache Interface

[Figure: package cross-sections without 3D stacking (CCD, IOD) and with 3D stacking (CCD, support silicon, IOD)]

## **Server Configurations**

- AMD 3<sup>rd</sup> Gen EPYC<sup>™</sup> Server CPU
  - Up to 8 "Zen 3" CCDs
  - 1 I/O Die (IOD)
- AMD 3<sup>rd</sup> Gen EPYC<sup>™</sup> Server CPU with AMD 3D V-Cache<sup>™</sup>
  - Up to 8 thinned CCDs + L3Ds
  - Support silicon added to match 2D CCD Z-height
- Both designs compatible with the same package
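The L3 capacity of the two server configurations follows from the per-die figures given earlier (32MB of L3 per CCD, 64MB per L3D). The sketch assumes one L3D stacked on each CCD; the function name is hypothetical.

```python
CCD_L3_MB = 32   # on-die L3 per "Zen 3" CCD
L3D_MB = 64      # stacked L3 extension per L3D


def socket_l3_mb(ccds: int, stacked: bool) -> int:
    """Total L3 per socket, assuming one L3D per CCD when stacking is used."""
    per_ccd = CCD_L3_MB + (L3D_MB if stacked else 0)
    return ccds * per_ccd


print(socket_l3_mb(8, stacked=False))  # 256
print(socket_l3_mb(8, stacked=True))   # 768
```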




#### AMD 3D V-cache Interface

## **Cache Interface Illustration**





#### AMD Hybrid-Bonded 64MB Stacked Cache

## **3D Signal Interface**






#### AMD 64MB Stacked Cache Performance

### **Server Performance**






# Intel Tiles with EMIB and Foveros



#### Intel Lakefield (2020): Foveros Die Interface (FDI) die stacking



Figure 8.1.1: System partitioning with 3D stacking across compute and base die.



#### Intel Lakefield (2020): Foveros Die Interface (FDI) die stacking





#### Intel Sapphire Rapids – Multi-Tile Design



Source: https://hc33.hotchips.org/assets/program/conference/day1/HC2021.C1.4%20Intel%20Arijit.pdf



[Figure: Sapphire Rapids multi-tile floorplan showing cores, mesh, CHA/LLC, memory controllers, accelerators, PCIe and UPI]

#### Intel Sapphire Rapids (2023)

Sapphire Rapids Accelerators

- Intel QuickAssist Technology (QAT)
  - Faster compression and encryption
- Intel Dynamic Load Balancer (DLB)
  - Load balancing
  - Queue management
  - Packet prioritization
- In-Memory Analytics Accelerator (IAA)
  - In-memory databases
  - Big data analytics
- Data Streaming Accelerator (DSA)
  - High performance data copy and transformation



#### Intel Sapphire Rapids (2023)

### **Enabling the Intel® Accelerator Engines**

Tools for developers to take advantage and deploy today





#### Intel Sapphire Rapids (2023)

### A Higher Performance Server Architecture

Benefits of Intel<sup>®</sup> Accelerator Engines





#### Intel Sapphire Rapids – EMIB



Figure 2.2.7: Die photo of left and right die arranged in  $2\times 2$  quasi-monolithic configuration. EMIB placement highlighted in blue.

Source: ISSCC 2022 Sapphire Rapids: The Next-Generation Intel Xeon Scalable Processor



#### Intel Sapphire Rapids Multi-Die Fabric (MDF) IO



Figure 2.2.1: Die floorplan in 2×2 quasi-monolithic configuration. EMIB highlighted at die-to-die interfaces.

Source: ISSCC 2022 Sapphire Rapids: The Next-Generation Intel Xeon Scalable Processor



#### Intel Sapphire Rapids Cross-die timing model



Figure 2.2.2: A cross-die timing model, or a single-die loopback model.



# Chiplets-augmented GPUs




#### AMD RDNA3 – Chiplets for GPUs

[Figure: "Can chiplets work for graphics?" Comparison of a traditional monolithic design, the chiplet-based EPYC server CPU (7nm CCDs around a 14nm I/O die, connected by 100's of signals) and the "Navi21" GPU (10's of 1000's of signals)]
                                                                                                                                                                                                 | Are the second s | NE SILVENOS/INSELSE<br>INFINITY CACHE |

- Chiplets enable the use of advanced nodes where they benefit CPU performance, while mature nodes serve I/O and interfaces
- High-speed organic package links meet CPU bandwidth requirements

- GPU shader engines require massive amounts of connectivity compared to CPUs
- A different approach is required





## AMD RDNA3 – Graphics Compute Die and Memory Cache Die

Each MCD has 16MB of cache






#### AMD RDNA3



Source: https://www.tomshardware.com/news/amd-rdna-3-gpu-architecture-deep-dive-the-ryzen-moment-for-gpus




#### **AMD RDNA3 Chiplets**





#### CHIPLET TECHNOLOGY INFINITY FANOUT LINKS BANDWIDTH DENSITY

Infinity Links, operating at 9.2Gb/s with High Performance Fanout, provide almost 10X the bandwidth density of the IFOP links used in Ryzen and EPYC

This enables an industry-leading peak bandwidth of 5.3TB/s
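The two figures above can be related with a quick back-of-the-envelope calculation. The sketch below is not AMD's methodology: it assumes decimal units, that 9.2Gb/s is a per-signal rate, that 5.3TB/s is the aggregate over all links, and that there is no encoding overhead. The function name `implied_signal_count` is hypothetical.

```python
# Back-of-the-envelope check under the assumptions stated above.
LINK_RATE_GBPS = 9.2        # per-signal data rate from the slide (Gb/s)
PEAK_BANDWIDTH_TBPS = 5.3   # aggregate peak bandwidth from the slide (TB/s)


def implied_signal_count(peak_tb_per_s: float, rate_gb_per_s: float) -> float:
    """Return the number of data signals needed to carry the peak bandwidth.

    Assumes 1 TB = 10**12 bytes, 1 Gb = 10**9 bits, and that every signal
    runs at the full rate with no protocol or encoding overhead.
    """
    peak_bits_per_s = peak_tb_per_s * 1e12 * 8
    return peak_bits_per_s / (rate_gb_per_s * 1e9)


if __name__ == "__main__":
    signals = implied_signal_count(PEAK_BANDWIDTH_TBPS, LINK_RATE_GBPS)
    print(f"Implied data signals: {signals:.0f}")
```

Under these assumptions the result is roughly 4,600 data signals, which illustrates why a denser fanout interconnect is needed for the GPU case.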


[Chart: bandwidth density of Infinity Fanout links relative to IFOP links]









#### **RDNA3** Infinity Links





#### AMD Instinct MI300



The world's first integrated data center CPU + GPU

# AMD INSTINCT™

Breakthrough architecture to power the exascale AI era



#### AMD Instinct MI300

# AMD INSTINCT™ MI300

#### The world's first data center integrated CPU + GPU





- 24 leadership data center CPU cores
- 146B transistors
- 128GB of HBM3
- 3D advanced chiplet packaging



#### AMD Instinct MI300



- MI300 is a disaggregated design
  - Multiple TSMC 5nm chiplets
  - 3D stacking places them over a base die
  - On-package High Bandwidth Memory (HBM)



#### Intel Ponte Vecchio



Intel Ponte Vecchio

- Intel 10nm process
- Die size of 1280mm<sup>2</sup>



[Figure: Ponte Vecchio package layout, with labels for normal dies, thermal dies, HBM stacks and Xe Link]

#### Intel Ponte Vecchio

[Figure 2.1.7 attribute table, partially lost in extraction. Legible entries: 47 functional + 16 thermal tiles; 4844 mm<sup>2</sup> package (one side 62.5 mm); SERDES and 1x16 PCIe Gen5 I/O; 24 layers; 2.5D connections; x4 cavities]

#### Figure 2.1.7: Ponte Vecchio chip photographs and key attributes.



#### Intel Ponte Vecchio



RAMBO = Random Access Memory, Bandwidth Optimized

#### 47 tiles + 16 thermal tiles

- 16 compute tiles, TSMC N5, 2.6TB/s to the chip fabric
- 8 RAMBO cache tiles, Intel 7, 15MB per tile, 1.3TB/s connection
- 2 Foveros base tiles, Intel 7
- 2 Xe-Link tiles, TSMC N7
- 8 HBM2e tiles
- 11 Intel Embedded Multi-die Interconnect Bridge (EMIB) tiles
- The package measures 4844mm<sup>2</sup> and has 4468 pins
- 2330mm<sup>2</sup> of silicon for the 47 tiles
- Fully Integrated Voltage Regulators (FIVR)
- Compute Express Link (CXL) interface
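As a sanity check, the tile inventory above can be tallied in a few lines. This is a minimal sketch that uses only the numbers listed on the slide; the dictionary layout and function names are hypothetical, and the area ratio simply compares the stated silicon area with the stated package area.

```python
# Tile counts exactly as listed on the slide.
TILES = {
    "compute (TSMC N5)": 16,
    "RAMBO cache (Intel 7)": 8,
    "Foveros base (Intel 7)": 2,
    "Xe-Link (TSMC N7)": 2,
    "HBM2e": 8,
    "EMIB": 11,
}
RAMBO_TILE_COUNT = TILES["RAMBO cache (Intel 7)"]
RAMBO_MB_PER_TILE = 15      # MB of cache per RAMBO tile (from the slide)
PACKAGE_AREA_MM2 = 4844     # package area (from the slide)
SILICON_AREA_MM2 = 2330     # silicon area of the 47 tiles (from the slide)


def total_tiles(tiles: dict) -> int:
    """Sum the functional tile counts."""
    return sum(tiles.values())


def rambo_cache_mb() -> int:
    """Total RAMBO cache capacity in MB."""
    return RAMBO_TILE_COUNT * RAMBO_MB_PER_TILE


def silicon_fraction() -> float:
    """Share of the package area taken by the stated silicon area."""
    return SILICON_AREA_MM2 / PACKAGE_AREA_MM2


if __name__ == "__main__":
    print(f"Functional tiles: {total_tiles(TILES)}")
    print(f"RAMBO cache: {rambo_cache_mb()} MB")
    print(f"Silicon / package area: {silicon_fraction():.1%}")
```

The counts add up to the 47 functional tiles quoted in the slide title, and the eight RAMBO tiles give 120MB of cache in total.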

#### Figure 2.1.1: 3D and 2D system partitioning with Foveros and EMIB on PVC.




#### Intel Ponte Vecchio Foveros and EMIB



Figure 2.1.2: Process details for Foveros and EMIB.



#### Intel Ponte Vecchio Die-to-die IO



Figure 2.1.4: PVC clocking, die-to-die IO and comparisons.



### Intel GPU Max 1550 (2023)

Intel GPU Max 1550

- 600W TDP
- 128GB of HBM2e, 1024 bit interface
- Memory bandwidth: 3.27 TB/s
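The bandwidth and interface width can be cross-checked by computing the implied per-pin data rate. The sketch below rests on two assumptions that the slide does not state: that the 1024 bit figure is per HBM2e stack, and that there are 8 stacks (matching the 8 HBM2e tiles listed for Ponte Vecchio). The function name `per_pin_rate_gbps` is hypothetical.

```python
BANDWIDTH_TB_PER_S = 3.27   # memory bandwidth from the slide
BITS_PER_STACK = 1024       # interface width from the slide (assumed per stack)
ASSUMED_STACKS = 8          # assumption: one stack per HBM2e tile


def per_pin_rate_gbps(bandwidth_tb_per_s: float, bus_width_bits: int) -> float:
    """Return the per-pin data rate in Gb/s for a given total bus width.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gb = 10**9 bits).
    """
    bits_per_s = bandwidth_tb_per_s * 1e12 * 8
    return bits_per_s / bus_width_bits / 1e9


if __name__ == "__main__":
    single = per_pin_rate_gbps(BANDWIDTH_TB_PER_S, BITS_PER_STACK)
    combined = per_pin_rate_gbps(
        BANDWIDTH_TB_PER_S, BITS_PER_STACK * ASSUMED_STACKS
    )
    print(f"One 1024-bit interface:    {single:.2f} Gb/s per pin")
    print(f"Eight 1024-bit interfaces: {combined:.2f} Gb/s per pin")
```

With a single 1024 bit interface the per-pin rate would exceed 25 Gb/s, whereas eight such interfaces give about 3.2 Gb/s per pin, which is in the range HBM2e is known to operate at. This suggests the 1024 bit figure is best read as per stack.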



Source: https://www.techpowerup.com/gpu-specs/intel-ponte-vecchio.g1046#:~:text=Intel's%20Ponte%20Vecchio%20GPU%20uses,12%20(Feature%20Level%2012\_1)

Source: https://ark.intel.com/content/www/us/en/ark/products/232873/intel-data-center-gpu-max-1550.html



#### Conclusions

- Industry has shifted towards chiplets
- UCIe creates a solid foundation for chiplet-based designs
- Future chips may be monolithic 3D ICs
- Chiplets solve several issues of current chips, beyond Moore's Law
  - Reticle limit
  - Better yields due to the smaller silicon area of each die
  - Multi-vendor integration
- Universities need to introduce chiplet-based design into their curricula
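The yield argument can be illustrated with the simple Poisson yield model, Y = exp(-A · D0), where A is the die area and D0 the defect density. This is a minimal sketch, not data from the presentation: the defect density and the die areas below are hypothetical values chosen only to show the trend, and real foundry yield models are more elaborate.

```python
import math

# Hypothetical values, chosen for illustration only.
DEFECTS_PER_CM2 = 0.1       # assumed defect density D0
MONOLITHIC_AREA_MM2 = 800   # assumed area of one large monolithic die
CHIPLET_AREA_MM2 = 200      # assumed area of each of four chiplets


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson yield model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    mono = poisson_yield(MONOLITHIC_AREA_MM2, DEFECTS_PER_CM2)
    chiplet = poisson_yield(CHIPLET_AREA_MM2, DEFECTS_PER_CM2)
    print(f"Monolithic {MONOLITHIC_AREA_MM2} mm2 die yield: {mono:.1%}")
    print(f"Single {CHIPLET_AREA_MM2} mm2 chiplet yield:   {chiplet:.1%}")
```

Because chiplets can be tested before assembly, a defect discards only one small die rather than the whole large one, so the fraction of usable silicon follows the higher per-chiplet yield.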

# Thank you! Questions?