Bruce Jacob

University of Maryland

SLIDE

# All Tomorrow's Memories ... for multicore

## **Bruce Jacob**, Keystone Professor, University of Maryland





![](_page_1_Figure_1.jpeg)


![](_page_2_Picture_1.jpeg)


![](_page_2_Figure_6.jpeg)

![](_page_2_Picture_7.jpeg)


SLIDE 4

# Wish List [& Talk Outline]

- **Fine-Grained Access**
- Bandwidth
- Capacity
- Low Power
- Nonvolatility

![](_page_3_Picture_6.jpeg)


![](_page_4_Picture_7.jpeg)


![](_page_5_Picture_8.jpeg)


![](_page_6_Picture_8.jpeg)

![](_page_7_Picture_0.jpeg)


- DRAM - HBM/HMC\*
- Flash, 3DXP, ReRAM, PCM, etc. - NVMM\*
- **Major implications** for OS and applications

\*Things we did and/or are doing now (I'll cover in talk)

![](_page_7_Picture_9.jpeg)

![](_page_8_Picture_0.jpeg)


SLIDE 5

Off-chip: high-speed SerDes and generic protocol

4 I/O ports, up to 80 GB/s each

Next gen is 160 GB/s per port (640 GB/s total)
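As a quick check on these figures, aggregate off-chip bandwidth is the port count times the per-port rate. The sketch below assumes the next generation keeps the same four ports, which is consistent with the 640 total quoted on the slide; the function name is illustrative only.

```python
def aggregate_bandwidth_gbs(ports: int, per_port_gbs: int) -> int:
    """Peak off-chip bandwidth in GB/s with every port running at its maximum rate."""
    return ports * per_port_gbs


PORTS = 4  # I/O ports per cube, from the slide

current_total = aggregate_bandwidth_gbs(PORTS, 80)    # 4 ports x 80 GB/s
next_gen_total = aggregate_bandwidth_gbs(PORTS, 160)  # 4 ports x 160 GB/s

print(f"current: {current_total} GB/s, next gen: {next_gen_total} GB/s")
```

These are peak numbers; sustained bandwidth depends on the access pattern and protocol overhead, which the slide does not quantify.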

![](_page_8_Picture_9.jpeg)

![](_page_9_Picture_0.jpeg)


![](_page_9_Picture_6.jpeg)

![](_page_10_Picture_0.jpeg)


![](_page_10_Picture_5.jpeg)

![](_page_11_Figure_0.jpeg)


![](_page_11_Picture_5.jpeg)

![](_page_12_Figure_0.jpeg)


![](_page_12_Picture_5.jpeg)

![](_page_13_Picture_0.jpeg)


![](_page_13_Picture_5.jpeg)

![](_page_14_Picture_0.jpeg)


![](_page_14_Picture_6.jpeg)

![](_page_15_Picture_0.jpeg)


![](_page_15_Picture_8.jpeg)

![](_page_16_Picture_0.jpeg)

## Hybrid Memory Cube

- VAULT (channel)
- Partitions (ranks)
- Logic Base (I/O & CTL)


![](_page_16_Picture_9.jpeg)

![](_page_17_Figure_0.jpeg)

![](_page_18_Picture_0.jpeg)

## Each Link is 128 Bits Wide: 1024 Total
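The link count implied by these widths, and the peak bandwidth such an interface could deliver, follow from simple arithmetic. In this sketch the per-pin data rate is a hypothetical value chosen only for illustration; it does not appear on the slide.

```python
LINK_WIDTH_BITS = 128     # from the slide
TOTAL_WIDTH_BITS = 1024   # from the slide

# Number of 128-bit links needed to reach the total width.
num_links = TOTAL_WIDTH_BITS // LINK_WIDTH_BITS


def peak_bandwidth_gbs(width_bits: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s for width_bits data pins running at gbit_per_pin Gb/s each."""
    return width_bits * gbit_per_pin / 8  # 8 bits per byte


ASSUMED_GBIT_PER_PIN = 2.0  # hypothetical per-pin rate, NOT taken from the slide
total_gbs = peak_bandwidth_gbs(TOTAL_WIDTH_BITS, ASSUMED_GBIT_PER_PIN)

print(num_links, total_gbs)
```

The point of the wide interface is that total bandwidth scales with width, so a modest per-pin rate can still yield a large aggregate figure.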


![](_page_18_Picture_6.jpeg)

![](_page_19_Figure_0.jpeg)


![](_page_19_Picture_6.jpeg)

![](_page_20_Picture_0.jpeg)

![](_page_20_Picture_2.jpeg)

![](_page_21_Picture_1.jpeg)

1TB **NAND** Flash PCIe SSD (I/O)

**SDRAM**

SSD \$500 - 10W


SLIDE 10

# A Tale of 3 Memory Systems

![](_page_21_Figure_9.jpeg)

![](_page_21_Picture_10.jpeg)

![](_page_22_Figure_0.jpeg)


![](_page_22_Figure_6.jpeg)

# FTL – Flash

## **32G DRAM**

![](_page_22_Picture_9.jpeg)

![](_page_22_Picture_10.jpeg)

![](_page_23_Figure_0.jpeg)


SLIDE 11

![](_page_24_Figure_0.jpeg)



SLIDE 12

# High Bandwidth Non Volatiles

Crossbar ReRAM memory cell (Metal Oxide)

![](_page_25_Picture_5.jpeg)

![](_page_25_Picture_6.jpeg)

- Cells minimum area (<u>no access transistor</u>)
- 2-stack arrays @ 16nm, 20 x 20 mm die: 64GB of ReRAM
- 8-stack arrays => 256GB of ReRAM
- Stacks arbitrarily high
- No. Access. Transistor.
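The capacity figures above scale linearly with the number of stacked layers, and they also imply how much die area each bit gets. The following is a back-of-the-envelope sketch, assuming "GB" means 2^30 bytes and that every layer holds the same capacity; the function names are illustrative.

```python
GIB = 2**30  # assumption: the slide's "GB" is treated as 2^30 bytes

DIE_AREA_MM2 = 20 * 20   # 20 x 20 mm die, from the slide
FEATURE_NM = 16          # 16 nm process, from the slide
GB_PER_LAYER = 64 // 2   # a 2-stack array holds 64 GB, so 32 GB per layer


def stacked_capacity_gb(layers: int) -> int:
    """Capacity in GB, assuming each added layer contributes the same capacity."""
    return layers * GB_PER_LAYER


def cell_area_in_f2() -> float:
    """Die area available per bit within one layer, in units of F^2 (F = feature size)."""
    bits_per_layer = GB_PER_LAYER * GIB * 8
    die_area_nm2 = DIE_AREA_MM2 * 1e12  # 1 mm^2 = 1e12 nm^2
    return die_area_nm2 / bits_per_layer / FEATURE_NM**2


print(stacked_capacity_gb(2), stacked_capacity_gb(8))
print(round(cell_area_in_f2(), 2))
```

Under these assumptions each bit has a little under 6 F^2 of die area per layer, which is in line with the commonly cited 4 F^2 footprint of a crosspoint cell plus some overhead. A cell with a per-bit access transistor would not fit in that budget, which is the reason the slide stresses the missing transistor.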

![](_page_25_Picture_8.jpeg)

## No Per-Bit Access Transistor

![](_page_26_Picture_1.jpeg)


SLIDE 13

# (n = 1 .. 2048) (n=8)

![](_page_26_Picture_7.jpeg)

![](_page_27_Picture_0.jpeg)


![](_page_28_Picture_0.jpeg)

## Not Die Stacked

For n = 2048, area is ~75% white space

## Use for processor (cores, controllers, routers, NoC, etc.)

![](_page_29_Picture_5.jpeg)

![](_page_30_Figure_0.jpeg)

![](_page_31_Figure_0.jpeg)

### **DRAM Architecture**

# Monolithic Integration

|                                       | DRAM                                                  | ReRAM                                     |
|---------------------------------------|-------------------------------------------------------|-------------------------------------------|
| # Mem Controllers                     | 6                                                     | 1000                                      |
| L1 Cache size                         | 32KB                                                  | 32KB                                      |
| L2 Cache size                         | 1MB                                                   | 1MB                                       |
| **Main Mem Parameters**               |                                                       |                                           |
| Mem Latency                           | ~30ns<br>DRAMsim3<br>simulated                        | SST Mess<br>Read: 20<br>Write:            |
| request_width<br>(access granularity) | <b>64 Bytes</b><br>bus-width = 8B<br>burst-length = 8 | <b>8 Byte</b><br>bus-width<br>burst-lengt |
| Topology                              | Mesh                                                  | Mesh                                      |
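The access-granularity row is where the "fine-grained access" wish shows up in numbers: the DRAM request width is the bus width times the burst length, so a program that wants a single 8-byte word still moves 64 bytes. Here is a minimal sketch of that arithmetic, using the request widths from the table; the function names and the 8-byte example access are illustrative assumptions.

```python
import math


def request_width_bytes(bus_width_bytes: int, burst_length: int) -> int:
    """Bytes moved by one memory request: bus width times burst length."""
    return bus_width_bytes * burst_length


def useful_fraction(needed_bytes: int, request_width: int) -> float:
    """Fraction of transferred bytes the program actually asked for."""
    requests = math.ceil(needed_bytes / request_width)
    return needed_bytes / (requests * request_width)


dram_width = request_width_bytes(8, 8)  # bus-width = 8B, burst-length = 8 (table)
reram_width = 8                         # 8-byte request width (table)

# Hypothetical workload: isolated 8-byte accesses with no spatial locality.
dram_eff = useful_fraction(8, dram_width)
reram_eff = useful_fraction(8, reram_width)

print(dram_width, dram_eff, reram_eff)
```

For accesses with good spatial locality the 64-byte request is not wasted, so the gap shown here is the worst case for DRAM, not a general result.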

![](_page_31_Figure_6.jpeg)

![](_page_32_Figure_0.jpeg)


SLIDE 16

## cores

## memory access points

![](_page_32_Picture_7.jpeg)

![](_page_33_Figure_0.jpeg)


SLIDE 17

## cores

## memory access points

![](_page_34_Figure_0.jpeg)


SLIDE 18

## **STREAM: DRAM vs ReRAM comparison**

Number of Cores

![](_page_35_Figure_0.jpeg)


## **Bottom Line**

- The hardware is here now (mostly)
  - 1-10TB main memories will be common
  - TB/s off-chip bandwidths are here now
  - The costs: power and capacity
  - Lower-power solution: monolithic (~TB, 100s GB/s, 1000s concurrent ops)
- The software is waiting to happen
  - Combined VM+FS subsystems
  - Journaled main memory, etc.
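To make "journaled main memory" concrete, here is a toy redo-log sketch. It is an illustration of the general idea, not a design from the talk: a memory-mapped file stands in for nonvolatile main memory, `mmap.flush()` stands in for the hardware cache-flush and fence operations a real system would need, and the layout, names, and sizes are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

JOURNAL_BYTES = 64              # journal area at the start of the mapping
DATA_BYTES = 4096               # data area that follows the journal
HEADER = struct.Struct("<BII")  # committed flag, data offset, payload length


def open_region(path: str) -> mmap.mmap:
    """Map a file into the address space so it is read and written like memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, JOURNAL_BYTES + DATA_BYTES)
        return mmap.mmap(fd, JOURNAL_BYTES + DATA_BYTES)
    finally:
        os.close(fd)


def journaled_write(mem: mmap.mmap, offset: int, payload: bytes,
                    crash_after_commit: bool = False) -> None:
    """Redo-log update: record the intent, commit it, then update in place."""
    if HEADER.size + len(payload) > JOURNAL_BYTES:
        raise ValueError("payload does not fit in the journal")
    if offset + len(payload) > DATA_BYTES:
        raise ValueError("write falls outside the data area")
    # 1. Write the record with the committed flag clear and make it durable.
    mem[0:HEADER.size] = HEADER.pack(0, offset, len(payload))
    mem[HEADER.size:HEADER.size + len(payload)] = payload
    mem.flush()
    # 2. Commit point: flip a single-byte flag.
    mem[0:1] = b"\x01"
    mem.flush()
    if crash_after_commit:
        return  # simulated power loss before the in-place update
    # 3. Apply the update in place, then retire the record.
    start = JOURNAL_BYTES + offset
    mem[start:start + len(payload)] = payload
    mem.flush()
    mem[0:1] = b"\x00"
    mem.flush()


def recover(mem: mmap.mmap) -> bool:
    """Replay a committed but unapplied record; return True if one was replayed."""
    committed, offset, length = HEADER.unpack(mem[0:HEADER.size])
    if not committed:
        return False
    start = JOURNAL_BYTES + offset
    mem[start:start + length] = mem[HEADER.size:HEADER.size + length]
    mem.flush()
    mem[0:1] = b"\x00"
    mem.flush()
    return True


path = os.path.join(tempfile.mkdtemp(), "pmem.img")
mem = open_region(path)
journaled_write(mem, 0, b"hello")
journaled_write(mem, 0, b"WORLD", crash_after_commit=True)
before = bytes(mem[JOURNAL_BYTES:JOURNAL_BYTES + 5])  # old value still in place
mem.close()

mem = open_region(path)  # "reboot": map the same file again
replayed = recover(mem)
after = bytes(mem[JOURNAL_BYTES:JOURNAL_BYTES + 5])
mem.close()
```

The same mapped region is both the "file" and the "memory", which is a small-scale picture of what a combined VM+FS subsystem would manage; the journal is what keeps an interrupted update from leaving the data half-written.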


![](_page_36_Picture_11.jpeg)

# **Bottom Line**

Figure: Relative Simulation Speed (log-scale axis)


![](_page_37_Figure_6.jpeg)

![](_page_37_Picture_7.jpeg)

# Shameless Plug

## www.memsys.io

# Washington DC Sep 28 – Oct 1, 2020


SLIDE 21

![](_page_38_Picture_7.jpeg)

# MEMSYS 2020

The International Symposium on Memory Systems 🚸 Sep 28-Oct 1, Washington DC

### **Important Dates**

Submission: 29 May, 2019 (+7 days)\*; Notification: 31 July, 2019; Camera-Ready: 14 August, 2019

mission extension

nats

### oers

nt, ACM 'sigconf' lind submission 16 pages long

### elists

Yitzhak Birk, Technion; Petar Radojkovic, BSC

### **Organizers**

Bruce Jacob, University of Maryland; Kathy Smiley, Memory Systems

Ameen Akel, Micron; Abdel-Hameed Badawy, NMSU; Jonathan Beard, Arm; Bruce Childers, University of Pittsburgh

Memory-device manufacturing, memory-architecture design, and the use of memory technologies by application software all profoundly impact today's and tomorrow's computing systems, in terms of their performance, function, reliability, predictability, power dissipation, and cost. Existing memory technologies are seen as limiting in terms of power, capacity, and bandwidth. Emerging memory technologies offer the potential to overcome both technology- and design-related limitations to answer the requirements of many different applications. Our goal is to bring together researchers, practitioners, and others interested in this exciting and rapidly evolving field, to update each other on the latest state of the art, to exchange ideas, and to discuss future challenges. Please visit memsys.io for more information.

### Areas of Interest

Previously unpublished papers containing significant novel ideas and technical results are solicited. Papers that focus on system, software, and architecture level concepts specifically memory-related, i.e. topics outside of traditional conference scopes, will be preferred over others (e.g., the desired focus is away from pipeline design, processor cache design, prefetching, data prediction, etc.). Symposium topics include, but are not limited to, the following:

- Memory-system design from both hardware and software perspectives
- Memory failure modes and mitigation strategies
- Memory and system security issues
- Memory for embedded and autonomous systems (e.g., automotive)
- Operating system design for hybrid/nonvolatile memories
- Technologies including flash, DRAM, STT-MRAM, 3DXP, etc.
- Memory-centric programming models, languages, optimization
- Compute-in-memory and compute-near-memory technologies
- Data-movement issues and mitigation techniques
- Interconnects to support large-scale data movement

emory-management techniques ogies, their controllers, and novel uses y level across datacenter applications eration of large-memory machines NoSQL stores

and memory technologies to support them, , and heterogeneous memories

on topics outside the scope of traditional be preferred over others.

### ntations

se interesting ideas that will spark e groups—to get applications ple, system architecture nd circuits people to talk to d abstracts, position ers, and each accepted e presentation time ll be published in the

The Westin on VA.

![](_page_38_Picture_43.jpeg)

## MEMSYS 2019 Attendees

![](_page_38_Picture_46.jpeg)


SLIDE 22

# Thank You!

**Bruce Jacob**

blj@umd.edu

www.ece.umd.edu/~blj

![](_page_39_Picture_5.jpeg)

![](_page_39_Picture_7.jpeg)

![](_page_40_Figure_0.jpeg)


![](_page_41_Picture_0.jpeg)


![](_page_42_Figure_0.jpeg)