# openRole: Can we bring 'Design once, run everywhere' to FPGAs?

# Burkhard Ringlein & cloudFPGA Team Zurich **IBM Research Europe**

FPL Workshop on DevOps support for Cloud FPGA platforms, 2020-09-04



# FPGAs are spreading in DCs and Clouds...

- ...that's why we are here
- Further Examples:
  - AWS FPGA instances
  - Microsoft's Project Brainwave
  - OC-Accel/Snap
  - PC<sup>2</sup>, Galapagos, cloudFPGA, etc...
- Consequences:
  - Cloud and Datacenter are typically multi-user environments
    - $\rightarrow$  Constrains the architecture
  - Usually, not all app developers want to deal with the I/O details of specific boards
    → Vendors provide platforms





[Figure: collage of web page screenshots illustrating FPGA activity in HPC and the Cloud: the Fifth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H<sup>2</sup>RC'19; Sunday, November 17, 2019, Denver, CO; held in conjunction with SC19), an SC19 session on FPGA clusters, the InAccel FPGA orchestrator site, a "Built for Any Server, Any Cloud" page, and The Next Platform articles on FPGAs and the Cloud. Sources: [3], [12], [13], [17].]

# Providing a platform: The Shell Role Architecture (SRA)

- Split the FPGA design into:
  - a platform specific part: **SHELL**
  - an application specific part: **ROLE**
- The SRA design pattern offers:
  - I/O abstractions
  - different privilege levels
  - improved platform security (especially in combination with partial reconfiguration)
- Consequently: SRAs are used frequently (by all major Cloud vendors)
- SRAs are "the APIs of FPGA app developers"



## Some examples of SRAs

#### OC-Accel (OpenCAPI) [2]



#### cloudFPGA [14]



#### AWS [1]

#### Shell Interfaces

The following diagram and table summarize the various interfaces between the Shell and CL as defined in cl\_ports.vh.



# Some more examples of SRAs

#### Alveo cards [7]



#### Galapagos FPGA Hypervisor [5]



#### Baidu FPGA Cloud server [15]



B. Ringlein / cloudFPGA Team / cFDevOps Workshop May 2020 / © 2020 IBM Corporation

#### Hardware Sandboxes (Florida) [16]









#### Xillybus (kind of) [9]

#### ...and counting...

# The Problem: Every vendor provides a different interface



[Figure: the FPGA app developer faces a different interface from every vendor: AWS F1 SDK, **Xilinx Vitis** environment, **Intel's oneAPI**, Microsoft Sandpiper (?), OC-Accel]


- FPGA application development is tightly coupled to interfaces (i.e. "I/O")
- to achieve low latency / high throughput: run time behavior and interfaces *must be considered* at design time (stalls, request latency, bursts...)
- $\rightarrow$  FPGA applications are rarely interface agnostic
- $\rightarrow$  application depends on Shell interfaces ("APIs")
- → limited re-usability & portability

## What would be a better solution?

- **limited (or no) portability** of FPGA designs is another roadblock in the way of FPGAs
- (good) reasons for change of platforms:
  - elasticity, scaling
  - evolution of platforms
  - Cloud vendor change, business decisions, etc.
- re-usable Roles would greatly accelerate the adoption of new platforms
  → and consequently, the growth of the ecosystem

#### with **portable / reusable Roles**:





# Once upon a time...

- late 1970s: many disparate versions of OSs
- **POSIX: Portable Operating System Interface**
- Originated in 1988 to maintain compatibility between OSs
- A POSIX compliant application can be compiled and deployed on every POSIX compliant OS, without code changes
- Latest update: IEEE Std 1003.1-2017
- Linux, macOS & Android are (largely) POSIX-compliant (but extend the API with custom calls) → a (well-designed) SW application can be ported effortlessly between these platforms
- (Windows Kernel offers a POSIX compatibility layer)



[4]

POSIX = code once, run everywhere, if you have a compiler

## What does POSIX bring?

- POSIX defines APIs and data structures for OS system calls (among other things)
- Any program that:
  - uses the POSIX API and its data structures (structs)
  - uses a programming language that offers a POSIX compliant compiler (for the target platform)
  - $\rightarrow$  can be compiled and executed on any **POSIX compliant Operating System** without further code changes
- POSIX also specifies the commands and flags to invoke the compiler (e.g. cc or c99 etc.)
- POSIX does not provide an implementation for the compilers or the system calls

\$ man 3p sendto

SENDTO(3P) **POSIX Programmer's Manual**

**NAME**

sendto - send a message on a socket

**SYNOPSIS**

> #include <sys/socket.h>
>
> ssize\_t sendto(int socket, const void \*message, size\_t length, int flags, const struct sockaddr \*dest\_addr, socklen\_t dest\_len);

**DESCRIPTION**

The sendto() function shall send a message through a connection-mode or connectionless-mode socket. If the socket is connectionless-mode, the message shall be sent to the address specified by dest\_addr. If the socket is connection-mode, dest\_addr shall be ignored.

**RETURN VALUE**

Upon successful completion, sendto() shall return the number of bytes sent. Otherwise, -1 shall be returned and errno set to indicate the error.

**ERRORS**

The sendto() function shall fail if:

. . .
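To make the portability claim concrete, here is a minimal sketch in C that relies only on POSIX socket calls, including sendto() from the manual page above. The helper name `udp_loopback_roundtrip` and the one-second receive timeout are assumptions made for this illustration. The same source should compile unchanged on any POSIX-compliant OS that provides a C compiler.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends `len` bytes as one UDP datagram to a socket bound on the loopback
 * interface and reads the datagram back into `out`.
 * Returns the number of bytes received, or -1 on any error.
 * (Illustrative helper, not part of POSIX or of openRole.) */
ssize_t udp_loopback_roundtrip(const void *msg, size_t len,
                               void *out, size_t out_len)
{
    assert(msg != NULL && out != NULL);

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    ssize_t received = -1;

    if (rx >= 0 && tx >= 0) {
        struct sockaddr_in addr;
        socklen_t addr_len = sizeof addr;
        struct timeval timeout;

        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = 0;              /* let the OS pick a free port */

        timeout.tv_sec = 1;             /* do not block forever on receive */
        timeout.tv_usec = 0;
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof timeout);

        if (bind(rx, (struct sockaddr *)&addr, sizeof addr) == 0 &&
            /* learn which port the OS assigned to the receiver */
            getsockname(rx, (struct sockaddr *)&addr, &addr_len) == 0 &&
            sendto(tx, msg, len, 0, (struct sockaddr *)&addr, addr_len)
                == (ssize_t)len) {
            received = recvfrom(rx, out, out_len, 0, NULL, NULL);
        }
    }
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return received;
}
```

Nothing in this code names a kernel, a network driver, or a NIC. That separation between application code and platform implementation is the property the following slides try to carry over to FPGAs.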

# Can we do this for FPGAs too?

# Can we bring "design once, run everywhere" to FPGAs? → the **openRole** proposal



# SRAs already share common concepts

- Besides configuration & control registers, there are usually (one of) two communication channels:
  - address based: PCIe, via Memory (AXI4 Full)
  - stream based: network or PCIe abstraction (AXI4 Stream)
- Stream- and address-based communication can abstract every fabric (PCIe, CXL, Ethernet, Infiniband, etc.)
- Define an interface for this level of abstraction that is valid for all kinds of platforms

- FPGA applications usually do not exist alone: see them as part of the communication of a (complex) application
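The difference between the two channel classes named above can be sketched in a few lines of C. This is a software model only, with hypothetical names (`addr_channel`, `stream_channel`) and arbitrary sizes. It is meant to show the semantics a Role can rely on regardless of the fabric underneath: address-based data stays in place and can be read repeatedly, while stream-based data arrives in order and is consumed when read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES  256   /* arbitrary size for the sketch */
#define FIFO_SLOTS 8     /* arbitrary depth for the sketch */

/* Address-based channel: random access into a memory region. */
typedef struct { uint8_t bytes[MEM_BYTES]; } addr_channel;

/* Stream-based channel: ordered words, each consumed when read. */
typedef struct {
    uint64_t slots[FIFO_SLOTS];
    size_t head;    /* index of the oldest word */
    size_t count;   /* number of words currently queued */
} stream_channel;

/* Returns 0 on success, -1 if the range falls outside the region. */
int addr_write(addr_channel *ch, size_t addr, const void *src, size_t len)
{
    assert(ch != NULL && src != NULL);
    if (addr > MEM_BYTES || len > MEM_BYTES - addr) return -1;
    memcpy(ch->bytes + addr, src, len);
    return 0;
}

int addr_read(const addr_channel *ch, size_t addr, void *dst, size_t len)
{
    assert(ch != NULL && dst != NULL);
    if (addr > MEM_BYTES || len > MEM_BYTES - addr) return -1;
    memcpy(dst, ch->bytes + addr, len);
    return 0;
}

/* Returns -1 when the FIFO is full: the sender has to wait (back-pressure). */
int stream_push(stream_channel *ch, uint64_t word)
{
    assert(ch != NULL);
    if (ch->count == FIFO_SLOTS) return -1;
    ch->slots[(ch->head + ch->count) % FIFO_SLOTS] = word;
    ch->count++;
    return 0;
}

/* Returns -1 when no word is available. */
int stream_pop(stream_channel *ch, uint64_t *word)
{
    assert(ch != NULL && word != NULL);
    if (ch->count == 0) return -1;
    *word = ch->slots[ch->head];
    ch->head = (ch->head + 1) % FIFO_SLOTS;
    ch->count--;
    return 0;
}
```

In hardware, the "full" and "empty" return values correspond to the stalls and back-pressure that, as noted earlier, have to be considered at design time.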

# Design flow with **openRole**

- allow the **compilation/synthesis** of any Role for any compliant Platform
  - it is *not* limiting any platform or application implementation
  - it is *not* limiting any platform specific optimization
- $\rightarrow$  enables portable FPGA designs
  - SRAs already share common concepts
  - design flows are actually automated and similar across platforms and vendors
  - $\rightarrow$  we have to agree on a **common interface**
- it is about portable FPGA designs, not porting Software to FPGAs



[Figure: FPGA block diagram. The ROLE (user FPGA application, inside the PR region) connects to the SHELL through config & ctrl registers, interrupts, network, and memory (on-board / off-board), each via an AXI Interconnect that also hosts debugger & profiler and event logger hooks. Interface labels: AXI4 Lite, 4x AXI4 stream (data & meta), AXI4 stream, 2x AXI4 full (visible to the user).]

    entity oR_Role is
      port (
        piClk : in std_ulogic;
        piRst : in std_ulogic;
        ---- Configuration & Ctrl Registers AXI4-Lite -----
        biOR_control_AXI_AWVALID : in  std_ulogic;
        biOR_control_AXI_AWREADY : out std_ulogic;
        biOR_control_AXI_AWADDR  : in  std_ulogic_vector(15 downto 0);
        -- ...
        ---- Input AXI-Write Stream Interface ----
        siNetwork_Data_tdata  : in std_ulogic_vector(63 downto 0);
        siNetwork_Data_tkeep  : in std_ulogic_vector( 7 downto 0);
        siNetwork_Data_tvalid : in std_ulogic;
        -- ...
      );
    end oR_Role;
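The stream port in the entity carries 64 data bits (tdata) and 8 keep bits (tkeep) per transfer. As a hedged illustration of what that means for a developer, the C sketch below packs a byte buffer into such transfers ("beats"). It follows the usual AXI4-Stream convention that bit i of tkeep qualifies byte lane i of tdata. The `axis_beat` struct and `axis_pack` function are hypothetical names for this example, and tlast is a standard AXI4-Stream signal that does not appear in the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One transfer ("beat") on a 64-bit AXI4-Stream bus.
 * Field names mirror the AXI signal names; the struct is illustrative. */
typedef struct {
    uint64_t tdata;  /* 8 byte lanes, lane 0 in bits 7..0 */
    uint8_t  tkeep;  /* bit i set => byte lane i carries valid data */
    uint8_t  tlast;  /* 1 on the final beat of a packet */
} axis_beat;

/* Packs `len` bytes into beats, lowest byte lane first.
 * Returns the number of beats written, or 0 if `len` is 0 or the
 * `beats` array is too small. */
size_t axis_pack(const uint8_t *bytes, size_t len,
                 axis_beat *beats, size_t max_beats)
{
    assert(bytes != NULL && beats != NULL);

    size_t n_beats = (len + 7) / 8;   /* round up to whole beats */
    if (len == 0 || n_beats > max_beats) return 0;

    for (size_t b = 0; b < n_beats; b++) {
        size_t remaining = len - b * 8;
        size_t valid = remaining < 8 ? remaining : 8;
        uint64_t data = 0;

        for (size_t i = 0; i < valid; i++)
            data |= (uint64_t)bytes[b * 8 + i] << (8 * i);

        beats[b].tdata = data;
        beats[b].tkeep = (uint8_t)((1u << valid) - 1u);  /* 8 valid bytes -> 0xFF */
        beats[b].tlast = (uint8_t)(b == n_beats - 1);
    }
    return n_beats;
}
```

An 11-byte payload, for example, becomes two beats: a full one with tkeep = 0xFF and a final one with tkeep = 0x07 and tlast set. A Shell with a different native bus width would repack these beats in an AXI Interconnect, so the Role's 64-bit view stays the same.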

# openRole: ideas for a standard – FPGA side

- solely based on AMBA AXI4
- **one** defined interface ("API")
- bus-width adaption done with AXI Interconnects ("Template parameter")
- Interface includes:
  - stream based and address based communication
  - control registers (and virtual interrupts)
- The use of a standard AXI-based interface allows the straightforward integration of:
  - debug probes
  - performance profilers

# openRole: ideas for a standard – CPU side

- every FPGA Role/app needs some control by the platform or communication with some SW app
- The app developer is only interested in the interface between SW app and FPGA Role, *not* how the data gets there
- We also need a unified SW interface, e.g.:
  - oR\_configure(...)
  - oR\_write & oR\_read (address based)
  - oR\_send & oR\_recv (stream based)
- All *Roles* should be identified with an integer **role\_id** (not PCIe address, or IP address, etc.) to have common *meta data*

oR\_return oR\_write(oR\_role\_id role\_id, const oR\_word \*data, oR\_address start\_address, oR\_len length);

oR\_return oR\_read(oR\_role\_id role\_id, oR\_word \*data, oR\_address start\_address, oR\_len length);

oR\_return oR\_send(oR\_role\_id role\_id, const oR\_word \*data, oR\_len length);

oR\_return **oR\_recv**(oR\_role\_id role\_id, oR\_word \*data, oR\_len length);



oR\_return **oR\_configure**(oR\_role\_id role\_id, const oR\_word \*config\_to\_write);
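To show how application code might look against these prototypes, here is a minimal sketch with an in-memory mock standing in for a platform. The function names and parameter lists are taken from the slide. Everything else is an assumption made for illustration: the type definitions (the slides name the types but do not define them), the error codes, the word-granular addressing, and the loopback behaviour of the stream channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed definitions; the slides only name these types. */
typedef uint32_t oR_role_id;
typedef uint64_t oR_word;
typedef uint64_t oR_address;   /* assumption: a word index */
typedef size_t   oR_len;       /* assumption: a length in words */
typedef enum { OR_OK = 0, OR_ERR_NO_ROLE, OR_ERR_RANGE, OR_ERR_EMPTY } oR_return;

#define MOCK_ROLES 2
#define MOCK_WORDS 64

/* In-memory stand-in for a platform; a real backend would talk to a Shell. */
typedef struct {
    oR_word config;
    oR_word mem[MOCK_WORDS];
    oR_word stream[MOCK_WORDS];   /* loopback: what was sent can be received */
    oR_len  stream_len;
} mock_role;

static mock_role mock_roles[MOCK_ROLES];

static mock_role *lookup(oR_role_id id)
{
    return id < MOCK_ROLES ? &mock_roles[id] : NULL;
}

oR_return oR_configure(oR_role_id role_id, const oR_word *config_to_write)
{
    mock_role *r = lookup(role_id);
    assert(config_to_write != NULL);
    if (r == NULL) return OR_ERR_NO_ROLE;
    r->config = *config_to_write;
    return OR_OK;
}

oR_return oR_write(oR_role_id role_id, const oR_word *data,
                   oR_address start_address, oR_len length)
{
    mock_role *r = lookup(role_id);
    assert(data != NULL);
    if (r == NULL) return OR_ERR_NO_ROLE;
    if (start_address > MOCK_WORDS || length > MOCK_WORDS - start_address)
        return OR_ERR_RANGE;
    memcpy(&r->mem[start_address], data, length * sizeof(oR_word));
    return OR_OK;
}

oR_return oR_read(oR_role_id role_id, oR_word *data,
                  oR_address start_address, oR_len length)
{
    mock_role *r = lookup(role_id);
    assert(data != NULL);
    if (r == NULL) return OR_ERR_NO_ROLE;
    if (start_address > MOCK_WORDS || length > MOCK_WORDS - start_address)
        return OR_ERR_RANGE;
    memcpy(data, &r->mem[start_address], length * sizeof(oR_word));
    return OR_OK;
}

oR_return oR_send(oR_role_id role_id, const oR_word *data, oR_len length)
{
    mock_role *r = lookup(role_id);
    assert(data != NULL);
    if (r == NULL) return OR_ERR_NO_ROLE;
    if (length > MOCK_WORDS) return OR_ERR_RANGE;
    memcpy(r->stream, data, length * sizeof(oR_word));
    r->stream_len = length;
    return OR_OK;
}

oR_return oR_recv(oR_role_id role_id, oR_word *data, oR_len length)
{
    mock_role *r = lookup(role_id);
    assert(data != NULL);
    if (r == NULL) return OR_ERR_NO_ROLE;
    if (r->stream_len == 0) return OR_ERR_EMPTY;
    if (length > r->stream_len) return OR_ERR_RANGE;
    memcpy(data, r->stream, length * sizeof(oR_word));
    r->stream_len = 0;   /* stream data is consumed once received */
    return OR_OK;
}

/* Application code: it knows a role_id, never a PCIe or IP address. */
int demo_app(oR_role_id role)
{
    const oR_word cfg = 0x1;
    const oR_word in[3] = {10, 20, 30};
    oR_word out[3] = {0};

    if (oR_configure(role, &cfg) != OR_OK) return -1;
    if (oR_write(role, in, 8, 3) != OR_OK) return -1;
    if (oR_read(role, out, 8, 3) != OR_OK) return -1;
    if (memcmp(in, out, sizeof in) != 0) return -1;

    memset(out, 0, sizeof out);
    if (oR_send(role, in, 3) != OR_OK) return -1;
    if (oR_recv(role, out, 3) != OR_OK) return -1;
    return memcmp(in, out, sizeof in) == 0 ? 0 : -1;
}
```

`demo_app` depends only on the five oR\_ calls. Replacing the mock with a PCIe-attached or network-attached backend would, under the proposal, require relinking but no change to the application source.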

# Summary: The openRole proposal

- Enable **"design once, run everywhere"** for heterogeneous CPU-FPGA platforms
- **NOT** an implementation, it's a standard for abstracting configuration and communication from a particular HW
- **NOT** a programming model, it is about portable FPGA designs
- requires a "compiler" for each platform
- but does not require changes to the code
- published as "Header Files", "VHDL entities" and "PDFs" by the community
- Interested in shaping openRole? Would you do it completely differently?
  → contact us



# Appendix





### openRole: ideas for a standard – system view



#### **FPGA**





# openRole should **not** limit languages or tools

openRole is also **not** a programming model or a transport protocol

...it just connects arbitrary programs with arbitrary platforms.



connects any program with any platform

## References

- [1] https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS\_Shell\_Interface\_Specification.md#ShellInterfaces
- [2] https://opencapi.github.io/oc-accel-doc/
- [3] https://www.xilinx.com/applications/data-center.html
- [4] https://commons.wikimedia.org/wiki/File:Linux\_kernel\_API.svg
- [5] https://doi.org/10.1007/978-3-319-92792-3\_2
- [6] https://commons.wikimedia.org/wiki/File:Dieric\_Bouts\_013.jpg
- [7] https://developer.xilinx.com/en/articles/acceleration-basics.html
- [8] US20190190847A1: Allocating acceleration component functionality for supporting services
- [9] http://xillybus.com/downloads/xillybus\_product\_brief.pdf
- [10] https://www.nextplatform.com/2020/01/13/on-the-spearpoint-of-fpga-and-the-cloud/
- [11] https://www.nextplatform.com/2020/01/14/the-inevitability-of-fpgas-in-the-datacenter/
- [12] http://sc19.supercomputing.org/proceedings/bof/bof\_pages/bof115.html
- [13] https://h2rc.cse.sc.edu
- [14] https://doi.org/10.1109/FPL.2019.00054
- [15] https://cloud.baidu.com/doc/FPGA/s/Ajwvyh11e
- [16] https://doi.org/10.1109/HPEC.2019.8916526
- [17] https://inaccel.com

All remaining images are from IBM DAM or IBM Websites or created by the author.

## cloudFPGA: Further Reading

- B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey, "ZRLMPI: A Unified Programming Model for Reconfigurable Heterogeneous Computing Clusters" in 28th IEEE International Symposium On Field-Programmable Custom Computing Machines (FCCM), 2020.
- B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey, "System architecture for network-attached FPGAs in the cloud using partial reconfiguration," in 29th International Conference on Field Programmable Logic and Applications (FPL), 2019.
- F. Abel, J. Weerasinghe, C. Hagleitner, B. Weiss, S. Paredes, "An FPGA Platform for Hyperscalers," in IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI), Santa Clara, CA, pp. 29–32, 2017.
- J. Weerasinghe, F. Abel, C. Hagleitner, A. Herkersdorf, "Disaggregated FPGAs: Network performance comparison against baremetal servers, virtual machines and Linux containers," in IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Luxembourg, 2016.
- J. Weerasinghe, R. Polig, F. Abel, "Network-attached FPGAs for data center applications," in IEEE International Conference on Field-Programmable Technology (FPT '16), Xian, China, 2016.
- J. Weerasinghe, F. Abel, C. Hagleitner, A. Herkersdorf, "Enabling FPGAs in hyperscale data centers," in IEEE International Conference on Cloud and Big Data Computing (CBDCom), Beijing, China, pp. 1078–1086, 2015.
- F. Abel, "How do you squeeze 1000 FPGAs into a DC rack?" online at LinkedIn
- The cloudFPGA project page at ZRL: https://www.zurich.ibm.com/cci/cloudFPGA/


