Interconnection Networks (Bus)


Vorlesung Rechnerarchitektur (lecture on computer architecture), page 150

Bus Systems: Interconnection Networks (Bus)

A bus is a special case of a dynamic (switched) interconnection network. It consists of a bundle of signal lines used to transmit information between different places in a computer system. Signals are typically bundled by function:
- Address Bus, Data Bus, Synchronization Bus, Interrupt Bus, ...
or by the hardware unit connected to the bus:
- Processor Bus, Memory Bus, I/O Bus, Peripheral Bus, ...

[Figure: bus signal lines (e.g. the n lines of the address bus) with a pull-up to VCC. Labels: Master (e.g. processor), Master/Slave with transceiver (bidirectional port), Slave with receiver, TS drivers enabled by EN0, EN1 and EN2. Key: three-state switched connection to the bus; fixed input connection from the bus.]

A three-state driver can serve as the dynamic switch. As the name suggests, a three-state (TS) driver has three output states:
- drive high ("1")
- drive low ("0")
- no drive (high-Z output)

If all drivers of a bus signal line are disabled (high Z), the signal line floats; a pull-up resistor avoids this. More about the technology is covered in the lecture "Digitale Schaltungstechnik".

Only one master can be active on the bus at a time, and only one three-state (tri-state) driver is allowed to drive a bus signal line. Enabling more than one driver is called bus contention; it may damage the system and must be avoided under all conditions. A bus with multiple masters therefore requires an access mechanism, called arbitration.
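The rule that at most one enabled driver may drive a signal line can be illustrated with a small simulation. The following Python sketch is not part of the lecture material; the function name `resolve_line` and the representation of a driver as an `(enabled, value)` pair are assumptions made for illustration.

```python
def resolve_line(drivers, pull_up=True):
    """Return the logic level of one bus signal line.

    drivers: list of (enabled, value) pairs, one per three-state driver
             (hypothetical representation chosen for this sketch).
    """
    active = [value for enabled, value in drivers if enabled]
    if len(active) > 1:
        # More than one enabled driver: bus contention.
        raise RuntimeError("bus contention: more than one driver enabled")
    if active:
        return active[0]
    # All drivers are high-Z: the pull-up defines the level,
    # otherwise the line floats (modelled here as None).
    return 1 if pull_up else None
```

With one driver enabled the line takes that driver's value; with none enabled the pull-up yields 1; with two enabled the sketch raises an error instead of returning a level.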

Bus Arbitration

When a processor wants to access the bus, it asserts its bus request signal (BREQ) and waits for the arbiter to grant access, which is signaled by bus grant (BG). The arbiter is a hardware unit that samples the BREQx signals of all clients and generates a single token, which it signals to one bus client. Ownership of the token makes that client the bus master: it may enable its TS driver and become the active master of the bus. During this time all other units are slaves on the bus and can receive the information driven by the master. Because all slaves can take in the current bus data through their receivers, a broadcast can be performed in every bus cycle (the most important advantage of a bus, besides its simplicity).

[Figure: Processor 0 and Processor 1 each drive ADDR_out onto the 32-bit address bus through 32 TS drivers (Master 0 enabled by EN0, Master 1 by EN1) and read the bus via snoop_addr_in. The processors send BREQ0 and BREQ1 to the arbiter and receive BG0 and BG1. Slaves are attached to the address bus signal lines.]

The arbiter receives the bus request signals from all masters and decides which master is granted access to the bus. Simultaneous requests are served one after the other. A synchronous (clocked) arbiter can be realized as a simple finite state machine (FSM).

[Figure: arbiter FSM with the states Idle (no grant), Grant0 (output BG0) and Grant1 (output BG1). Transition labels: BREQ0 & ~BREQ1, BREQ1, ~BREQ0 & ~BREQ1, ~BREQ0 & BREQ1, BREQ0 & ~BREQ1; each state has a default transition.]

Metastable behavior of the arbiter flip-flops can (and should!) be avoided by deriving the request signals synchronously, i.e. from the same clock.
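A clocked two-master arbiter of this kind can be sketched as a three-state FSM. The Python sketch below is an illustration only: the state names follow the slide, but the exact transition assignment cannot be recovered from the figure, so the policy used here (the current owner keeps the bus while it requests, and BREQ1 wins when both request from Idle) is an assumption. The function names are hypothetical.

```python
IDLE, GRANT0, GRANT1 = "Idle", "Grant0", "Grant1"

def next_state(state, breq0, breq1):
    """One clock step of the arbiter FSM (assumed transition policy)."""
    if state == IDLE:
        if breq1:
            return GRANT1          # assumed priority for BREQ1 in Idle
        return GRANT0 if breq0 else IDLE
    if state == GRANT0:
        if breq0:
            return GRANT0          # default: owner keeps the bus
        return GRANT1 if breq1 else IDLE
    if state == GRANT1:
        if breq1:
            return GRANT1          # default: owner keeps the bus
        return GRANT0 if breq0 else IDLE
    raise ValueError(f"unknown state {state!r}")

def grants(state):
    """Outputs (BG0, BG1); at most one grant is active in any state."""
    return (state == GRANT0, state == GRANT1)

def run(requests, state=IDLE):
    """Apply a sequence of (BREQ0, BREQ1) samples, one per clock."""
    trace = []
    for breq0, breq1 in requests:
        state = next_state(state, breq0, breq1)
        trace.append(state)
    return trace
```

Because the grant outputs are derived from a single state variable, two grants can never be active at the same time, which is the property that prevents bus contention.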

Bus: Basic Functions

Bus systems can serve a wide range of tasks in a computer system:
- address bus
- data bus
- I/O bus
- memory bus
- processor bus
- interrupt bus
- synchronization bus
- error analysis and handling

[Figure: example of a synchronization bus. Labels: inputs Ain and Bin, open-drain (O.D.) outputs Aout and Bout, pull-up of 500 Ω to +5 V.]

The design of a bus system depends on its application, since buses are normally optimized for their use. Important characteristics of a bus system are:
- word width
- data transfer rate
- transfer method and protocol
  - synchronous
  - asynchronous
- hierarchy levels
  - cache bus (operand, result)
  - processor bus, L2 (bus) interface
  - memory bus
  - I/O bus
  - peripheral buses (USB)
  - LAN, communication buses

[Figure: "handshake" protocol timing with data X and the signals DAV_ and DACK_; t_pd is marked.]

Bus: Protocol

- Framing
  - command, address, burst, length
- Type
  - transaction-based
  - split-phase transactions
  - packet-based
- Flow control
  - asynchronous: handshake
  - synchronous: valid/stop, wait/disconnect, credit-based
- Data integrity and reliability
  - detection, correction, Hamming, parity
  - cyclic redundancy check (CRC), retransmission
- Advanced features
  - embedded clock (8b/10b)
  - DC-free (8b/10b)
  - virtual channels
  - quality of service (QoS)

[Figure: handshake protocol timing with data X and the signals DAV_ and DACK_; t_pd is marked.]

Bus

- Bus connecting several boards: backplane wiring, global (VME, Futurebus+, XD-BUS)
- Bus connecting components within one board: peripheral bus, processor bus, local (PCI bus, S-Bus, M-Bus)
- Connection between systems: workstation networking (SCI interface, Ethernet, ...)

The speed of a bus system is limited by several factors:
- signal propagation time on the lines
- input capacitance of the ports
- delay of the ports
- bus cycle time
- bus clock
- overhead (protocol)

[Figure notes: I/O bus systems on a backplane (passive or active backplane); driver technologies BTL, TTL, CMOS; VME with 2 x 96 pins; 2 x 52 I/O; 1 Gbit/s chip interconnects. PCI 2.1: PCI bus at 33 MHz without termination; 66 MHz; (132 MHz)?, (128 MHz)?; 1 slot + 1 chip; stubs.]

Bus: Pipelining

Bus systems can be pipelined to increase the data transfer rate (i860XP, PowerPC 620, XD-Bus, ...). One possible division into phases:
- arbitration
- addressing
- data transport
- status response

Because these phases can run in parallel on separate groups of lines, the data transfer rate can increase by up to a factor of 4.
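The factor of 4 can be checked with a simple cycle count. The Python sketch below assumes, for illustration only, that each of the four phases takes exactly one bus cycle and that consecutive transfers never stall; the function names are hypothetical.

```python
PHASES = ("arbitration", "addressing", "data transport", "status response")

def cycles_sequential(transfers, phases=len(PHASES)):
    """Each transfer runs all phases before the next one starts."""
    return transfers * phases

def cycles_pipelined(transfers, phases=len(PHASES)):
    """A new transfer enters the pipeline every cycle once it is filled."""
    if transfers == 0:
        return 0
    return phases + transfers - 1

def speedup(transfers):
    """Ratio of sequential to pipelined cycle count."""
    return cycles_sequential(transfers) / cycles_pipelined(transfers)
```

For a single transfer the speedup is 1; for long sequences it approaches the number of phases, here 4, which is the upper bound stated above.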

PCI - Peripheral Component Interconnect

The most important properties of the Peripheral Component Interconnect (PCI) bus:
- 32-bit data and address bus.
- Low cost through ASIC implementation.
- Transparent extension from a 32-bit data path (132 MB/s peak) to a 64-bit data path (264 MB/s peak).
- Variable burst length.
- Synchronous bus operation up to 33 MHz.
- Overlapped arbitration by a central arbiter.
- Multiplexed data and address bus to reduce the pin count.
- Self-configuration of PCI components through predefined configuration registers; Plug and Play capable.
- Processor independent; supports future processor families (through a host bridge or direct implementation).
- Supports 64-bit addressing.
- Specified for 5 V and 3.3 V.
- Multi-master capable; allows peer-to-peer access from any PCI master to any PCI master/slave.
- Hierarchical structure of several PCI bus levels.
- Parity for addresses and data.
- PCI components are compatible with existing driver and application software.

Following revision 2.0 of the PCI specification, presented in April 1993, an extension to revision 2.1 is already in progress. Its most significant change is raising the maximum clock frequency from 33 MHz to 66 MHz, which doubles the maximum transfer rate once more, to 528 MB/s with a 64-bit data path.
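The peak values quoted above follow from clock frequency times data path width. A minimal Python check, assuming one data transfer per clock (an ideal burst without wait states) and the nominal clock values of 33 and 66 MHz; the function name is hypothetical:

```python
def peak_bandwidth_mb_s(clock_mhz, width_bits):
    """Peak transfer rate in MB/s for one transfer per clock cycle."""
    return clock_mhz * width_bits // 8

# Combinations named in the text above.
for clock_mhz, width_bits in ((33, 32), (33, 64), (66, 64)):
    rate = peak_bandwidth_mb_s(clock_mhz, width_bits)
    print(f"{clock_mhz} MHz, {width_bits} bit: {rate} MB/s")
```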

PCI - Peripheral Component Interconnect

Required pins:
- Addresses and data: A/D[31:0], C/BE[3:0]#, PAR
- Interface control: FRAME#, TRDY#, IRDY#, STOP#, DEVSEL#, IDSEL
- Error reporting: PERR#, SERR#
- Arbitration (masters only): REQ#, GNT#
- System: CLK, RST#

Optional pins:
- 64-bit extension: A/D[63:32], C/BE[7:4]#, PAR64, REQ64#, ACK64#
- Interface control: LOCK#
- Interrupts: INTA#, INTB#, INTC#, INTD#
- Cache support: SBO#, SDONE
- JTAG (IEEE 1149.1): TDI, TDO, TCK, TMS, TRST#

[Figure: example system. Labels: four CPUs on the host bus, host bridge with PCI arbiter, PCI bus with LAN, graphics, SCSI and I/O subsystem.]

PCI - Peripheral Component Interconnect: Addressing

The physical address space of the PCI bus consists of three address ranges:
- Memory Address Space (4 GB): A/D[31:0] = 00000000h to FFFFFFFFh, C/BE[3:0]# = 0110, 0111, 1100, 1110, 1111
- I/O Address Space (4 GB): A/D[31:0] = 00000000h to FFFFFFFFh, C/BE[3:0]# = 0010, 0011
- Configuration Address Space (4 GB): A/D[31:0] = 00000000h to FFFFFFFFh, C/BE[3:0]# = 1010, 1011

PCI - Peripheral Component Interconnect: Address Mapping

The Host to PCI Address Map was taken from the address space layout of the PowerPC 60x series as shown in [2], because the specification of the 40-bit address space of the PowerPC 620 was not yet available at design time. The four-gigabyte address range of the host system is divided into three regions:
- System Memory, addresses 00000000h to 7FFFFFFFh (2 gigabytes)
- PCI I/O Address Space, addresses 80000000h to BFFFFFFFh (1 gigabyte)
- PCI Memory Address Space, addresses C0000000h to FFFFFFFFh (1 gigabyte)

The further subdivision of the two PCI address regions, one gigabyte each, is shown in the figure on page 5. The host address range is drawn on the left, the three independent PCI address spaces of four gigabytes each (Memory, I/O and Configuration Address Space) on the right.

Note that the Configuration Address Space is four gigabytes in size only in theory (because of the 32-bit address and data bus). The address layout prescribed by the PCI specification restricts the use of the address lines depending on the type of configuration cycle, so the address space effectively usable in configuration cycles is much smaller:
- Type 0: 32 devices * 8 functions per device * 256 bytes per function = 64 kilobytes
- Type 1: 256 buses * 32 devices per bus * 8 functions per device * 256 bytes per function = 16 megabytes

More information on configuration and address layout is given in chapter 2.3, "Konfigurationszyklus", on page 17.

Seen from the PCI bus, the address space of the host system can be reached only through the addresses 80000000h to FFFFFFFFh (2 gigabytes) of the Memory Address Space. These are mapped onto the lower 2 gigabytes of the host address space. PCI I/O cycles are not mapped onto the host system. The PCI to Host Address Map is shown in ...
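The address ranges named in this section can be written down as a small lookup. The Python sketch below is illustrative only: the function names are hypothetical, and the PCI-to-host translation assumes a plain linear offset of 80000000h, which the text implies ("mapped onto the lower 2 gigabytes") but does not spell out.

```python
# Host address regions as listed in the text.
HOST_REGIONS = (
    (0x0000_0000, 0x7FFF_FFFF, "System Memory"),
    (0x8000_0000, 0xBFFF_FFFF, "PCI I/O Address Space"),
    (0xC000_0000, 0xFFFF_FFFF, "PCI Memory Address Space"),
)

def host_region(addr):
    """Name of the host address region that contains addr."""
    for first, last, name in HOST_REGIONS:
        if first <= addr <= last:
            return name
    raise ValueError("address outside the 32-bit host address space")

def pci_memory_to_host(pci_addr):
    """Translate a PCI memory address to a host address, or None.

    Assumes a linear mapping of 80000000h..FFFFFFFFh onto the lower 2 GB.
    """
    if 0x8000_0000 <= pci_addr <= 0xFFFF_FFFF:
        return pci_addr - 0x8000_0000
    return None  # not visible to the host system

# Size of the address space reachable by each configuration cycle type.
TYPE0_BYTES = 32 * 8 * 256        # devices * functions * bytes
TYPE1_BYTES = 256 * 32 * 8 * 256  # buses * devices * functions * bytes
```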

PCI - Peripheral Component Interconnect: Address Mapping as Seen from the Host System

[Figure: address map with the processor address range on the left and the PCI address range on the right. Recoverable labels on the processor side: System Memory, 2 GB (00000000h to 7FFFFFFFh); PCI I/O, 1 GB (80000000h to BFFFFFFFh), containing ISA/PCI I/O (80000000h to 807FFFFFh, 8 MB), the directly mapped PCI config space (80800000h to 80FFFFFFh, 8 MB), PCI I/O from 81000000h to BF7FFFFFh, an 8 MB reserved block from BF800000h and a 16-byte PCI/ISA INTACK area ending at BFFFFFFFh; PCI Memory, 1 GB (C0000000h to FFFFFFFFh). Recoverable labels on the PCI side: ISA/PCI I/O (00000000h to 007FFFFFh), PCI Config, PCI I/O from 01000000h, PCI Memory from 00000000h, and areas marked "No System Address".]

PCI - Peripheral Component Interconnect: Address Mapping as Seen from PCI

[Figure: address map with the PCI memory address range and the processor address range. Recoverable labels: ISA/PCI memory (00000000h to 00FFFFFFh), PCI Memory 2 GB, Reserved, System Memory Space 2 GB (80000000h to FFFFFFFFh) on the PCI side; System Memory 2 GB (00000000h to 7FFFFFFFh), PCI I/O 1 GB (80000000h to BFFFFFFFh) and PCI Memory 1 GB (C0000000h to FFFFFFFFh) in the processor address range.]

PCI - Peripheral Component Interconnect: Bus Commands

The bus commands tell the target which kind of access the master requests and determine the address space into which the address falls. They are encoded on the C/BE[3:0]# lines during the address phase and apply to the entire following transaction. The codes of the individual bus commands are listed in Table 1.

Table 1: Definition of the bus commands

C/BE[3:0]#   Command Type
0000         Interrupt Acknowledge
0001         Special Cycle
0010         I/O Read
0011         I/O Write
0100         Reserved
0101         Reserved
0110         Memory Read
0111         Memory Write
1000         Reserved
1001         Reserved
1010         Configuration Read
1011         Configuration Write
1100         Memory Read Multiple
1101         Dual Address Cycle
1110         Memory Read Line
1111         Memory Write and Invalidate
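A target's command decoder can be pictured as a lookup on the four C/BE# bits. The following Python sketch encodes the table as a dictionary and derives the address space selected by a command; the names `BUS_COMMANDS` and `address_space` are hypothetical, and the command names follow the PCI specification.

```python
BUS_COMMANDS = {
    0b0000: "Interrupt Acknowledge",
    0b0001: "Special Cycle",
    0b0010: "I/O Read",
    0b0011: "I/O Write",
    0b0100: "Reserved",
    0b0101: "Reserved",
    0b0110: "Memory Read",
    0b0111: "Memory Write",
    0b1000: "Reserved",
    0b1001: "Reserved",
    0b1010: "Configuration Read",
    0b1011: "Configuration Write",
    0b1100: "Memory Read Multiple",
    0b1101: "Dual Address Cycle",
    0b1110: "Memory Read Line",
    0b1111: "Memory Write and Invalidate",
}

def address_space(code):
    """Address space selected by a bus command, or None if it selects none."""
    name = BUS_COMMANDS[code & 0xF]
    if name.startswith("Memory"):
        return "Memory"
    if name.startswith("I/O"):
        return "I/O"
    if name.startswith("Configuration"):
        return "Configuration"
    return None
```

The codes that map to "Memory", "I/O" and "Configuration" are exactly the C/BE[3:0]# values listed for the three PCI address spaces.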

PCI - Peripheral Component Interconnect: Bus Cycles

[Figure: two timing diagrams of a PCI bus cycle over clocks 1 to 9 with the signals CLK, FRAME#, IRDY#, A/D, C/BE#, TRDY# and DEVSEL#. A bus cycle consists of an address phase followed by data phases. The first diagram shows data1 to data4, the second data1 to data3, each with inserted wait states.]

PCI - Peripheral Component Interconnect

Unlike accesses to the Memory or I/O address space, which are addressed unambiguously by the address on A/D[31:0] and the bus command, a device is addressed in a configuration cycle by an additional signal: IDSEL, which acts as a chip select. Every device has its own IDSEL, which is sampled only during the address phase and only when the bus command on C/BE[3:0]# signals a Configuration Read or Configuration Write. For all other accesses the level of IDSEL is irrelevant.

The configuration registers are addressed doubleword by doubleword; a doubleword is 32 bits. (The address bits A/D[1:0] are therefore not needed for address decoding.) The bytes within the addressed doubleword are selected with the byte enables C/BE[3:0]#.

[Figure: timing diagram of a configuration cycle over clocks 1 to 5 with the signals CLK, FRAME#, IRDY#, A/D, C/BE# (command 101x), TRDY#, DEVSEL# and IDSEL.]

The PCI bus distinguishes two types of configuration cycles, identified by the bit combination in A/D[1:0]. Type 0 configuration cycles (A/D[1:0] = 00) are all cycles with which the bridge addresses devices located on the bus assigned to that bridge. A bridge is a device that connects different bus levels (or bus systems). Type 1 (A/D[1:0] = 01) applies to configuration cycles for devices located in subordinate PCI bus hierarchies.

PCI - Peripheral Component Interconnect

The information contained in the doubleword address A/D[31:2] depends on the type of configuration cycle.

Address formats of configuration cycles:

Type 0
Bits 31-11   Bits 10-8         Bits 7-2          Bits 1-0
Reserved     Function Number   Register Number   00

Type 1
Bits 31-24   Bits 23-16   Bits 15-11      Bits 10-8         Bits 7-2          Bits 1-0
Reserved     Bus Number   Device Number   Function Number   Register Number   01

The bit combinations in A/D[31:11] for Type 0 and in A/D[31:24] for Type 1 have no meaning.

- Bus Number: selects the bus on which the device to be configured is located. Through a hierarchically staggered arrangement of PCI buses, PCI allows up to 256 bus levels.
- Device Number: selects one of the 32 possible target devices on each bus level for which the configuration cycle is intended.
- Function Number: addresses one of the at most 8 functions of a multifunction device.
- Register Number: doubleword address within the configuration register set, which comprises 64 doublewords.
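The two address formats can be expressed as bit-field packing. The Python sketch below builds and decodes configuration addresses according to the field positions given above; the function names are hypothetical, and the reserved upper bits are simply left at zero.

```python
def _check(value, bits, name):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")

def type0_address(function, register):
    """Type 0: function in bits 10-8, register in bits 7-2, bits 1-0 = 00."""
    _check(function, 3, "function")
    _check(register, 6, "register")
    return (function << 8) | (register << 2) | 0b00

def type1_address(bus, device, function, register):
    """Type 1: bus 23-16, device 15-11, function 10-8, register 7-2, bits 1-0 = 01."""
    _check(bus, 8, "bus")
    _check(device, 5, "device")
    _check(function, 3, "function")
    _check(register, 6, "register")
    return (bus << 16) | (device << 11) | (function << 8) | (register << 2) | 0b01

def decode_type1(addr):
    """Return (bus, device, function, register) of a Type 1 address."""
    if addr & 0b11 != 0b01:
        raise ValueError("not a Type 1 configuration address")
    return ((addr >> 16) & 0xFF, (addr >> 11) & 0x1F,
            (addr >> 8) & 0x7, (addr >> 2) & 0x3F)
```

The field widths (8, 5, 3 and 6 bits) correspond to the 256 buses, 32 devices, 8 functions and 64 doublewords named in the field descriptions.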

PCI - Peripheral Component Interconnect

[Figure: hierarchical PCI bus structure. Labels: CPU on the host/system bus, Host to PCI Bridge, PCI Bus (Number x), PCI to PCI Bridge, PCI Bus (Number y). The figure marks which part of the structure is selected by Bus Number, Device Number, Function Number and Register Number.]

PCI - Peripheral Component Interconnect

[Figure: mapping of IDSEL onto the upper address bits. Variant with separate IDSEL lines: a device number decoder in the PCI bridge drives one IDSEL line to each of PCI slots 1 to 3. Variant with IDSEL taken from A/D[13:11]: the IDSEL pins of PCI slots 1 to 3 are connected to the address lines A/D[x], A/D[y] and A/D[z]. The Device Number field (bits 15-11) of the Type 1 address format is converted into a 1-of-21 code in bits 31-11 of the Type 0 address format; Function Number and Register Number are passed on unchanged.]

PCI - Peripheral Component Interconnect: Configuration

Configuration space header as defined by the PCI specification (revision 2.0). Each row is one 32-bit register; fields are listed from bit 31 down to bit 0.

Offset       Contents
00h          Device ID | Vendor ID
04h          Status | Command
08h          Class Code | Revision ID
0Ch          BIST | Header Type | Latency Timer | Cache Line Size
10h to 24h   Base Address Registers
28h          Reserved
2Ch          Reserved
30h          Expansion ROM Base Address
34h          Reserved
38h          Reserved
3Ch          Max_Lat | Min_Gnt | Interrupt Pin | Interrupt Line
40h to FFh   Vendor-defined configuration registers

PCI - Peripheral Component Interconnect

Base classes:

Base Class   Meaning
00h          Devices built before the base class code definition was completed
01h          Mass storage controller
02h          Network controller
03h          Display controller
04h          Multimedia controller
05h          Memory controller
06h          Bridge device
07h to FEh   Reserved
FFh          Devices that fit none of the base classes above

Base class 06h and its sub classes:

Base Class   Sub Class   Prog. If.   Meaning
06h          00h         00h         Host bridge
06h          01h         00h         ISA bridge
06h          02h         00h         EISA bridge
06h          03h         00h         MC bridge
06h          04h         00h         PCI to PCI bridge
06h          05h         00h         PCMCIA bridge
06h          80h         00h         Other bridge devices

PCI - Peripheral Component Interconnect

Layout of the Base Address Registers:
- Memory space: bits 31/63 to 4 hold the base address, bit 3 is the prefetchable flag, bits 2 to 1 give the type, and bit 0 is the memory space indicator (0).
- I/O space: bits 31 to 2 hold the base address, bit 1 is reserved, and bit 0 is the I/O space indicator (1).

[Figure: register model of a Base Address Register with 32-bit data paths and J/K flip-flops clocked by CLK; labels "set on write FFFFFFFFh" and "reset on read". The size of the address range is encoded in the base address bits; in the example (1 MB) the register holds 1111 1111 1111 0000 0000 0000 0000 0000.]

Interrupt Pin (read only): this register states which interrupt pin the device uses. The decimal value 1 means INTA#, 2 INTB#, 3 INTC# and 4 INTD#. With the decimal value 0 the device indicates that it uses no interrupts.

Interrupt Line (read/write): the value of this register states to which system interrupt pin the device's interrupt pin is connected. Configuration software can use this value, for example, to assign priorities. The values of this register are system dependent.
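The 1 MB example can be reproduced in software: after FFFFFFFFh has been written to a Base Address Register, the bits that read back as 0 reveal the size of the requested range. The Python sketch below shows this decoding for 32-bit registers; the function name `bar_size` is hypothetical, and 64-bit memory registers are deliberately left out.

```python
def bar_size(readback):
    """Size in bytes encoded in a 32-bit BAR read back after writing FFFFFFFFh."""
    if readback & 0x1:
        mask = 0xFFFF_FFFC   # I/O space: bits 1-0 are not address bits
    else:
        mask = 0xFFFF_FFF0   # memory space: bits 3-0 are not address bits
    base_bits = readback & mask
    if base_bits == 0:
        return 0             # register not implemented
    # Two's complement of the writable address bits gives the range size.
    return (~base_bits + 1) & 0xFFFF_FFFF

# Example from the figure: twelve 1 bits followed by twenty 0 bits.
print(bar_size(0b1111_1111_1111_0000_0000_0000_0000_0000))
```

The low indicator bits are masked out first because they describe the kind of address space rather than the address itself.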

PCI - Peripheral Component Interconnect: Arbitration

[Figure: arbitration timing over clocks 1 to 7 with the signals CLK, REQ#1, REQ#2, GNT#1, GNT#2, FRAME# and A/D (address, data, address, data). The access of master 1 is followed by the access of master 2; the arbitration phase is marked.]

[Figure: PCI arbiter. Masters 0 to 2 on the PCI bus are connected to the arbiter through REQ0_/GNT0_, REQ1_/GNT1_ and REQ2_/GNT2_. Further labels: IRDY#, FRAME#, gtimer, GTE_, GT0_.]

Modern Peripheral Interfaces

What are the available interfaces for peripheral devices? The options range from standard to proprietary: PCI-X, PCIe, HT, (cHT), system bus, integrated solution.

Feature | PCI-X | PCI-Express (PCIe) | HyperTransport (HT)
number of signal lines | 64 + 39 = 101 | 4, 8, 16, 32, 64 | 26, 36, 57, 105, 199
multiplexed operation | yes, address/data | yes | yes, message oriented
data width | 32/64 bit | 2, 4, 8, 16, 32 bit | link width 2, 4, 8, 16, 32 bit
usage | I/O bus, peripheral extension | I/O bus, peripheral extension | I/O bus**, peripheral extension
operation mode | fully synchronous, clocked | source synchronous, 8B/10B coded data | source synchronous, 1 x clock per byte
clock frequency | 0-33/64 MHz, 100/133 (266*) MHz | 2.5 GHz | 200-800 MHz (1-1.6 GHz)
signal transmission | CMOS level, reflective wave signalling | CML level, serial, differential | U-level 600 mV, NRZ, serial, differential
data transmission | CMOS level, clock synchronous | coded, embedded clock | DDR (double data rate), packetized
termination | no | 100-110 Ohm | 100 Ohm on chip, overdamped
burst transfers | yes, many modes, 4x burst, arbitrary length | yes, message transfers | yes, command + message transfers
split transactions | yes | yes | yes
max. bandwidth | 533 MB @ 66 MHz, 64 bit; 1 GB @ 133 MHz, 64 bit | 2 x 2.5 Gbit/s @ 2 bit; 10 GB @ 32 bit | 0.2 GB @ 200 MHz, 2 bit; 12.8 GB @ 1600 MHz, 32 bit
max. no. of devices | bridge + 4, 2, 1 devices; 1 I/O device @ 133 MHz | point to point | point to point, bidirectional
max. length of signal lines | approx. 10 cm | approx. 3-10 cm on FR4*** | approx. 3-10 cm on FR4
standard | industry (Intel) + IEEE | industry (Intel) + consortium | industry (AMD) + consortium
spec. page no. | approx. 220 | approx. 420 | approx. 330
web info | www.pcisig.org | www.intel.com | www.hypertransport.org

*) DDR: double data rate transfer
**) extended version for CPU interconnect with cache coherency protocol
***) PCB material

PCI-X - Peripheral Component Interconnect

Features:
- Available in many node computers. Servers use switched architectures.
- Synchronous interface controlled by a single-ended clock.
- In the 133 MHz mode, only one I/O device is allowed on the "bus" (bridge-to-device).
- In the future it will be replaced by PCIe because of PCIe's reduced pin count and higher bandwidth.

The PCI-X bus cycle shows the overhead associated with a burst transfer without target wait states: 2 clk arbitration + 2 clk address/attribute + 2 clk target response and turnaround + n data phases of 8 B each. For n = 4, i.e. a data size of 32 B, the ratio of overhead cycles to data cycles is 6 to 4. At 133 MHz one clock cycle takes 7.5 ns. The half-performance length n_1/2 is reached at 6 data transfer cycles of 8 B each, so n_1/2 = 48 B. The peak bandwidth is 1 GB/s; the real bandwidth is around 900 MB/s for long bursts.
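The overhead figures can be turned into a small efficiency model. The Python sketch below assumes the cycle counts stated above (6 overhead clocks per burst, 8 B per data phase, 133 MHz) and no wait states; the function names are hypothetical, and the model is an idealization rather than a measurement.

```python
OVERHEAD_CLOCKS = 2 + 2 + 2     # arbitration + address/attribute + response/turnaround
BYTES_PER_DATA_PHASE = 8
CLOCK_MHZ = 133

def efficiency(n):
    """Fraction of bus clocks that carry data in a burst of n data phases."""
    return n / (n + OVERHEAD_CLOCKS)

def bandwidth_mb_s(n):
    """Effective bandwidth in MB/s for bursts of n data phases."""
    return efficiency(n) * BYTES_PER_DATA_PHASE * CLOCK_MHZ

for n in (4, 6, 64):
    print(n, n * BYTES_PER_DATA_PHASE, round(efficiency(n), 3), round(bandwidth_mb_s(n)))
```

At n = 6 (48 B) exactly half of the clocks carry data, which is the n_1/2 point named in the text; longer bursts approach the peak rate of 8 B per clock.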

PCI-Express (PCIe)

Performance:
- Low-overhead, low-latency communication to maximize application payload bandwidth and link efficiency
- High bandwidth per pin to minimize the pin count per device and connector interface
- Scalable performance via aggregated lanes and signaling frequency

The fundamental PCI Express link consists of two low-voltage, differentially driven signal pairs: a transmit pair and a receive pair. Combining many lanes (one such transmit/receive pair each) provides high bandwidth. For example, x16 is a bidirectional link with 2.5 Gb/s per lane and direction, delivering a raw data rate of 2 x 40 Gb/s = 10 GB/s.

PCIe shows a lower latency than PCI-X because it typically attaches directly to the root complex (north bridge). Serializer latency can be neglected. [PCIeSpec]

A switch is defined as a logical assembly of multiple virtual PCI-to-PCI bridge devices. Advanced Switching Interconnect (ASI) is based on the physical layer of PCIe and is aimed at the interconnect within the rack/cabinet.

HyperTransport (HT) (1)

AMD's I/O HT is an open standard; cHT is available only to large customers. HyperTransport is intended for in-the-box connectivity (motherboard). The architecture of the HyperTransport I/O link can be mapped onto five layers [HT-WP]:
- The physical layer defines the physical and electrical characteristics of the protocol. This layer interfaces to the physical world and includes data, control, and clock lines.
- The data link layer includes the initialization and configuration sequence, periodic cyclic redundancy check (CRC), disconnect/reconnect sequence, information packets for flow control and error management, and doubleword (32-bit) framing for other packets (data packet sizes from 4 to 64 bytes).
- The protocol layer includes the commands, the virtual channels in which they run, and the ordering rules that govern their flow.
- The transaction layer uses the elements provided by the protocol layer to perform actions, such as reads and writes.
- The session layer includes rules for negotiating power management state changes, as well as interrupt and system management activities.

[Figure: example HT system. Labels: Processor, Processor, AGP, MemC, North Bridge, NIC (System Area Network), HT-IO Bridge, HT-IO Tunnel (LAN), Cave (disk, super I/O); link directions upstream and downstream.]

HT supports several methods of data transfer between devices: PIO, DMA, and peer-to-peer (which involves the bridge device to connect the upstream with the downstream link). Interrupts are signaled by sending interrupt messages [HT-MS].

HyperTransport (HT) (2)

An HT bus uses coupon-based flow control to avoid receiver overrun. Coupons (credits) flow back in NOP control packets (idle link signaling) of 4 bytes. Control and data packets are distinguished by the CTL signal line. Control packets are divided into request, response and information packets. Read requests carry a 40-bit address and are executed as split transactions. A sized read request is an 8 B packet; it is answered with a read response packet of 4 B and a data packet of 4 to 64 B (in the best case, 12 B of overhead for 64 B of data).

Packet types:
- Control packet
  - Information: 4 B
  - Request: 4 or 8 B
  - Response: 4 B
- Data packet: 4 to 64 B

The physical layer uses modified LVDS differential signaling.

[Figure: LVDS signal transmission and termination; control and data packets on the bidirectional HT bus.]

Control packets may be inserted into data packets at any 4 B boundary. Only one data packet is active at a time.
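The overhead of a sized read can be quantified from the packet sizes given above. The Python sketch below counts only the 8 B request and the 4 B response as overhead and ignores CRC, NOP packets and link idle time, so it is an upper bound under these assumptions; the function name is hypothetical.

```python
REQUEST_BYTES = 8    # sized read request packet
RESPONSE_BYTES = 4   # read response packet

def read_efficiency(data_bytes):
    """Share of transferred bytes that is payload for one sized read."""
    if data_bytes % 4 or not 4 <= data_bytes <= 64:
        raise ValueError("data packets carry 4 to 64 bytes in 4-byte steps")
    overhead = REQUEST_BYTES + RESPONSE_BYTES
    return data_bytes / (data_bytes + overhead)

for size in (4, 16, 64):
    print(size, round(read_efficiency(size), 3))
```

The best case is the largest data packet: 64 B of payload against 12 B of control information.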

I/O Devices: Basics

An I/O device is a resource of a computer system that implements the function of a specific I/O interface. An I/O interface establishes communication between a computer system and the outside world (which may be another computer system or a peripheral device such as a printer).

[Figure: generic I/O device block diagram. Labels: CPU with cache, data and address lines on the system bus, I/O bridge, data and address lines on the I/O bus (typically no cache coherency), I/O device with device command register (CR), device status register (SR), device data register for reads (DRR), device data register for writes (DRW), and the I/O interface.]

The minimal set of registers is: control register (CR), status register (SR), read data register (DRR) and write data register (DRW). The CR brings the device into a specific operating mode. Its content is very device specific, but it normally includes an enable bit for activating the output register and the input register. The status register signals the internal state of the device, e.g. whether the transmitter register is empty (TX_EMPTY) or the receiver register is full (RX_FULL).

I/O Devices: Addressing

The data registers transfer data into and out of the device, typically using programmed I/O (PIO). To be accessible, the device must be placed in the address space of the system. There are two options:
- I/O space for access
- memory-mapped device

Special I/O instructions access the device by directly selecting a predefined address space that is not accessible to other instruction types, which restricts their use.

As the name suggests, a memory-mapped device is placed in the memory address space of the processor. All instructions can be used to access the device (typically load/store). Special care must be taken if the processor can reorder load/store instructions. Caching of this address range should be turned off.

[Figure: address space partitioning for memory-mapped devices (example!). The memory address space runs from 0x0000_0000 to 0xFFFF_FFFF, with memory space below 0xF000_0000 and the I/O space from 0xF000_0000 to 0xFFFF_FFFF. The I/O space holds the device registers of Device A and Device B, with free space in between.]
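Programmed I/O on a memory-mapped device amounts to ordinary loads and stores on the device's register addresses, with polling of the status register. The Python sketch below simulates this with a toy loopback device. Everything beyond the register names is an assumption made for illustration: the register offsets, the bit positions of TX_EMPTY and RX_FULL, the enable bit, and the classes `LoopbackDevice` and `AddressSpace`; the base address is taken from the example partitioning.

```python
DEVICE_BASE = 0xF000_0000                 # start of the I/O space in the example
CR, SR, DRR, DRW = 0x0, 0x4, 0x8, 0xC     # hypothetical register offsets
TX_EMPTY, RX_FULL = 0x1, 0x2              # hypothetical status bits

class LoopbackDevice:
    """Toy device: a value written to DRW becomes readable in DRR."""
    def __init__(self):
        self.enabled = False
        self.rx = None

    def read(self, offset):
        if offset == SR:
            return TX_EMPTY | (RX_FULL if self.rx is not None else 0)
        if offset == DRR:
            value, self.rx = self.rx, None   # reading empties the register
            return 0 if value is None else value
        raise ValueError("register is not readable")

    def write(self, offset, value):
        if offset == CR:
            self.enabled = bool(value & 0x1)  # assumed enable bit
        elif offset == DRW and self.enabled:
            self.rx = value
        elif offset != DRW:
            raise ValueError("register is not writable")

class AddressSpace:
    """Loads and stores reach memory or the device, depending on the address."""
    def __init__(self):
        self.ram = {}
        self.device = LoopbackDevice()

    def load(self, addr):
        if addr >= DEVICE_BASE:
            return self.device.read(addr - DEVICE_BASE)
        return self.ram.get(addr, 0)

    def store(self, addr, value):
        if addr >= DEVICE_BASE:
            self.device.write(addr - DEVICE_BASE, value)
        else:
            self.ram[addr] = value

def pio_echo(space, data):
    """Send each value through the device using polled PIO."""
    space.store(DEVICE_BASE + CR, 0x1)                      # enable the device
    received = []
    for value in data:
        while not (space.load(DEVICE_BASE + SR) & TX_EMPTY):
            pass                                            # wait for transmitter
        space.store(DEVICE_BASE + DRW, value)
        while not (space.load(DEVICE_BASE + SR) & RX_FULL):
            pass                                            # wait for receiver
        received.append(space.load(DEVICE_BASE + DRR))
    return received
```

On real hardware the polling loads must not be cached or reordered, which is why the text asks for caching to be turned off for this address range.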

Vorlesung Rechnerarchitektur 2 Seite 178

DMA Basics: Direct Memory Access (DMA)

Definition: A direct memory access (DMA) is an operation in which data is copied (transported) from one resource to another resource in a computer system without the involvement of the CPU.

The task of a DMA-controller (DMAC) is to execute the copy operation of data from one resource location to another. The copy of data can be performed from:

- I/O-device to memory
- memory to I/O-device
- memory to memory
- I/O-device to I/O-device

A DMAC is a resource of a computer system that is independent of the CPU and is added for the concurrent execution of DMA-operations. The first two modes, transfers from an I/O-device into main memory and from main memory to an I/O-device, are the common operations of a DMA-controller. The other two are slightly more difficult to implement, and most DMA-controllers do not implement device-to-device transfers.

[Figure: simplified logical structure of a system with DMA. Components shown: CPU, DMA controller, arbiter, I/O device. Signals shown: Addr, Data, REQ, ACK.]

The DMAC replaces the CPU for the task of transferring data from the I/O-device to the main memory (or vice versa), which would otherwise be executed by the CPU in programmed input/output (PIO) mode. PIO is realized by a small instruction sequence that the processor executes to copy data. The memcpy function supplied by the system is such a PIO operation.

The DMAC is a master/slave resource on the system bus. It acts as a master because it must supply the addresses for the resources involved in a DMA transfer, and as a slave when the CPU accesses its registers. It requests the bus whenever a data value is available for transport, which the device signals with the REQ signal.

The functional unit DMAC may be integrated into other functional units of a computer system, e.g. the memory controller, the south bridge, or directly into an I/O-device.
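
For comparison, the PIO copy that a DMAC takes over can be sketched as a plain C loop. The function name pio_copy and the word-wise 32-bit granularity are assumptions for illustration. A real memcpy is usually far more optimized, but the principle is the same: the CPU executes one load and one store per data item.

```c
#include <stddef.h>
#include <stdint.h>

/* Programmed I/O copy: every word passes through a CPU register, so the
 * CPU is busy for the whole duration of the transfer. */
static void pio_copy(uint32_t *dst, const volatile uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        uint32_t value = src[i];  /* load: CPU reads the source resource       */
        dst[i] = value;           /* store: CPU writes the destination resource */
    }
}
```

With DMA, the same loop is executed by the DMAC, and the CPU is free to do other work in the meantime.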

DMA Operations: Direct Memory Access (DMA)

Many different operating modes exist for DMACs. The simplest one is the single block transfer, which copies a block of data from a device to memory. For the more complex operations please refer to the literature [Mot81]. Here, only a short list of operating modes is given:

- single block transfer
- chained block transfers
- linked block transfers
- fly-by transfers

All these operations normally access the block of data in a linear sequence. Nevertheless, more useful access functions are possible, such as: constant stride, constant stride with offset, incremental stride, ...

[Figure: Execution of a DMA-operation (single block transfer). Shows the CPU; the DMAC with its DMA Command Register, Device Base Register, Block Length Register, Mem Base Register, and Temporary Data Register; the descriptor (Block Length, Mem Base Addr); the command area; the memory block; and the I/O device with its Device Data Register. The arrows are numbered 1 (1a, 1b), 2a, 2b, 3, 4, 5, and 6.]

The CPU prepares the DMA-operation by constructing a descriptor (1) that contains all the information the DMAC needs to perform the DMA-operation independently (offload engine for data transfer). It initiates the operation by writing a command to a register in the DMAC (2a) or to a specially assigned memory area (command area), where the DMAC can poll for the command and/or the descriptor (2b). Then the DMAC addresses the device data register (3) and reads the data into a temporary data register (4). In another bus transfer cycle, it addresses the memory block (5) and writes the data from the temporary data register to the memory block (6).
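
The single block transfer can be modelled in software to make the numbered steps concrete. This is a sketch under assumptions, not a description of any real DMAC: the descriptor layout (dma_descriptor), the function name dmac_single_block, and the 32-bit word size are hypothetical, and the REQ/ACK handshake and bus arbitration are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor built by the CPU in step (1). */
typedef struct {
    const volatile uint32_t *device_data_reg;  /* device base: device data register */
    uint32_t *mem_base;                        /* base address of the memory block  */
    size_t block_length;                       /* block length in 32-bit words      */
} dma_descriptor;

/* Software model of the DMAC loop for a device-to-memory transfer.
 * Returns the number of words transferred. */
static size_t dmac_single_block(const dma_descriptor *desc)
{
    size_t transferred = 0;
    while (transferred < desc->block_length) {
        /* (3), (4): address the device data register and read the value
         * into the temporary data register. */
        uint32_t temp = *desc->device_data_reg;
        /* (5), (6): address the memory block and write the temporary
         * value to it in a second bus transfer. */
        desc->mem_base[transferred] = temp;
        /* Advance to the next memory address in linear sequence. */
        ++transferred;
    }
    return transferred;
}
```

A strided access function would replace the linear index by something like transferred * stride. That variant is not shown here.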

DMA Operations: Direct Memory Access (DMA)

The DMAC increments the memory block address and continues with this loop until the block length is reached. The completion of the DMA-operation is signaled to the processor by sending an IRQ signal or by setting a memory semaphore variable, which can be tested by the CPU.

Further aspects:

- multiple channels
- physical addressing, address translation
- snooping for cache coherency

DMA control signals (REQ, ACK) are used to signal the availability of values in the I/O-device for transportation.

The DMAC uses bus bandwidth, which may slow down processor execution through bus conflicts (solution for high performance systems: use a crossbar (xbar) as interconnect!).

[Flik] Flik, Thomas: Mikroprozessortechnik: CISC, RISC, Systemaufbau, Assembler und C. 6. Aufl., Springer Verlag, 2001.
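
Completion signaling through a memory semaphore variable can be sketched with a single flag. The names dma_completion, completion_init, completion_signal, and completion_test are hypothetical, and C11 atomics are used only to model the ordering between the writer and the reader. On real hardware the DMAC writes the memory word itself, and the CPU side additionally depends on the platform's cache coherency rules.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical completion flag living in main memory. */
typedef struct {
    atomic_bool done;
} dma_completion;

/* CPU side: clear the flag before starting a DMA-operation. */
static void completion_init(dma_completion *c)
{
    atomic_init(&c->done, false);
}

/* Modelled DMAC side: set the flag after the last word was written.
 * Release ordering makes the transferred data visible before the flag. */
static void completion_signal(dma_completion *c)
{
    atomic_store_explicit(&c->done, true, memory_order_release);
}

/* CPU side: non-blocking test of the semaphore variable. */
static bool completion_test(dma_completion *c)
{
    return atomic_load_explicit(&c->done, memory_order_acquire);
}
```

Unlike an IRQ, this variant costs the CPU nothing until it decides to test the flag, but the CPU has to poll to notice the completion.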

Completion Signaling

Completion signaling of an operation

For all communication functions it is important to know when an operation is completed. Signaling this event to the process interested in this information is difficult. The most common way is to raise an interrupt, which stops normal processing of the CPU and activates the interrupt handler. Although interrupt processing has sped up significantly in recent years, it still needs to save the CPU state, and in the newest processors the register file is larger than ever.

Design decisions:

- IRQ
- polling at device register
- replication/mirroring in main memory
- notification queue
- thread scheduling

The Application Processor-Communication Processor model, active messages, and NICs like Infiniband use the concept of a notification queue. For every communication instruction, a corresponding entry is written to the completion notification queue when the operation has finished. The user process owning the queue can test for this entry.
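
The notification queue concept can be sketched as a small ring buffer. All names (cq_entry, completion_queue, cq_post, cq_poll), the entry fields, and the capacity are assumptions for illustration and do not reflect the layout of any real NIC. The sketch is single-threaded, so a real producer/consumer pair would also need memory ordering between the entry write and the index update.

```c
#include <stdbool.h>
#include <stdint.h>

#define CQ_ENTRIES 8u  /* assumed capacity; must be a power of two */

/* One completion notification (hypothetical fields). */
typedef struct {
    uint32_t op_id;   /* identifies the finished communication operation */
    uint32_t status;  /* 0 means success in this sketch                  */
} cq_entry;

typedef struct {
    cq_entry ring[CQ_ENTRIES];
    uint32_t head;  /* free-running index of the next entry to read */
    uint32_t tail;  /* free-running index of the next slot to write */
} completion_queue;

/* Producer side (e.g. the communication processor): write one entry
 * when an operation has finished. Returns false if the queue is full. */
static bool cq_post(completion_queue *q, uint32_t op_id, uint32_t status)
{
    if (q->tail - q->head == CQ_ENTRIES) {
        return false;
    }
    q->ring[q->tail % CQ_ENTRIES] = (cq_entry){ op_id, status };
    q->tail++;
    return true;
}

/* Consumer side (the user process owning the queue): test for a
 * completion without blocking. Returns false if nothing has finished. */
static bool cq_poll(completion_queue *q, cq_entry *out)
{
    if (q->head == q->tail) {
        return false;
    }
    *out = q->ring[q->head % CQ_ENTRIES];
    q->head++;
    return true;
}
```

The user process calls cq_poll whenever it is ready to handle completions, so no interrupt and no CPU state save is needed for the notification itself.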
