4BA2

leaflet

Technology Survey

Scalable Coherent Interface (SCI)

by Oliver Pugh, Jeff Cowhig, Jonathan Crowe and Eoin Blacklock

Background:

The Scalable Coherent Interface (SCI) is an ANSI/IEEE standard that defines a high-performance interconnect technology, providing solutions for a wide range of applications.

In the late 1980s a working group set out to define a successor to Futurebus. This successor, to be named Futurebus+, was intended to support a significant degree of multiprocessing. It became apparent, however, that this could not be achieved with traditional bus technology, as microprocessors would soon become too fast for any bus. A bus is a centralised resource, and therefore an inherent serial bottleneck, and bus signalling was already approaching its physical limits (ultimately set by the speed of light). Both issues drastically limited scalability, so the group developed distributed solutions that avoid these problems while preserving the services provided by bus-based interconnects. The resulting specification was named SCI and was approved in 1992.

The SCI interconnect, the memory system and the associated protocols are fully distributed and scalable. The main objective of SCI was to deliver high communication performance to parallel or distributed applications. SCI was designed to connect a large number of nodes (up to 64K). A node can be a workstation or server, a processor, a memory module, an I/O controller or device, or a bridge to another bus or interconnect. Each node attaches to the network using a standard interface. The basic transfer unit is a packet, which eliminates the overhead of bus cycles.

SCI links transfer data at nearly 500 MHz, achieving a transfer rate of one gigabyte per second. Adding nodes to an SCI network also adds bandwidth, so performance scales well.
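
The one gigabyte per second figure follows from the clock rate if each link symbol carries two bytes. That 16-bit symbol width is an assumption not stated in the text above, and the names below are purely illustrative. A rough check of the arithmetic:

```python
# Back-of-the-envelope check of the quoted link rate.
# ASSUMPTION: each symbol on the link is 16 bits (2 bytes) wide.
SYMBOL_RATE_HZ = 500_000_000   # "nearly 500 MHz" from the text, rounded up
BYTES_PER_SYMBOL = 2           # assumed 16-bit-wide link


def link_bandwidth_bytes_per_s(symbol_rate_hz: int, bytes_per_symbol: int) -> int:
    """Raw link bandwidth in bytes per second (no packet overhead)."""
    return symbol_rate_hz * bytes_per_symbol


bandwidth = link_bandwidth_bytes_per_s(SYMBOL_RATE_HZ, BYTES_PER_SYMBOL)
print(bandwidth / 1e9, "GB/s")
```

This is raw link bandwidth only; headers, echoes and idle symbols would reduce the rate seen by an application.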

SCI Topologies:

SCI nodes are interconnected through unidirectional point-to-point links in a ring (ringlet) topology. A single ring can only grow to a certain number of nodes before it becomes saturated, because each node receives traffic generated by all the other nodes in the ring. This limit is usually 8-10 machines, although it depends on the load the nodes generate. Few applications therefore perform well on large rings, apart from some I/O applications. Certain "housekeeping" tasks, such as maintaining timers, discarding damaged packets so that they do not circulate the ring indefinitely, and circulating ring maintenance information, are assigned to one node within the ring (the "scrubber"). Switches are used to connect multiple independent SCI ringlets. Two topologies are commonly used to do this in systems today.

The first topology is shown in figure 1. The switch has 4 ports, plus 2 extension ports that are not shown. Using the extension ports, the switch can either be expanded into a stacked switch with up to 16 ports or configured as a non-expandable 6-port switch.

The second topology is shown in figure 2. This multidimensional torus uses small SCI ringlets in each dimension. Each node is connected to every dimension and uses a small switch integrated into its SCI adapter to pass packets between dimensions. It is possible to create a 3-dimensional torus with up to 10-12 nodes in each dimension.

Figure 1 Figure 2

SCI Node Model:

Figure 3 below shows the layout of an SCI node.

An SCI node must be able to transmit packets while concurrently accepting packets from other nodes. To make this possible, FIFOs hold the symbols that arrive while the node is sending a packet. Node application logic is not expected to match SCI link speeds, so input and output FIFOs are also needed: a node must ensure that all symbols of a packet are available before transmission starts, so that the packet can be sent at full link speed.

In general, the SCI node maintains two queues, which serve as buffers until transmission bandwidth becomes available for outbound packets or until inbound packets can be processed by the node's application logic.

SCI node model

Figure 3

SCI Transaction:

Transactions are split into a request sub-action and a response sub-action. Packets carry addresses, command and status information, and, depending on the transaction, data. Up to 64 transactions can be outstanding per node. Each sub-action consists of a send packet generated by the sender and an echo (acknowledgement) packet returned by the receiver. The echo tells the sender whether the packet was accepted (stored in the receiver's input queue) or rejected (because the input queue was full). In the former case, the sender can discard the send packet from its output queue; in the latter, the sender retransmits the packet. The illustration below highlights this handshaking. It involves 3 nodes: the sender, the receiver and an intermediate agent.
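
The send/echo handshake can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the packet-level protocol of the standard: the class and method names are hypothetical, the echo is reduced to a returned status string, and the intermediate agent is left out.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a bounded input queue."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.input_queue: deque = deque()

    def receive(self, packet: str) -> str:
        """Return the echo status for one send packet."""
        if len(self.input_queue) >= self.capacity:
            return "rejected"          # input queue full: sender must retry
        self.input_queue.append(packet)
        return "accepted"

    def process_one(self) -> None:
        """Application logic consumes one packet, freeing queue space."""
        if self.input_queue:
            self.input_queue.popleft()


class Sender:
    """Hypothetical sender that keeps each packet until it is accepted."""

    def __init__(self) -> None:
        self.output_queue: deque = deque()

    def send_head(self, receiver: Receiver) -> str:
        """Transmit the packet at the head of the output queue once."""
        packet = self.output_queue[0]
        echo = receiver.receive(packet)
        if echo == "accepted":
            self.output_queue.popleft()  # saved copy no longer needed
        return echo                      # on "rejected" the packet stays queued


receiver = Receiver(capacity=1)
sender = Sender()
sender.output_queue.extend(["request-A", "request-B"])

first = sender.send_head(receiver)    # input queue empty
second = sender.send_head(receiver)   # input queue still holds request-A
receiver.process_one()                # receiver drains request-A
third = sender.send_head(receiver)    # retry of request-B
print(first, second, third)
```

The point of the sketch is that the sender only drops a packet from its output queue once a positive echo arrives, so a full input queue at the receiver costs a retransmission rather than a lost packet.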

 

There are 3 types of transactions in SCI.

  1. Transactions with responses (read, write and lock transactions).
  2. Move transactions (for example, non-coherent writes). These do not have response sub-actions, which makes them more efficient than writes.
  3. Event transactions. These do not have responses and do not generate an echo. They can be used to distribute a time stamp for global time synchronisation within the SCI system.
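
The difference between the three transaction types can be made concrete by counting the packets each one puts on the ring. The sketch below is illustrative only: the `TransactionType` class and `packets_on_wire` function are hypothetical names, and the count assumes no retries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionType:
    name: str
    has_response: bool  # is there a response sub-action?
    has_echo: bool      # is each send packet answered by an echo?


# Properties taken from the list above.
RESPONSE_EXPECTED = TransactionType("read/write/lock", has_response=True, has_echo=True)
MOVE = TransactionType("move", has_response=False, has_echo=True)
EVENT = TransactionType("event", has_response=False, has_echo=False)


def packets_on_wire(t: TransactionType) -> int:
    """Count send and echo packets for one transaction, assuming no retries."""
    subactions = 2 if t.has_response else 1
    packets_per_subaction = 2 if t.has_echo else 1
    return subactions * packets_per_subaction


for t in (RESPONSE_EXPECTED, MOVE, EVENT):
    print(t.name, packets_on_wire(t))
```

Under these assumptions a transaction with a response costs four packets, a move costs two and an event costs one, which is why moves are described as more efficient than writes.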

Shared Memory In SCI:

As noted above, SCI addresses the serial bottleneck inherent in a bus while retaining bus-like services. Like a bus, SCI supports remote memory accesses for both reads and writes. SCI uses a fixed 64-bit addressing scheme to provide a physically addressed, distributed shared memory system. Although the memory is physically distributed across the nodes, it is logically shared, just as in a system with a centralised bus and shared memory. The upper 16 bits of an address specify the node on which the addressed physical storage is located, and the lower 48 bits specify the local physical address within that node's memory. Nodes can access this global physical address space, and hence any physical memory location in the whole network, by mapping segments of it into their own memory. Figure 4 illustrates this concept.

 

Figure 4
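
The 16/48-bit split of an SCI address can be illustrated with a little bit arithmetic. The helper names `make_address` and `split_address` are hypothetical and exist only for this sketch; only the field widths come from the text.

```python
from typing import Tuple

NODE_BITS = 16     # upper bits: which node holds the memory
OFFSET_BITS = 48   # lower bits: local physical address on that node
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_address(node_id: int, offset: int) -> int:
    """Combine a node ID and a local address into a 64-bit global address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id does not fit in 16 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 48 bits")
    return (node_id << OFFSET_BITS) | offset


def split_address(address: int) -> Tuple[int, int]:
    """Split a 64-bit global address into (node_id, local offset)."""
    return address >> OFFSET_BITS, address & OFFSET_MASK


addr = make_address(node_id=0x0004, offset=0x1000)
node, offset = split_address(addr)
print(hex(addr), node, hex(offset))
```

A 16-bit node field gives 65,536 distinct node IDs, which matches the 64K node limit quoted earlier.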

Cache Coherency In SCI:

Most high-performance processors use local caches to reduce effective memory-access times. Cache-coherence protocols define mechanisms that guarantee consistent data. In SCI, the cache-coherence protocols are optional, so a compliant SCI implementation need not support cache coherence. Bus-based systems can employ snooping protocols: processors broadcast every transaction, which allows eavesdropping and intervention techniques to be used to ensure data consistency. Broadcasting cannot be used in SCI, as each node communicates only with the next node on its unidirectional ring. SCI instead uses a distributed, directory-based cache-coherence protocol. Each shared line of memory is associated with a distributed list of the nodes sharing that line, and all nodes with cached copies participate in updating the list. Each coherent memory line has a pointer to the node at the head of the sharing list, and each node's cache-line tag includes pointers to the next and previous entries in the list for that line. The resulting structure is shown in figure 5 below.

Figure 5
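
The sharing list is in effect a doubly linked list whose links are node IDs. The sketch below is a simplified, single-process model of that structure, not the coherence protocol itself: the class names are hypothetical, the per-node tags are kept in one dictionary for convenience (in a real system each tag lives in its own node's cache), and it assumes a new sharer joins at the head of the list.

```python
from typing import Dict, List, Optional


class CacheTag:
    """Per-node tag for one cache line: links to neighbours in the list."""

    def __init__(self) -> None:
        self.forward: Optional[int] = None    # next node ID, towards the tail
        self.backward: Optional[int] = None   # previous node ID, towards the head


class MemoryLine:
    """One coherent memory line with its head pointer."""

    def __init__(self) -> None:
        self.head: Optional[int] = None
        # Stand-in for tags that are really distributed across the nodes.
        self.tags: Dict[int, CacheTag] = {}

    def add_sharer(self, node_id: int) -> None:
        """Attach a new sharer at the head of the list (assumed policy)."""
        tag = CacheTag()
        tag.forward = self.head
        if self.head is not None:
            self.tags[self.head].backward = node_id
        self.tags[node_id] = tag
        self.head = node_id

    def remove_sharer(self, node_id: int) -> None:
        """Unlink a node by patching its two neighbours (or the head pointer)."""
        tag = self.tags.pop(node_id)
        if tag.backward is None:
            self.head = tag.forward
        else:
            self.tags[tag.backward].forward = tag.forward
        if tag.forward is not None:
            self.tags[tag.forward].backward = tag.backward

    def sharers(self) -> List[int]:
        """Walk the list from head to tail."""
        result = []
        node = self.head
        while node is not None:
            result.append(node)
            node = self.tags[node].forward
        return result


line = MemoryLine()
for node_id in (3, 7, 5):
    line.add_sharer(node_id)
after_adds = line.sharers()
line.remove_sharer(7)
after_remove = line.sharers()
print(after_adds, after_remove)
```

Because each entry knows both neighbours, a node can leave the list by contacting only those two nodes, without any broadcast.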

Where is SCI used today and why is it used?

SCI has one main advantage over its competitors: it is not only a System Area Network but also allows remote memory accesses. Thus, SCI is suitable for both message passing and shared memory programming on clusters.

Dolphin Interconnect Solutions is the leading manufacturer of SCI system components today. The majority of all projects using SCI and its concepts are based upon Dolphin technologies. The modern applications of SCI cover a broad range of areas, spanning from military to commercial, and even as far as health informatics.

All SCI systems are based around the LC (Link Controller) chip, much as a desktop PC is based around an Intel Pentium or equivalent. The LC is currently in its fifth generation and can reach speeds of up to 800 MB/s, approaching the original design goal of 1 GB/s. Based on this chip, Dolphin offers a range of SCI bridges to other system buses, e.g. Sun's SBus and the PCI bus.

More SCI cluster-based software environments are in use now than ever before, and as a result SCI adapter cards are used in many different real-life applications. Here are some examples of these practical applications:

1. Camber LTD

Camber provides a range of products and services aimed at both the military and commercial markets. These include image processing, modelling and algorithm development solutions, including flight simulator imagery with full colour, texture, terrain, objects, lighting, meteorological and atmospheric effects, plus geographic information support. The Camber website details a new product called Battle Vision (as shown below). This product exploits SCI's high-performance communication to provide real-time flight simulation, landscape texturing and so on. Camber maintains that the US Congress has recognised that the development of this product alone has saved the American taxpayer in the region of $70 million.

Camber LTD

2. Fujitsu Siemens

Fujitsu Siemens have developed a product called hpcLine, which achieves excellent performance and scalability in communication-intensive applications using the Scalable Coherent Interface. Quoting the Fujitsu Siemens website: "Fujitsu Siemens integrates high-speed interconnects. The customers can choose between Myrinet2000, and SCI (Scalable Coherent Interface)."

Fujitsu Siemens

Fujitsu are fundamentally taking this easily adaptable communications protocol and incorporating it into complex business solution products. Quoting the website: "Thanks to the use of standardised components, Fujitsu Siemens Computers have introduced customer solutions for high performance computing, that are characterised by a superior price performance."

3. Philips Medical

Philips Medical is using SCI to provide "state-of-the-art visualization in one of the most advanced ultrasound systems available". A visit to the Philips website will take your breath away. The practical application of network power in creating the new iU22 ultrasound machine has helped make huge progress in this field. Quoting the website: "Our xSTREAM architecture offers up to 57,000 dynamically scalable digital channels, which allow very precise beam control, focusing and image formation."

Philips Medical

Thanks to the power of SCI communications, mothers can now see and interact with their unborn child. Even the most stubborn technophobe would acknowledge that this is a massive technological breakthrough. The website goes on to explain 3D and the new concept of 4D: "3D, 4D and multiplanar imaging has achieved new levels of clarity, accuracy and precision. The iU22 architecture's unprecedented volume acquisition and processing capability allows true real-time 4D foetal heart imaging." This is truly amazing technological progress that can only create more joy in the world.

4. NLX Corporation

NLX Corporation is a provider of simulation and training systems. Quoting the website: "Military and commercial crews can experience the sight, feel, sound, and motion that is a result of our sensory cueing systems including digital control loading, multi-channel image generators, wide field-of-view visual display systems, high-fidelity sound generation and high payload 6 degree-of-freedom motion systems." This is yet another example of raw processing power coupled with SCI's ability to access huge quantities of data providing the user with an end product that would baffle the previous generation.

NLX Corporation

These practical real-world applications of SCI technology highlight the broad scope for its use. Whether in industry, the military or bioinformatics, there is no doubt that SCI is being used at the forefront of technological development. As always, the US military is at the forefront of technological progress. Whether this is a good thing is now irrelevant: it has been the driving force behind most advances in computing over the last forty years.

Projects of Further Interest

Very few projects fully implement the SCI standard. Most focus on either the message-passing or the cache-coherency problem individually, so as to improve portability and reduce hardware dependencies. One project that has successfully implemented both flavours is the SMiLE project carried out at the Technische Universität München. The adapter card they developed has been used as the basis for a hardware monitor. Such a monitor was previously thought to be impossible, or at best impractical, because of the CPU overhead that so large an application would incur. With the added networking power provided by SCI technology, however, the monitor can observe network traffic with virtually no intrusion overhead. It also creates memory access histograms over the entire SCI physical address space. This vast amount of information can then be used to optimise both software applications and hardware components.

As an aside, our very own Trinity College Dublin is taking a different approach to this problem. The approach enables the collection of huge stack traces in large on-board memories; these are then analysed offline to assess the system's performance. For this purpose, the traces are transferred into a large database system, enabling complex queries across the huge traces.

Conclusion

Our group chose the topic of SCI because we felt it remains an exciting communications protocol at the cutting edge of technological advancement. In this essay we have attempted to provide the reader with a concise, accessible and thorough insight into SCI. We have detailed how and why SCI came into existence, and we have highlighted the main functionality of this standardised protocol. Furthermore, we have provided clear, concise images and diagrams to make the concept more accessible to readers from a non-technical background. We have also shown what reasons exist, financial or otherwise, for adopting an approach that incorporates SCI. Finally, this essay has detailed some of the many practical applications of SCI in the world today, whether in the commercial realm, military endeavours or bioinformatics.

References

  1. IEEE Standard for Scalable Coherent Interface (SCI)
    The Institute of Electrical and Electronics Engineers, Inc.
    IEEE Std 1596-1992
  2. SCI: Scalable Coherent Interface: Architecture and Software for High-Performance Compute Clusters / Hermann Hellwagner; Alexander Reinefeld (Eds.)
    (Lecture Notes in Computer Science; Vol. 1734)
  3. Shared Memory Programming on NUMA-based Clusters using a General and Open Hybrid Hardware/Software Approach / Martin Schulz
    (Research Report Series; Vol. 24)
  4. SCIzzl
    http://www.scizzl.com/
  5. Dolphin Interconnect Solutions
    http://www.dolphinics.com
  6. Camber
    http://www.camber.com/sandp.asp?n=modsim
  7. Philips Medical Systems
    http://www.medical.philips.com/main/products/ultrasound/general/iu22/features/architecture.html
  8. Futurebus+
    http://granite.sru.edu/~stringer/fb.html
  9. Fujitsu-Siemens
    http://www.fujitsu-siemens.com/hpc/products/hpcline/overview.htm
    http://www.fujitsu-siemens.com/hpc/products/hpcline/interconnect.htm