High frequency computing is a class of computing applications concerned with processing high-volume data streams, usually in real time or near real time. In general, its purpose is to enable high-speed decision making in a rapidly changing environment. Examples of high frequency computing include automated trading algorithms, high-speed network monitoring and management applications, and dynamic control systems such as fly-by-wire avionics.
This class combines elements of real-time computing, event stream processing and high performance computing, but is distinct since it assumes that a high-volume data stream is being analyzed in real-time.
Characteristics
High frequency computing usually involves real-time processing of data streams with incoming data rates of 1,000 to 1,000,000 updates per second or higher. At these rates, the number of CPU instruction cycles available between arriving updates typically dominates the design decisions; at 1,000,000 updates per second, for example, a single 3 GHz core has only about 3,000 clock cycles between consecutive updates. Due to these demands, high frequency computing applications generally share a number of common characteristics. The goal is to maximize the number of CPU cycles that can be spent processing the next arriving update.
Multi-processor systems
Because the processors share memory and can communicate without crossing a network, a machine with two or more CPUs can run more efficiently than an equivalent number of single-CPU machines. High frequency computing can take advantage of this efficiency, particularly since one or more of the CPUs can be assigned to handle the data update process, leaving the remaining CPUs to handle other tasks such as I/O, analytics, etc. Most high frequency computing applications are therefore implemented on machines with two or more CPUs.
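As a minimal sketch (the class and pool names here are illustrative, not part of any standard library design), a Java application might size its thread pools from the number of available processors, reserving one thread for the update stream and leaving the rest for other work:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CpuPartitioning {
        public static void main(String[] args) {
            int cpus = Runtime.getRuntime().availableProcessors();

            // One dedicated thread services the incoming update stream.
            ExecutorService updateThread = Executors.newSingleThreadExecutor();

            // The remaining CPUs are left for analytics, I/O and other tasks.
            int workerCount = Math.max(1, cpus - 1);
            ExecutorService workers = Executors.newFixedThreadPool(workerCount);

            System.out.println("CPUs: " + cpus + ", worker threads: " + workerCount);

            updateThread.shutdown();
            workers.shutdown();
        }
    }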
Multiple threads of execution
Modern operating systems handle multiple execution threads with a high degree of efficiency, particularly when more than one CPU is available on a machine. To make full use of the available CPU cycles, an application should have at least one active thread per CPU. In practice, isolating the high frequency data updates from other computing tasks such as analytics on a separate thread can greatly simplify the programming task and helps ensure that one or more CPUs remain dedicated to serving the incoming data updates.
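A minimal Java sketch of this isolation (the Update type and thread names are hypothetical) keeps one dedicated thread tight around the incoming feed and hands work to other threads through a queue:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class UpdateIsolation {
        // Hypothetical record type representing one incoming data update.
        record Update(String symbol, double value) {}

        public static void main(String[] args) {
            BlockingQueue<Update> handoff = new ArrayBlockingQueue<>(65_536);

            // Dedicated update thread: does nothing but accept updates and enqueue them.
            Thread updateThread = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {          // stands in for reads from a feed
                    handoff.offer(new Update("SYM", i * 0.01)); // never blocks the update path
                }
            }, "update-thread");

            // Separate analytics thread: drains the queue at its own pace.
            Thread analyticsThread = new Thread(() -> {
                try {
                    while (true) {
                        Update u = handoff.take();
                        // ... run analytics on u ...
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "analytics-thread");
            analyticsThread.setDaemon(true);

            updateThread.start();
            analyticsThread.start();
        }
    }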
Limited I/O
In keeping with the goal of maximizing the number of CPU instructions available between data updates, tasks that cause the CPU to wait, such as I/O to storage devices or network interfaces, are avoided to the extent possible. Where I/O cannot be avoided, care is taken to buffer device I/O and to take other steps, such as using separate threads, to insulate the data update process from unnecessary waiting. In practice, this usually means data is kept in memory rather than written out to disk-based storage or to transactional databases such as Oracle or SQL Server. Because transactional databases are often unable to handle more than a few hundred updates per second without significant tuning and/or advanced hardware, high frequency computing applications often rely on non-transactional storage techniques such as file-based storage.
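The following sketch, with invented names and file path, illustrates the idea: hot data lives in an in-memory map, and any persistence goes through a buffered writer so that individual updates do not each pay the cost of a system call:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class LimitedIo {
        public static void main(String[] args) throws IOException {
            // Hot data stays in memory; no transactional database on the update path.
            Map<String, Double> lastValues = new ConcurrentHashMap<>();

            // When persistence is required, writes go through a buffer so each
            // update does not trigger a separate write to the device.
            try (BufferedWriter journal = Files.newBufferedWriter(Path.of("updates.log"))) {
                for (int i = 0; i < 100_000; i++) {
                    lastValues.put("SYM", i * 0.01);           // in-memory update: no I/O wait
                    journal.write("SYM," + (i * 0.01) + "\n"); // buffered; flushed in large blocks
                }
            } // buffer is flushed and the file closed here
        }
    }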
Buffering
Because of the bursty nature of many sources of real-time data, high frequency computing often uses dynamic buffering techniques, such as expandable circular buffers, to cache data between different processing steps. For instance, one thread may be responsible for updating an internal last-value array, i.e. caching the latest update across an array of values. This thread may be fed by data arriving on a network interface. Typically, the network interface process will cache the incoming data into a storage buffer, from which the data update process will draw the data down. The storage buffer reduces the chance that data will be lost because the data update process is busy when data arrive on the network interface. Careful use and monitoring of buffering is crucial in most high frequency computing applications.
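The sketch below (class, record and field names are illustrative) uses a bounded queue as the storage buffer between the network-facing code and the thread that maintains the last-value array; a full buffer is counted rather than allowed to stall the reader:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.atomic.AtomicLong;

    public class LastValueCache {
        // Hypothetical update: an index into the last-value array plus a new value.
        record Update(int index, double value) {}

        private final double[] lastValues = new double[1024];        // last-value array
        private final BlockingQueue<Update> buffer = new ArrayBlockingQueue<>(16_384);
        private final AtomicLong dropped = new AtomicLong();

        // Called by the network-facing thread; never blocks, counts overflow instead.
        void onNetworkData(Update u) {
            if (!buffer.offer(u)) {
                dropped.incrementAndGet();   // monitoring this counter is essential
            }
        }

        // Run by the data update thread: drains the buffer into the last-value array.
        void drain() throws InterruptedException {
            while (!Thread.currentThread().isInterrupted()) {
                Update u = buffer.take();
                lastValues[u.index()] = u.value();
            }
        }
    }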
Highly efficient locking
The use of multiple, independent execution threads requires synchronization techniques to avoid the problems that can arise when different threads share the same memory. Synchronization techniques vary widely in cost: synchronization between separate processes is usually much slower than synchronization between threads running in the same process. Therefore, high frequency computing applications tend to run as a single process, with multiple threads coordinated by lightweight locking.
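One common in-process technique is lock striping, sketched here with hypothetical names: instead of a single global lock, a last-value table is guarded by a small array of locks so that threads updating unrelated entries rarely contend:

    public class StripedLastValues {
        private static final int STRIPES = 32;              // power of two for cheap masking
        private final Object[] locks = new Object[STRIPES];
        private final double[] values = new double[4096];

        public StripedLastValues() {
            for (int i = 0; i < STRIPES; i++) {
                locks[i] = new Object();
            }
        }

        // Threads touching different stripes do not block each other.
        public void update(int index, double value) {
            synchronized (locks[index & (STRIPES - 1)]) {
                values[index] = value;
            }
        }

        public double read(int index) {
            synchronized (locks[index & (STRIPES - 1)]) {
                return values[index];
            }
        }
    }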
High speed analytics
Handling the data updates is one part of the problem. Deciding whether to take action based on the data, and doing so in a timely manner, is another. The logic that handles these decisions is called the analytics. Typically, the analytics in high frequency computing must also be very fast: inefficient analytics can bog down the CPU and eventually cause the data update process to slow down or overflow its buffers. For example, a real-time trading application may scan incoming data and apply a complicated financial model to determine whether a profitable trade is available. If the analytics are too slow or complicated, the opportunity may disappear before a decision can be made. Analytics can be optimized by employing intelligent approximations, running them in separate execution threads, and deploying the application on machines with multiple CPUs.
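As an illustration of the intelligent-approximation idea (the threshold and model are invented for the example), a cheap pre-filter can reject most updates before an expensive model is ever consulted:

    public class FastAnalytics {
        private static final double MOVE_THRESHOLD = 0.005;   // illustrative 0.5% move

        private double lastPrice = 100.0;

        // Cheap approximation: only run the expensive model on meaningful moves.
        public void onPrice(double price) {
            double relativeMove = Math.abs(price - lastPrice) / lastPrice;
            lastPrice = price;
            if (relativeMove < MOVE_THRESHOLD) {
                return;                        // the vast majority of updates stop here
            }
            runFullModel(price);               // expensive path, reached rarely
        }

        private void runFullModel(double price) {
            // Placeholder for a complicated financial model evaluation.
        }
    }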
Fixed-length records
Processing streams of fixed-length records can, in general, be much faster than processing streams composed of variable-length records. This is because variable-length records usually require more complex logic to deserialize the incoming data and reconstruct the structures it describes. In contrast, processing a fixed-length record can be as simple as copying a byte array. For this reason, high frequency computing applications tend to use fixed-length records whenever possible.
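As a sketch of why fixed-length records are cheap to handle (the 20-byte layout is invented for the example), each field sits at a known offset, so decoding amounts to a few absolute reads with no parsing loop:

    import java.nio.ByteBuffer;

    public class FixedLengthRecord {
        // Hypothetical 20-byte layout: 4-byte instrument id, 8-byte price, 8-byte timestamp.
        static final int RECORD_SIZE = 20;

        static void decode(ByteBuffer buf) {
            int instrumentId = buf.getInt(0);     // fixed offsets: no scanning for delimiters
            double price     = buf.getDouble(4);
            long timestamp   = buf.getLong(12);
            // ... hand the decoded fields to the update process ...
        }

        public static void main(String[] args) {
            ByteBuffer record = ByteBuffer.allocate(RECORD_SIZE);
            record.putInt(0, 42).putDouble(4, 101.25).putLong(12, System.nanoTime());
            decode(record);
        }
    }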
Mutable strings
One common performance bottleneck stems from the manipulation of immutable strings. This is because any operation that modifies an immutable string generally requires a memory allocation step. When used inappropriately, immutable strings can seriously degrade system performance. Examples of mutable string structures are the StringBuilder class in the .NET Framework, and the StringBuffer and StringBuilder classes in Java.
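A short Java illustration of the difference: appending with the + operator on an immutable String allocates and copies on every step, while a single reused StringBuilder does not:

    public class StringBuildingExample {
        public static void main(String[] args) {
            int n = 20_000;

            // Immutable strings: every += copies the whole string built so far.
            String slow = "";
            for (int i = 0; i < n; i++) {
                slow += i;                      // allocates a new String on each iteration
            }

            // Mutable buffer: one object, amortized growth, no per-append allocation.
            StringBuilder fast = new StringBuilder(n * 6);
            for (int i = 0; i < n; i++) {
                fast.append(i);
            }
            String result = fast.toString();

            System.out.println(slow.length() + " == " + result.length());
        }
    }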
Examples
Event Stream Processing
Algorithmic Trading