Skip to content

Latest commit

 

History

History
681 lines (530 loc) · 24.1 KB

File metadata and controls

681 lines (530 loc) · 24.1 KB

hyperq

NEXT tasks

  • [ ] hookup iq to rb via dot
  • [ ] haskell test for iq
    • [ ] wine solution in travis
    • [ ] iqfeed addition for travis (or download)
  • [ ] r -> haskell
  • [ ] contact http://tradingphysics.com/

technology stack

project design choices

There are several design features that drive hyperq and we hope offer a comparative advantage:

  • HFT on a budget: Development is focused on using the cheapest technologies where no free option exists.
  • literate programming documentation: markets are complex and so is large-scale project code. The heart of this project is experimentation with how to go about large-scale project formulation using a literate programming ethic.
  • fast AND smart: much HFT trading is about pure speed - being the first to react to an obvious mis-pricing, front-running a lazy or half-hidden market order are two obvious examples. We believe there is a gap in the market that can be exploited between pure-speed HFT and human-intervention algo trading that the project is all about.
  • meta-algo: the entire system and process is an algorithm to be searched, optimised and refactored.
  • open source: hyperq is open source purely and simply, licenced under the generous Apache system.
  • modern toolkit: the project is oriented towards using higher-level languages and concepts for rapid, robust development. Together with a literate programming style, this translates to using the right tool for the right job and making less compromises with the goals of the project.

The world of high frequency trading is a broad church of opinion, technology, ideas and motivations. hyperq is currently being developed using many different tools.

  • re-write with haskell pros and cons

The candidate solution to the design criteria is to use haskell for the system coding:

org-mode and literate programming

hyperq.org is the nerve center of active development and contains code snippets, research notes and design tools being used.

The project makes heavy use of babel to pick and mix between coding environments and languages, whilst still remaining literate:

The main idea is to regard a program as a communication to human beings rather than as a set of instructions to a computer. ~ Knuth

Similarly, a project such as hyperq is as much about communication between human beings as it is about maintenance of source code.

Brokers]]

Eventually, the hft schema will be broker independent but during the development phase IB is the test case. Interactive has the most mature API that works out of the box and a demo account so that hyperq can come pre-plumbed and (eventually) the project can also run out of the box.

Interactive Brokers consolidates tick data into 0.3 second time slices so it isn’t appropriate for low-latency work.

Just because it’s open-source doesn’t mean that it’s cost free. iqfeed has been chosen as an initial data feed to base project R&D efforts on. iqfeed costs dollars but the software can be downloaded for free and a demo version allows live data to flow with a lag.

A useful way to support the hyperq is to let DTN know if you decide to purshase iqfeed.

other system components and projects

ActiveQuant libcppa esper triceps cep-trader algo-trader tradelink ASM libtrading fix tradexoft

https://github.com/andykent/river I like the idea of pushing data through queries (rather than queries through data).

http://www.quickfixengine.org/quickfix/doc/html/about.html major broker API

runtime design

candidate svg dot file

digraph G {
        node [label="\N"];
        node [style=filled, color="#1f3950",fontcolor="#eeeeee",shape=box]; 
        subgraph cluster_market_data {
                graph [label="market data", color="#909090"];
                exchange [shape=egg,color="#ff111111",fontcolor="#101010",label="exchanges"];
                aggregator [shape=egg,color="#cc11cc22",fontcolor="#101010",label="data stream"];
                localport [label="local node"];
                exchange -> aggregator [dir=none];
                aggregator -> localport [dir=both];
        }
        subgraph cluster_offwire {
                graph [label="offwire",
                        color="#909090"];
                offwirealgo [label="offline algo"];
                observer;
                databases;
                observer -> databases [color=red,label="write",fontcolor=red];
        }
        subgraph cluster_onwire {
                graph [label="onwire",
                        color="#909090"];
                node [style=filled];
                disruptor [label="event server"];
                eventalgo [label="algo"];
                controller;
                controller -> eventalgo [color="#aaaaaa",dir=both]
                disruptor -> listener;
                disruptor -> eventalgo;
                disruptor -> controller;
                controller -> disruptor [color="#0080ff"];
        }
        subgraph cluster_broker {
                graph [label="broker data",
                        color="#909090"];
                broker [shape=egg,color="#ff111111",fontcolor="#101010",label="brokers"];
                brokeraggregator [shape=egg,color="#cc11cc22",fontcolor="#101010",label="aggregation"];
                broker -> brokeraggregator [dir=none];
                brokeraggregator -> trader [dir=both];
        }
        localport -> observer [color="#aaaaaa",style=dotted];
        controller -> localport [color="#aaaaaa"];
        localport -> disruptor [color="#0080ff"];
        listener -> observer [color="#aaaaaa",style=dotted];
        controller -> observer [color="#aaaaaa",style=dotted];
        controller -> trader [color="#aaaaaa",dir=both];
        controller -> offwirealgo [color="#aaaaaa",dir=both];
        databases -> offwirealgo [color=red,label="read",fontcolor=red];
        trader -> observer [color="#aaaaaa",style=dotted];
        eventalgo -> observer [color="#aaaaaa",style=dotted];
        offwirealgo -> observer [color="#aaaaaa",style=dotted];
}

img/candidate.svg

  • blue boxes represent individual components of the system
  • other colors represent external systems and data sources
  • each edge of the chart represents a messaging sytem requirement
  • there are two main one-way message passing routines that probably need to be very very fast (blue lines)
  • there is one read from database and one write to database (red lines)
  • every component registers to an observer component that records system state and dynamics (grey dotted).

The components have been grouped into several clusters:

  • market data: representing trade data, order book and news information flowing from outside the sytem to a local data node.
  • broker data: representing communication with trading mechanisms
  • onwire: components that are “in the event stream”. This is motivated by the specifications and documentation of the disruptor which argues that a single thread “wheel” is the best way to enable fast processing of market data into trading orders.
  • offwire: this represents algorithms and processing that are not on the single-thread process. The motivation here is to test the hypothesis in the disruptor argument.

There are several ideas that are being tested:

  • that the entire system should be the subject of search and optimisation, rather than componentry. One example of this is separation of complex event definitions from the statistical analysis once events are defined.
  • there is a focus on automation and machine learning. As such there is no place for human interaction. In particular, no visualization is required.
  • messaging between components can be the same general process. The components can also be tested in exactly the same way (such as speed and robustness testing)

haskell interaction

Via haskell, the dot chart can be the specifications for an actual system as well as a representation. And via svg technology, the picture can also be modified to be a reporting front-end in a production environment.

So, a new picture generates a new system with potentially new components (nodes) and messaging requirements (edges).

edges

import ControllerTest
g <- importDotFile "../candidate.dot"
edgeList g
exchangeaggregator
aggregatorlocalport
observerdatabases
controllereventalgo
disruptorlistener
disruptoreventalgo
disruptorcontroller
controllerdisruptor
brokerbrokeraggregator
brokeraggregatortrader
localportobserver
controllerlocalport
localportdisruptor
listenerobserver
controllerobserver
controllertrader
controlleroffwirealgo
databasesoffwirealgo
traderobserver
eventalgoobserver
offwirealgoobserver

nodes

import ControllerTest
import Data.List
g <- importDotFile "../dot/candidate.dot"
map (\x -> [x]) $ nodeList g
aggregator
broker
brokeraggregator
controller
databases
disruptor
eventalgo
exchange
listener
localport
observer
offwirealgo
trader

event feed

Event Feed Design

  • State “DONE” from “” [2013-04-30 Tue 10:48]

The hyperq_team is currently looking at development priorities and, whilst the votes aren’t all in, designing and building the event feed is a hot favorite. At the moment, the component is labeled as a market data stream but I’d like to broaden the definition.

Concentrating solely on market trades, bids and asks for an individual security in an effort to fully understand what’s happening with price is a dangerous activity. We think there’s a lot to be gained by always accounting for broad market conditions before you start to do statistical analytics on a particular security. Before you decide that goog is <insert forecast metaphor here>, you should check first as to whether the overall market is <insert forecast here> as well. If you look really, really closely, you might find that the market starts trending before goog does, and you have yourself an immediate edge.

This statistical arbitrage analogy can be extended beyond market data into market news flow. As a human, if I see a news item on Google, the very first next thing I do is look at recent Google price movements. If this could be automated (and it can) a news flow can trigger a recent order history lookup, a calculation of ‘price spikiness’, a switch to including the security in the event stream. With 1000 news flow items a day we’re going to find a handful of fat pitch opportunities.

Even more broadly, social media news flow is evolving rapidly and could be a great source of automated idea generation. A recent example, and fellow startup is http://www.ftsee.com/. Machine-readable news flow, json-based API, easy ticker lookups - it’s going to be a joy hooking this up after the pain involved with old and creaking market data feed APIs.

So I wonder what we’ll find when we crunch the correlations between intra-day returns and intra-day ftsee sentiment? If we do find something I’m not sure we’ll be telling unless you’re a contributor ;)

The concept of an event stream can also be extended towards events that happen inside the hyperq internals. If the internet goes down or (more likely) my son hacks into the dedicated line to steam the latest 30gig TF2 upgrade, then that’s an event much more important than any external data streaming in (or not streaming in). Stop calculating the finer nuances of market stochastics, please, and start prepping for panic trading on link resumption. Another example is the risk management component - knowing that near-term P&L is dependent on a few key positions should mean that event processing needs to be prioritized for these securities. And the best and fastest way to get at event processors may well be a message sent via the event stream.

So we’re thinking about an event stream very broadly and including concepts other than market data and even external data. I think we can think this way because of our commitment to a modern multi-toolkit over the pure speed java approach. You can do things in haskell that can’t be imagined in imperative languages, and you can create things with a multi-tool structure that can’t be created in any single language.

feed component development

digraph G {
        node [label="\N"];
        node [style=filled, color="#1f3950",fontcolor="#eeeeee",shape=box];
        
        subgraph cluster_feed {
                graph [label="feed control",
                        color="#909090"];
                node [style=filled];
                controller;
                admin;
                controller -> admin [color="#aaaaaa",dir=both]
        }
        subgraph cluster_iqport {
                graph [label="iqport",
                        color="#909090"];
                adminport [shape=egg,color="#cc11cc22",fontcolor="#101010"];
                marketport [shape=egg,color="#cc11cc22",fontcolor="#101010"];
                lookupport [shape=egg,color="#cc11cc22",fontcolor="#101010"]; 
        }
        
        admin -> adminport [color="#aaaaaa",dir=both];
        controller -> marketport [color="#aaaaaa", dir=both];
        controller -> lookupport [color="#aaaaaa", dir=both];
        controller -> stdin [color="#aaaaaa", dir=back]
        controller -> stdout [color="#aaaaaa"]
}

dot/iqfeed.png

Starting with the feed component graph above:

  • [ ] turn this into a list of ringbuffers and nodes
  • [ ] use these lists to construct the iqfeed control

dev notes

  • things that need to be (step 1)
    • a logon process
    • a delay if we are logging on
    • an admin listener and logger
    • an admin writer
    • an IO to start the process
    • an IO to stop the process
    • a level1 event feed listener and logger
    • a level1 event feed writer
    • a level2 event feed listener and logger
    • a level2 event feed writer
    • a history lookerupperer writer
    • a history lookerupperer listener and logger
    • code that decides what is written to each port
    • a place to log to + a stamp
    • a Controller (a big ? on this) that isn’t main
  • abstractions
    • these are all threads
    • portWriter
    • portReader (portListener)
    • portLogger (is this a specialisation of portReader)
  • state information
    • is iqconnect up and are we logged on?
      • interrogation of system threads?
      • is there anything spitting out of admin port?
    • which processes are running? Do we need to know this?
      • promises baked in to portWriter etc (?)
      • do processes need to register with a Controller?
  • things that need to be (step 2):
    • login
      • starts (restarts) iqconnect
      • is encapsulated entirely within the manager(?) In other words, it will only be invoked by the manager when admin port throws an error.
    • admin
      • an admin portReader that
      • decides if iqconnect is logged on (how? - via port error)
      • relogs if the connection is bung (up to a limit)
        • delays other threads until relogged
      • listens forever
      • decides when to stop
      • decides when to start
      • stops and starts other processes (?)
    • a main that
      • starts admin
      • starts ALL portReaders
      • starts the worker pool
      • listens for a stop instruction (IO)
      • stops everything
      • decides what to write to ports
      • listens on STDIN
    • a feed portReader that
      • stays connected to the feed port
      • sleeps if port error
      • logs all STDOUT
      • same for ALL ports
    • a feed portWriter that
      • sets the protocol
      • changes the ticker list
      • asks for news, fundamentals etc
      • does other stuff? - check iq feed API
    • a history portWriter that
      • requests information (ticker, time range)
    • an admin portWriter that
    • a lookup portWriter that
    • a worker pool (STM) for all threads that
      • provides a new thread
      • stop/starts threads (?)
      • monitors threads (?)
  • heirarchy (small to big)
    • logon
    • portReader
      • portLogger
    • portWriter
      • historyWriter
      • feedWriter
      • level2Writer
    • workerPool
    • manager (connection management)
    • main

iqfeed bits

Port comms

There are 4 main communication points to iqfeed:

Level1Port 5009 Streaming Level 1 Data and News Level2Port 9200 Streaming Market Depth and NASDAQ Level 2 Data LookupPort 9100 Historical Data, Symbol Lookup, News Lookup, and Chains Lookup information AdminPort 9300 Connection data and management.

More information can be obtained at DTN IQFeed Developer Area or https://www.iqfeed.net/dev/main.cfm (for a price).

Setup info

iqfeed is available for download via http://www.iqfeed.net/index.cfm?displayaction=support&section=download

Personally, my development environment is on a mac so I need to start and manage the process via wine.

From the command line:

For the demo product (delayed feed):

wine "Z:\\Users\\tonyday\\wine\\iqfeed\\iqconnect.exe" -product IQFEED_DEMO -version 1
nc localhost 5009

For a live account:

wine "Z:\\Users\\tonyday\\wine\\iqfeed\\iqconnect.exe" ‑product yourproductid ‑version 0.1 ‑login yourlogin ‑password yourpassword -autoconnect -savelogininfo

references

messaging

http://msgpack.org/

Protocol Buffers FAST ActiveQuant Google Protocol Buffer example

http://programmers.stackexchange.com/questions/121592/what-to-look-for-in-selecting-a-language-for-algorithmic-high-frequency-trading

http://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or

http://wiki.msgpack.org/pages/viewpage.action?pageId=1081387

http://kenai.com/downloads/javafx-sam/EventProcessinginAction.pdf

http://coffeeonesugar.wordpress.com/2009/07/21/getting-started-with-esper-in-5- minutes/

links to HN discussions

https://news.ycombinator.com/item?id=5550930

http://prof7bit.github.io/goxtool/

https://github.com/fmstephe/matching_engine

https://github.com/brotchie/ib-zmq

https://github.com/brotchie/r-zerotws

http://hackage.haskell.org/package/HLearn-algebra

http://bitcoin.clarkmoody.com/

disruptor

The disruptor scheme is at the center of many peoples thinking about low-latency and process design for HFT, and the benchmark for cutting-edge low-latency.

The key concepts (and this is a lame list to get started - please read the references) include:

  • the disruptor is a queue designed with the idea that queues are most often empty or full. The queue is a cyclical array (called a ring buffer) with a buffer size equal to a power of 2.
  • (often) having a single writer (called producer) to avoid cache line contention
  • multiple readers (consumers) that have dependency relationships with each other with these dependencies known to the queue.
  • checks (memory barriers) simply guarding queue over- and under- runs (producers getting too far ahead or consumers being too fast)
  • polling information (on producers and consumers) via the queue. Recent post og the groups suggest consumer polling might be better.
  • achieving lock-free concurrency via producer separation and consumer tracking (called the consumer cursor)
  • avoids (L2?) cache misses with variable padding
  • in recent times, the ‘worker pool’ is leading to large performance gains (I don’t know what it is exactly - just repeating the group discussions)

I’d be interested in feedback on porting these concepts to haskell:

  • I don’t see anything in hackage remotely similar, but I do see plenty of comparable concepts like cache-miss avoidance, STM worker pools etc etc.
  • this would provide quite a good reference for how much slower haskell is compared with imperative-style methods.
  • (for me), the translation of the concepts to haskell is difficult. So much of the disruptor design pattern is imperative (do this exactly given system architecture). Thinking in terms of promises and ‘what is’ may lead to a very different look and feel, and good insight as to the differences haskell brings to the problem domain.
  • I suspect that haskell could start to perform very well when there are multiple writers (producers) and when there are a large number of interdependent readers (consumers). I have zero proof of this - just a gut feel.
  • if there is a need to get closer to the metal than haskell can offer, it’s best to find this out now and understand why. In this case I imagine the disruptor as a language-independent messaging solution superior to Protocol Buffers.

A few references to help:

Active info/code hubs

Main project https://github.com/LMAX-Exchange/disruptor http://lmax-exchange.github.io/disruptor/

Collections https://github.com/LMAX-Exchange/LMAXCollections

Interesting OSS using disruptor http://storm-project.net/

Clojure Library and DSL http://userevents.github.io/phaser/

Blogs http://blogs.lmax.com/

Martin Thompson blog http://mechanical-sympathy.blogspot.co.uk/

Michael Barker blog http://bad-concurrency.blogspot.com.au

Less active/reference

https://github.com/fsaintjacques/disruptor– Performance in C++ is way down compared to main java repo.

Main technical Doc (2 years old now) https://github.com/LMAX-Exchange/disruptor/tree/master/docs

trisha’s posts

http://mechanitis.blogspot.com.au/2011/06/dissecting-disruptor-whats-so-special.html

Original Martin Fowler piece Martin Fowler

http://www.aurorasolutions.org/over-6-million-transactions-per-second-in-a-real-time-system-an-out-of-the-box-approach/