- [ ] hookup iq to rb via dot
- [ ] haskell test for iq
- [ ] wine solution in travis
- [ ] iqfeed addition for travis (or download)
- [ ] r -> haskell
- [ ] contact http://tradingphysics.com/
There are several design features that drive hyperq and we hope offer a comparative advantage:
- HFT on a budget: Development is focused on using the cheapest technologies where no free option exists.
- literate programming documentation: markets are complex and so is large-scale project code. The heart of this project is experimentation with how to go about large-scale project formulation using a literate programming ethic.
- fast AND smart: much HFT trading is about pure speed - being the first to react to an obvious mis-pricing, front-running a lazy or half-hidden market order are two obvious examples. We believe there is a gap in the market that can be exploited between pure-speed HFT and human-intervention algo trading that the project is all about.
- meta-algo: the entire system and process is an algorithm to be searched, optimised and refactored.
- open source: hyperq is open source purely and simply, licenced under the generous Apache system.
- modern toolkit: the project is oriented towards using higher-level languages and concepts for rapid, robust development. Together with a literate programming style, this translates to using the right tool for the right job and making less compromises with the goals of the project.
The world of high frequency trading is a broad church of opinion, technology, ideas and motivations. hyperq is currently being developed using many different tools.
- re-write with haskell pros and cons
The candidate solution to the design criteria is to use haskell for the system coding:
- haskell has already been used for this purpose : http://www.starling-software.com/misc/icfp-2009-cjs.pdf
- concurrency is handled quite gracefully: http://www.haskell.org/haskellwiki/Concurrency_demos http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/beautiful.pdf
- speed is an issue but haskell can often get to within a factor of two versus C code. Haskell also plays nicely with other languages so there is room for hand-crafting critical sections in C.
- other solutions had issues. Erlang is too wierd and slow, node.js leads to callback hell; C, C++ and Java lead to a much larger code base.
org-mode and literate programming
hyperq.org is the nerve center of active development and contains code snippets, research notes and design tools being used.
The project makes heavy use of babel to pick and mix between coding environments and languages, whilst still remaining literate:
The main idea is to regard a program as a communication to human beings rather than as a set of instructions to a computer. ~ Knuth
Similarly, a project such as hyperq is as much about communication between human beings as it is about maintenance of source code.
Brokers]]
Eventually, the hft schema will be broker independent but during the development phase IB is the test case. Interactive has the most mature API that works out of the box and a demo account so that hyperq can come pre-plumbed and (eventually) the project can also run out of the box.
Interactive Brokers consolidates tick data into 0.3 second time slices so it isn’t appropriate for low-latency work.
Just because it’s open-source doesn’t mean that it’s cost free. iqfeed has been chosen as an initial data feed to base project R&D efforts on. iqfeed costs dollars but the software can be downloaded for free and a demo version allows live data to flow with a lag.
A useful way to support the hyperq is to let DTN know if you decide to purshase iqfeed.
ActiveQuant libcppa esper triceps cep-trader algo-trader tradelink ASM libtrading fix tradexoft
https://github.com/andykent/river I like the idea of pushing data through queries (rather than queries through data).
http://www.quickfixengine.org/quickfix/doc/html/about.html major broker API
digraph G {
node [label="\N"];
node [style=filled, color="#1f3950",fontcolor="#eeeeee",shape=box];
subgraph cluster_market_data {
graph [label="market data", color="#909090"];
exchange [shape=egg,color="#ff111111",fontcolor="#101010",label="exchanges"];
aggregator [shape=egg,color="#cc11cc22",fontcolor="#101010",label="data stream"];
localport [label="local node"];
exchange -> aggregator [dir=none];
aggregator -> localport [dir=both];
}
subgraph cluster_offwire {
graph [label="offwire",
color="#909090"];
offwirealgo [label="offline algo"];
observer;
databases;
observer -> databases [color=red,label="write",fontcolor=red];
}
subgraph cluster_onwire {
graph [label="onwire",
color="#909090"];
node [style=filled];
disruptor [label="event server"];
eventalgo [label="algo"];
controller;
controller -> eventalgo [color="#aaaaaa",dir=both]
disruptor -> listener;
disruptor -> eventalgo;
disruptor -> controller;
controller -> disruptor [color="#0080ff"];
}
subgraph cluster_broker {
graph [label="broker data",
color="#909090"];
broker [shape=egg,color="#ff111111",fontcolor="#101010",label="brokers"];
brokeraggregator [shape=egg,color="#cc11cc22",fontcolor="#101010",label="aggregation"];
broker -> brokeraggregator [dir=none];
brokeraggregator -> trader [dir=both];
}
localport -> observer [color="#aaaaaa",style=dotted];
controller -> localport [color="#aaaaaa"];
localport -> disruptor [color="#0080ff"];
listener -> observer [color="#aaaaaa",style=dotted];
controller -> observer [color="#aaaaaa",style=dotted];
controller -> trader [color="#aaaaaa",dir=both];
controller -> offwirealgo [color="#aaaaaa",dir=both];
databases -> offwirealgo [color=red,label="read",fontcolor=red];
trader -> observer [color="#aaaaaa",style=dotted];
eventalgo -> observer [color="#aaaaaa",style=dotted];
offwirealgo -> observer [color="#aaaaaa",style=dotted];
}- blue boxes represent individual components of the system
- other colors represent external systems and data sources
- each edge of the chart represents a messaging sytem requirement
- there are two main one-way message passing routines that probably need to be very very fast (blue lines)
- there is one read from database and one write to database (red lines)
- every component registers to an observer component that records system state and dynamics (grey dotted).
The components have been grouped into several clusters:
- market data: representing trade data, order book and news information flowing from outside the sytem to a local data node.
- broker data: representing communication with trading mechanisms
- onwire: components that are “in the event stream”. This is motivated by the specifications and documentation of the disruptor which argues that a single thread “wheel” is the best way to enable fast processing of market data into trading orders.
- offwire: this represents algorithms and processing that are not on the single-thread process. The motivation here is to test the hypothesis in the disruptor argument.
There are several ideas that are being tested:
- that the entire system should be the subject of search and optimisation, rather than componentry. One example of this is separation of complex event definitions from the statistical analysis once events are defined.
- there is a focus on automation and machine learning. As such there is no place for human interaction. In particular, no visualization is required.
- messaging between components can be the same general process. The components can also be tested in exactly the same way (such as speed and robustness testing)
Via haskell, the dot chart can be the specifications for an actual system as well as a representation. And via svg technology, the picture can also be modified to be a reporting front-end in a production environment.
So, a new picture generates a new system with potentially new components (nodes) and messaging requirements (edges).
import ControllerTest
g <- importDotFile "../candidate.dot"
edgeList g| exchange | aggregator |
| aggregator | localport |
| observer | databases |
| controller | eventalgo |
| disruptor | listener |
| disruptor | eventalgo |
| disruptor | controller |
| controller | disruptor |
| broker | brokeraggregator |
| brokeraggregator | trader |
| localport | observer |
| controller | localport |
| localport | disruptor |
| listener | observer |
| controller | observer |
| controller | trader |
| controller | offwirealgo |
| databases | offwirealgo |
| trader | observer |
| eventalgo | observer |
| offwirealgo | observer |
import ControllerTest
import Data.List
g <- importDotFile "../dot/candidate.dot"
map (\x -> [x]) $ nodeList g| aggregator |
| broker |
| brokeraggregator |
| controller |
| databases |
| disruptor |
| eventalgo |
| exchange |
| listener |
| localport |
| observer |
| offwirealgo |
| trader |
- State “DONE” from “” [2013-04-30 Tue 10:48]
The hyperq_team is currently looking at development priorities and, whilst the votes aren’t all in, designing and building the event feed is a hot favorite. At the moment, the component is labeled as a market data stream but I’d like to broaden the definition.
Concentrating solely on market trades, bids and asks for an individual
security in an effort to fully understand what’s happening with price is a
dangerous activity. We think there’s a lot to be gained by always accounting
for broad market conditions before you start to do statistical analytics on
a particular security. Before you decide that goog is <insert forecast
metaphor here>, you should check first as to whether the overall market is
<insert forecast here> as well. If you look really, really closely, you
might find that the market starts trending before goog does, and you
have yourself an immediate edge.
This statistical arbitrage analogy can be extended beyond market data into market news flow. As a human, if I see a news item on Google, the very first next thing I do is look at recent Google price movements. If this could be automated (and it can) a news flow can trigger a recent order history lookup, a calculation of ‘price spikiness’, a switch to including the security in the event stream. With 1000 news flow items a day we’re going to find a handful of fat pitch opportunities.
Even more broadly, social media news flow is evolving rapidly and could be a great source of automated idea generation. A recent example, and fellow startup is http://www.ftsee.com/. Machine-readable news flow, json-based API, easy ticker lookups - it’s going to be a joy hooking this up after the pain involved with old and creaking market data feed APIs.
So I wonder what we’ll find when we crunch the correlations between intra-day returns and intra-day ftsee sentiment? If we do find something I’m not sure we’ll be telling unless you’re a contributor ;)
The concept of an event stream can also be extended towards events that happen inside the hyperq internals. If the internet goes down or (more likely) my son hacks into the dedicated line to steam the latest 30gig TF2 upgrade, then that’s an event much more important than any external data streaming in (or not streaming in). Stop calculating the finer nuances of market stochastics, please, and start prepping for panic trading on link resumption. Another example is the risk management component - knowing that near-term P&L is dependent on a few key positions should mean that event processing needs to be prioritized for these securities. And the best and fastest way to get at event processors may well be a message sent via the event stream.
So we’re thinking about an event stream very broadly and including concepts other than market data and even external data. I think we can think this way because of our commitment to a modern multi-toolkit over the pure speed java approach. You can do things in haskell that can’t be imagined in imperative languages, and you can create things with a multi-tool structure that can’t be created in any single language.
digraph G {
node [label="\N"];
node [style=filled, color="#1f3950",fontcolor="#eeeeee",shape=box];
subgraph cluster_feed {
graph [label="feed control",
color="#909090"];
node [style=filled];
controller;
admin;
controller -> admin [color="#aaaaaa",dir=both]
}
subgraph cluster_iqport {
graph [label="iqport",
color="#909090"];
adminport [shape=egg,color="#cc11cc22",fontcolor="#101010"];
marketport [shape=egg,color="#cc11cc22",fontcolor="#101010"];
lookupport [shape=egg,color="#cc11cc22",fontcolor="#101010"];
}
admin -> adminport [color="#aaaaaa",dir=both];
controller -> marketport [color="#aaaaaa", dir=both];
controller -> lookupport [color="#aaaaaa", dir=both];
controller -> stdin [color="#aaaaaa", dir=back]
controller -> stdout [color="#aaaaaa"]
}Starting with the feed component graph above:
- [ ] turn this into a list of ringbuffers and nodes
- [ ] use these lists to construct the iqfeed control
- things that need to be (step 1)
- a logon process
- a delay if we are logging on
- an admin listener and logger
- an admin writer
- an IO to start the process
- an IO to stop the process
- a level1 event feed listener and logger
- a level1 event feed writer
- a level2 event feed listener and logger
- a level2 event feed writer
- a history lookerupperer writer
- a history lookerupperer listener and logger
- code that decides what is written to each port
- a place to log to + a stamp
- a Controller (a big ? on this) that isn’t main
- abstractions
- these are all threads
- portWriter
- portReader (portListener)
- portLogger (is this a specialisation of portReader)
- state information
- is iqconnect up and are we logged on?
- interrogation of system threads?
- is there anything spitting out of admin port?
- which processes are running? Do we need to know this?
- promises baked in to portWriter etc (?)
- do processes need to register with a Controller?
- is iqconnect up and are we logged on?
- things that need to be (step 2):
- login
- starts (restarts) iqconnect
- is encapsulated entirely within the manager(?) In other words, it will only be invoked by the manager when admin port throws an error.
- admin
- an admin portReader that
- decides if iqconnect is logged on (how? - via port error)
- relogs if the connection is bung (up to a limit)
- delays other threads until relogged
- listens forever
- decides when to stop
- decides when to start
- stops and starts other processes (?)
- a main that
- starts admin
- starts ALL portReaders
- starts the worker pool
- listens for a stop instruction (IO)
- stops everything
- decides what to write to ports
- listens on STDIN
- a feed portReader that
- stays connected to the feed port
- sleeps if port error
- logs all STDOUT
- same for ALL ports
- a feed portWriter that
- sets the protocol
- changes the ticker list
- asks for news, fundamentals etc
- does other stuff? - check iq feed API
- a history portWriter that
- requests information (ticker, time range)
- an admin portWriter that
- a lookup portWriter that
- a worker pool (STM) for all threads that
- provides a new thread
- stop/starts threads (?)
- monitors threads (?)
- login
- heirarchy (small to big)
- logon
- portReader
- portLogger
- portWriter
- historyWriter
- feedWriter
- level2Writer
- workerPool
- manager (connection management)
- main
There are 4 main communication points to iqfeed:
Level1Port 5009 Streaming Level 1 Data and News Level2Port 9200 Streaming Market Depth and NASDAQ Level 2 Data LookupPort 9100 Historical Data, Symbol Lookup, News Lookup, and Chains Lookup information AdminPort 9300 Connection data and management.
More information can be obtained at DTN IQFeed Developer Area or https://www.iqfeed.net/dev/main.cfm (for a price).
iqfeed is available for download via http://www.iqfeed.net/index.cfm?displayaction=support§ion=download
Personally, my development environment is on a mac so I need to start and manage the process via wine.
From the command line:
For the demo product (delayed feed):
wine "Z:\\Users\\tonyday\\wine\\iqfeed\\iqconnect.exe" -product IQFEED_DEMO -version 1nc localhost 5009For a live account:
wine "Z:\\Users\\tonyday\\wine\\iqfeed\\iqconnect.exe" ‑product yourproductid ‑version 0.1 ‑login yourlogin ‑password yourpassword -autoconnect -savelogininfoProtocol Buffers FAST ActiveQuant Google Protocol Buffer example
http://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or
http://wiki.msgpack.org/pages/viewpage.action?pageId=1081387
http://kenai.com/downloads/javafx-sam/EventProcessinginAction.pdf
http://coffeeonesugar.wordpress.com/2009/07/21/getting-started-with-esper-in-5- minutes/
https://news.ycombinator.com/item?id=5550930
http://prof7bit.github.io/goxtool/
https://github.com/fmstephe/matching_engine
https://github.com/brotchie/ib-zmq
https://github.com/brotchie/r-zerotws
http://hackage.haskell.org/package/HLearn-algebra
http://bitcoin.clarkmoody.com/
The disruptor scheme is at the center of many peoples thinking about low-latency and process design for HFT, and the benchmark for cutting-edge low-latency.
The key concepts (and this is a lame list to get started - please read the references) include:
- the disruptor is a queue designed with the idea that queues are most often empty or full. The queue is a cyclical array (called a ring buffer) with a buffer size equal to a power of 2.
- (often) having a single writer (called producer) to avoid cache line contention
- multiple readers (consumers) that have dependency relationships with each other with these dependencies known to the queue.
- checks (memory barriers) simply guarding queue over- and under- runs (producers getting too far ahead or consumers being too fast)
- polling information (on producers and consumers) via the queue. Recent post og the groups suggest consumer polling might be better.
- achieving lock-free concurrency via producer separation and consumer tracking (called the consumer cursor)
- avoids (L2?) cache misses with variable padding
- in recent times, the ‘worker pool’ is leading to large performance gains (I don’t know what it is exactly - just repeating the group discussions)
I’d be interested in feedback on porting these concepts to haskell:
- I don’t see anything in hackage remotely similar, but I do see plenty of comparable concepts like cache-miss avoidance, STM worker pools etc etc.
- this would provide quite a good reference for how much slower haskell is compared with imperative-style methods.
- (for me), the translation of the concepts to haskell is difficult. So much of the disruptor design pattern is imperative (do this exactly given system architecture). Thinking in terms of promises and ‘what is’ may lead to a very different look and feel, and good insight as to the differences haskell brings to the problem domain.
- I suspect that haskell could start to perform very well when there are multiple writers (producers) and when there are a large number of interdependent readers (consumers). I have zero proof of this - just a gut feel.
- if there is a need to get closer to the metal than haskell can offer, it’s best to find this out now and understand why. In this case I imagine the disruptor as a language-independent messaging solution superior to Protocol Buffers.
A few references to help:
Active info/code hubs
Main project https://github.com/LMAX-Exchange/disruptor http://lmax-exchange.github.io/disruptor/
Collections https://github.com/LMAX-Exchange/LMAXCollections
Interesting OSS using disruptor http://storm-project.net/
Clojure Library and DSL http://userevents.github.io/phaser/
Blogs http://blogs.lmax.com/
Martin Thompson blog http://mechanical-sympathy.blogspot.co.uk/
Michael Barker blog http://bad-concurrency.blogspot.com.au
Less active/reference
https://github.com/fsaintjacques/disruptor– Performance in C++ is way down compared to main java repo.
Main technical Doc (2 years old now) https://github.com/LMAX-Exchange/disruptor/tree/master/docs
trisha’s posts
http://mechanitis.blogspot.com.au/2011/06/dissecting-disruptor-whats-so-special.html
Original Martin Fowler piece Martin Fowler
