How To Get The Most Out Of U.S. Equity Market Data

The full research article is freely available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3984827.

For data from 2015 onward, a simple insight can immensely improve the quality of tick-by-tick trade and quote data for U.S. equities, such as the NYSE TAQ data (including WRDS TAQ), data from Polygon.io and Tickdata.com, and other data products sourced from the public consolidated data feeds. With this insight, it is possible to discern in U.S. equity market data which trades triggered which quote updates and which trades were triggered by the same order execution. This unveils the correct sequence of trades and quotes (important for trade signing, identifying the prevailing NBBO, and much more) and reveals information otherwise only observable in the much more expensive proprietary data, such as which trades were executed against hidden liquidity.

The Problem

A well-known problem with U.S. equity tick data is that it is available in two tiers.[2][3] It can be obtained either from the public data feed operated by a securities information processor (SIP), which consolidates the market data from all exchanges, or directly from each exchange’s proprietary data feed at a significant premium.

Most prominently, proprietary data is superior to public data because high-frequency traders can use it to obtain market data earlier by avoiding the latency introduced by centralized consolidation (see [1][4][5][6]). But this is not the only way in which public data is inferior to proprietary data. For researchers and practitioners alike, it is problematic that public market data does not declare which trades and quotes were triggered in the execution of the same marketable order.

Trades are often not solitary events. A marketable order is commonly matched with and executed against multiple resting limit orders. In fact, the majority of trades on Nasdaq occur with other trades in the execution of a single marketable order. Additionally, the execution of a marketable order modifies the top-of-book if it is executed against a displayed resting limit order. The majority of trades, across all exchanges, are triggered in execution of a marketable order that also modifies the top-of-book and thus triggers quote updates.

Clearly, groups of trades and quotes triggered by marketable orders make up a sizable proportion of U.S. equity tick-by-tick data, but how can we tell which trades and quotes belong together?

Before we can answer this question, we need to briefly cover the two timestamps available in the market data today:

Timestamps in U.S. Equity Market Data

In the public market data and data derived from it, the primary timestamp of a transaction or a top-of-book quote update is the SIP timestamp. This timestamp reflects the point in time at which the market event is reported on the public data feed, after the event has been transmitted from the exchange to the SIP and processed there. In 2015, the Securities and Exchange Commission (SEC) ruled that a secondary timestamp, the participant timestamp, is to be included in public U.S. equity market data. It reflects the point in time at which a transaction or a top-of-book quote update occurs at an exchange (at the exchange’s matching engine, to be precise). This timestamp is fundamentally different because it is not polluted by the stochastic transmission and processing latency (also known as dissemination latency); instead, it reflects the exact point in time at which a market event originated at an exchange’s matching engine.

The Insight

Now to the insight: all trades and quotes from the same execution of a marketable order receive the same participant timestamp (but not the same SIP timestamp). The participant timestamp can therefore be used reliably to identify and regroup these events. It is reported in microsecond to nanosecond resolution, that is, with 6 to 9 decimal places of a second. At the same time, the number of trades and top-of-book quote updates (per exchange and stock) is virtually always low enough that no two marketable orders receive the same microsecond or nanosecond participant timestamp.
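As a minimal sketch of this regrouping (not the implementation used in the paper), the following Python snippet buckets events by exchange, symbol, and participant timestamp. The record layout, the function name, and all sample values are hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Hypothetical records: (kind, exchange, symbol, participant_ts_ns, sip_ts_ns).
events = [
    ("trade", "Q", "XYZ", 1_000_000_123, 1_000_020_500),
    ("trade", "Q", "XYZ", 1_000_000_123, 1_000_021_100),
    ("quote", "Q", "XYZ", 1_000_000_123, 1_000_019_800),
    ("quote", "Q", "XYZ", 1_000_450_000, 1_000_470_200),
    ("trade", "N", "XYZ", 1_000_000_123, 1_000_030_000),
]


def group_by_participant_time(events):
    """Bucket events that share exchange, symbol, and participant timestamp."""
    groups = defaultdict(list)
    for kind, exchange, symbol, part_ts, sip_ts in events:
        # The SIP timestamp is kept only as a payload; it is not part of the key.
        groups[(exchange, symbol, part_ts)].append((kind, sip_ts))
    return dict(groups)


groups = group_by_participant_time(events)
for key, members in groups.items():
    print(key, members)
```

The key includes the exchange and the symbol: the trade on exchange "N" carries the same timestamp value but is not grouped with the three events on "Q", whose SIP timestamps all differ.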

How to Use the Insight

Let me provide two examples to illustrate why this finding is important:

Matching Trades with Quotes
We often need to identify reference quotes. For this, we commonly use the NBBO prevailing before trades (in force right before order execution). Sometimes, we use the prevailing BBO at the same exchange on which the trade is executed. Identifying the quotes prevailing before trades in SIP time is not trivial because trade and quote times are not perfectly synchronized (dissemination latency differs between trades and quotes). In former times, trade time lagged behind quote time by about 5 seconds.[7] Today, the lag is much, much lower at around 7 microseconds. Nevertheless, precisely because so many quote updates occur at the same time as trades, any lag is harmful. The majority of quote updates that are triggered by trades receive an earlier SIP timestamp than those trades. Regardless of whether we use the BBO or NBBO, the prevailing quotes are often incorrect if we use the SIP timestamp. They frequently reflect not the quotes prevailing before trades but the quotes already impacted by the trades. We can avoid this with the participant timestamp (see article for more information).
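The following Python sketch illustrates the idea for the simpler case of the BBO on the same exchange: for each trade, it picks the last quote whose participant timestamp is strictly earlier than the trade's, so that quote updates sharing the trade's timestamp (and thus presumably triggered by it) are excluded. The tuple layout, the function name, and the sample values are assumptions made for illustration; the procedure in the paper, in particular for the NBBO across exchanges, may differ.

```python
from bisect import bisect_left

# Hypothetical BBO updates for one stock on one exchange, sorted by
# participant timestamp in nanoseconds: (participant_ts_ns, bid, ask).
quotes = [
    (100, 10.00, 10.02),
    (250, 10.00, 10.03),  # same timestamp as the trade below
    (400, 10.01, 10.03),
]
quote_times = [q[0] for q in quotes]


def prevailing_quote(trade_ts):
    """Return the last quote strictly before trade_ts, or None if there is none."""
    # bisect_left yields the index of the first quote with ts >= trade_ts,
    # so the element just before it is the last one with ts < trade_ts.
    i = bisect_left(quote_times, trade_ts)
    return quotes[i - 1] if i > 0 else None


print(prevailing_quote(250))  # the quote stamped 100, not the one stamped 250
```

The strict inequality is the design choice that matters here: a "less than or equal" match would return the quote stamped 250, which already reflects the trade.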

Traded Volume
Another issue is the traded volume reported in U.S. equity data. Most often, trades are reported from the perspective of the resting limit orders (passive orders) that the marketable orders (active orders) are executed against. But we are often interested in the active side. Say you want to gauge the average price impact of “trades” of 100 shares, 1,000 shares, 10,000 shares, and so on. Regardless of whether you use SIP or participant time, the original data will overstate the price impact of “trades” of 100 shares. Larger marketable orders tend to impact prices more but are often executed against multiple resting limit orders, many of them for 100 shares. An order of 1,000 shares can be reported as 10 trades of 100 shares, and only if we first merge those trades from the passive back into the active perspective does the price impact count towards the 1,000-share size group. Trades are not exclusively reported from the passive perspective, however. The NYSE continues to report trades from the perspective of active orders.[8] It goes without saying that this complicates comparing results across exchanges. We cannot reliably use the SIP timestamp to discern which trades are actually partial executions of the same marketable order. Again, due to stochastic dissemination latency, two or more trades from the same marketable order execution virtually never receive the same SIP timestamp. We need to use the participant timestamp to align the data with our needs (see article for more information).
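A minimal Python sketch of this merging step is shown below. It sums the sizes of all trades that share exchange, symbol, and participant timestamp and computes their volume-weighted average price. The record layout, the function name, and the sample trades are hypothetical, and the sketch makes no claim to reproduce the exact procedure in the paper.

```python
# Hypothetical trade records: (exchange, symbol, participant_ts_ns, price, size).
trades = [
    ("Q", "XYZ", 500, 10.0, 100),
    ("Q", "XYZ", 500, 10.0, 100),
    ("Q", "XYZ", 500, 10.5, 200),
    ("Q", "XYZ", 900, 10.5, 100),
]


def merge_to_active(trades):
    """Merge trades per (exchange, symbol, participant ts) into (shares, vwap)."""
    totals = {}
    for exchange, symbol, part_ts, price, size in trades:
        key = (exchange, symbol, part_ts)
        shares, notional = totals.get(key, (0, 0.0))
        totals[key] = (shares + size, notional + price * size)
    # Convert accumulated notional into a volume-weighted average price.
    return {
        key: (shares, notional / shares)
        for key, (shares, notional) in totals.items()
    }


active = merge_to_active(trades)
for key, (shares, vwap) in active.items():
    print(key, shares, vwap)
```

In this example, the three trades stamped 500 are treated as one active order of 400 shares, so any price impact measured for them would count towards the 400-share size group rather than the 100-share and 200-share groups.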

How We Know

How can we know that the participant timestamp accurately identifies events from the same marketable order? A starting point is the simple method of counting how many quote updates occur around, and particularly at exactly the same timestamp as, trades (for each stock on each exchange). We first apply this method in primary (SIP) time, that is, we count how many quote updates have an earlier, later, or equal SIP timestamp compared with trades in the same security and on the same exchange.

In SIP time (Figure 1), we see…nothing spectacular, really. Across all exchanges, the proportion of trades that have the same exact SIP timestamp as quotes is about 0.05%, which is exactly the proportion we would expect to find if quote updates around our trades were uniformly distributed. That is, we have 2000 intervals, so if quote updates were uniformly distributed around trades (in this ±1000 nanosecond timeframe), we would find a proportion of approximately 1/2000 in every interval, including the one where quote timestamps equal trade timestamps.

In participant time (Figure 2), the same exercise yields a drastically different picture. Suddenly, almost all quotes that are updated in the ±1000 nanosecond timeframe around trades have exactly the same nanosecond participant timestamp as the trades. Similarly, the majority of quotes in the ±100 microsecond timeframe around trades have exactly the same microsecond participant timestamp as the trades.
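The counting exercise behind both figures can be sketched in a few lines of Python. The snippet below tallies quote updates by their offset from each trade within a symmetric window and reports the share that falls at offset zero. The function names and the tiny sample timestamps are hypothetical and chosen only to mimic the qualitative pattern described above; they are not data from the paper.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def offset_histogram(trade_times, quote_times, window):
    """Count quote updates by offset (quote ts - trade ts) within +/- window."""
    quote_times = sorted(quote_times)
    counts = Counter()
    for t in trade_times:
        lo = bisect_left(quote_times, t - window)
        hi = bisect_right(quote_times, t + window)
        for q in quote_times[lo:hi]:
            counts[q - t] += 1
    return counts


def share_at_zero(counts):
    """Fraction of counted quote updates with exactly the trade's timestamp."""
    total = sum(counts.values())
    return counts[0] / total if total else 0.0


# Hypothetical timestamps (ns) for one stock on one exchange.
trade_part, quote_part = [1000, 5000], [1000, 1000, 5000, 5600]
trade_sip, quote_sip = [1000, 5000], [980, 1015, 4990, 5600]

part_counts = offset_histogram(trade_part, quote_part, window=1000)
sip_counts = offset_histogram(trade_sip, quote_sip, window=1000)
print(share_at_zero(part_counts), share_at_zero(sip_counts))
```

With these made-up inputs, three of the four counted quote updates coincide with a trade in participant time, while none do in SIP time.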

Explanation of Figures 1 and 2

There is a simple explanation for the observations in Figures 1 and 2. We see troughs in Figure 2 right before and right after trades are executed at an exchange. Why are virtually no quotes updated right before and after trades, and why is an enormous proportion updated at exactly the same time as trades? An exchange can only execute orders in a given security sequentially, and order execution takes some time to process. What we see in Figure 2 is that while the exchange’s matching engine is busy processing an order that results in one or more trades, other orders, which could result in quote updates, cannot be processed.

If the order being processed is marketable, it results in at least one trade and can additionally update the quotes. The matching engine assigns the same participant timestamp to all trades and all quotes and eventually proceeds with the next order in the given security. The trough in quote updates before trades reflects the processing time of all steps before the matching engine assigns the participant timestamp to the trades and quotes from the marketable order execution, and the trough after reflects the processing time of all steps conducted by the matching engine after assigning the participant timestamp to all events triggered in execution.

What is important to us is that while the execution is being processed, due to the sequential nature of electronic markets, no other trades and quote updates can receive the same participant timestamp in the same security and on the same exchange. In nanoseconds as well as in microseconds, this processing time is long enough to avoid almost all confusion.

Conclusion

We can conclude that we can use the participant timestamp to identify those trades and quotes that were triggered in the execution of the same marketable order. In the paper, we confirm this with proprietary data, SEC MIDAS data (which is sourced from proprietary data), and various analyses applied to NYSE TAQ data.

Read the full research article at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3984827.

The article shows how to best leverage this insight to make U.S. equity tick-by-tick data much more accurate (for instance, we can obtain much more accurate trade signs) and much more insightful (for instance, by inferring which trades were executed against hidden liquidity).

References:

[1] Bartlett, R. P., McCrary, J., 2019. How rigged are stock markets? Evidence from microsecond timestamps. Journal of Financial Markets 45, 37–60.

[2] Clayton, J., Redfearn, B., 2019. Equity market structure 2019: Looking back & moving forward. URL: https://www.sec.gov/news/speech/clayton-redfearn-equity-market-structure-2019

[3] Clayton, J., Redfearn, B., 2020. Modernizing U.S. equity market structure. URL: https://www.sec.gov/news/speech/clayton-redfearn-modernizing-us-equity-market-structure-2020-06-22

[4] Ding, S., Hanna, J., Hendershott, T., 2014. How slow is the NBBO? A comparison with direct exchange feeds. Financial Review 49 (2), 313–332.

[5] Easley, D., O’Hara, M., Yang, L., 2016. Differential access to price information in financial markets. Journal of Financial and Quantitative Analysis 51 (4).

[6] Hasbrouck, J., 2019. Price Discovery in High Resolution. Journal of Financial Econometrics, 1–36.

[7] Lee, C. M. C., Ready, M. J., 1991. Inferring Trade Direction from Intraday Data. The Journal of Finance 46 (2).

[8] SEC, March 2014. Order Book Reporting Methods and Their Impact on Some Market Activity Measures. URL: https://www.sec.gov/marketstructure/research/highlight-2014-03.html