Using Alternative Data in a Backtest

Michael:

Alright, everyone. Welcome to another episode of Line Your Own Pockets. A completely selfish episode today on my behalf, because I have had this question for Dave, and it's something that I have wanted to explore and kind of figure out for a long time. And it was just one of those things where, if we're gonna talk about something anyway, sometimes it just turns into, let's make it a podcast topic. So now you guys get to see why I grill Dave on what it is that we're gonna talk about.

Dave:

Yeah. So what we talked about in the green room there is adding third party data to your backtest. I'll be interested to hear what data sources in particular you're referring to. But yeah, depending on the data source, there's a lot of factors that go into it. There's easy ways to add the data.

Dave:

There's hard ways to add the data. There's ways that slow down your process a lot. There's ways that you can do it that don't slow it down. I'm very interested to hear the data source and I'm sure I'll have lots of questions about what you want to do with it exactly.

Michael:

All right. So we'll get into the specifics, but the main thing for everybody, and the reason for this, is that at a lot of places out there where you get data, you get high, low, open, close, volume data, and that's kind of the basis of everything. And that's currently what I'm getting. I use Norgate as a subscriber, because their data is survivorship-bias free, which basically means that, because I'm doing a lot more long term trading, if I'm running a mean reversion strategy that's buying stuff that's dipping and holding it for sometimes weeks, you know, I need to have Bed Bath and Beyond and, you know, XYZ company in there.

Michael:

So what I'm looking to do is add alternative data sources, which is to say, you know, things that are outside of what can be either high, low, open, close data or derived from high, low, open, close data, which, you know, is technical indicators and, you know, position ranges and things like that. So that's the main topic: you know, getting something that's completely outside of what you're doing that you think may have some predictive value.

Dave:

Yeah. That's a good overview there. Just for other listeners, to sort of show the difference between Michael and myself: I use data feeds also, just some different ones.

Dave:

I use Polygon for getting minute bar data, and I use IQ Feed for getting real time data. And you can also get historical bar data from IQ Feed. There's a variety of different ways to ingest that data and include it in your tools. I work with a lot of traders to get that process into Amibroker. But yeah, fundamentally, that's the basis for all these backtests.

Dave:

Sort of the basic fundamental building block of any backtest is your data source. And what we're talking about here is additional stuff that could be really interesting to add as a column to optimize. I mean, there's all sorts of proprietary data sources, and some of them are very interesting and worth looking at. But you can get caught up going down a bit of a rabbit hole getting it into your backtest in the right way.

Michael:

Right. So let's just talk about alternative data to begin with, because I'd be kind of interested in your ideas behind this. So just for definitions, when I say alternative data, I just mean anything, again, that's not price data and cannot be derived from price data. So, you know, there are a lot of stories people have probably heard of back in the day, you know, using credit card data to track things. I guess a really good example is what they called the odd lot indicator, which happened, you know, in like the seventies or so, where people would look at the tape.

Michael:

And if you were a large institution before, you know, HFTs and hiding your orders, you transacted in round lots, which was 100 to 1,000 to 10,000 to whatever. And then odd lots were people who traded, you know, 25 shares of a company, or 153. And what was happening with these people is they were essentially just applying dollar amounts, like buy me, you know, $5,000 of this company. And a really good indicator that worked for a long time, and that a lot of people made millions and millions of dollars on, was looking at this odd lot indicator as the dumb money, right? So whatever stock they're buying, I'm going to sell; whatever stock they're selling, I'm going to buy.

Michael:

And a lot of good indicators were made from that. So, I guess, I know you're going to say it depends on the data source, but what do you think of, and have you ever used, these kinds of alternative, you know, off-price data sets before? And did you find any success with them?

Dave:

Sure. I mean, the first one that comes to mind is earnings dates. That's not derived from open, high, low, close, volume. It's right in Trade Ideas, so you can use a filter for earnings dates right in there, which is really convenient. But outside of Trade Ideas, that data is kind of hard to come by.

Dave:

It's not intuitive; you have to do a little bit of work to get the data source itself. That's one that comes to mind, and it's very valuable. I mean, there are strategies that I have that will only trade if the stock has released earnings today before the close or yesterday after the close. There are also strategies I have that specifically exclude that scenario. That's a really important one.
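
As a rough illustration of that earnings-recency rule, here is a minimal pandas sketch. Everything in it, the column names, the 4 p.m. close, the sample timestamps, is a hypothetical for the sketch, not the Trade Ideas filter itself:

```python
import pandas as pd

# Hypothetical frame: one row per symbol with its latest earnings timestamp.
stocks = pd.DataFrame({
    "symbol": ["AAPL", "MSFT", "TSLA"],
    "earnings_datetime": pd.to_datetime([
        "2024-05-02 16:05",  # yesterday after the close
        "2024-05-03 08:00",  # today before the open
        "2024-04-20 16:10",  # stale report
    ]),
})

today = pd.Timestamp("2024-05-03")
yesterday_close = (today - pd.Timedelta(days=1)).replace(hour=16, minute=0)
today_close = today.replace(hour=16, minute=0)

# "Fresh earnings": reported after yesterday's close and before today's close.
stocks["fresh_earnings"] = stocks["earnings_datetime"].between(
    yesterday_close, today_close
)
print(stocks)
```

Inverting that boolean gives the exclusion version Dave also mentions.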

Dave:

And I want to mention something that you might think of as alternative data, but actually most platforms, including Amibroker, let you include it right away. One factor you might consider is data based on the TICK or the TRIN, these overall market indicators. There are special symbols that you can use to get basically open, high, low, close data from these special indicators as if they're a real symbol.

Dave:

You can't trade them, but you can pull them, you can ingest them into your backtesting and include them as a column. I do that for several. The TICK is one, the TRIN another. I'll put these in the show notes; I'll go back and look.

Dave:

Another one would be like the SPY. Let's say you're trading a regular strategy that has a whole bunch of symbols. You may wanna include data from the SPY in that backtest as a column to optimize on. You can easily do that right within Amibroker. I'm sure you could probably do that within RealTest also.

Dave:

I don't consider that alternative data, because you can get it through your API without much trouble at all.

Michael:

Yeah. And so for those who don't know, the TICK and the TRIN are essentially how many stocks in the market itself are making new highs or new lows, sometimes over kind of short windows, or how many stocks are printing on the bid versus the ask. All these things are ways to dive, I guess, under the hood of the market and to get in there and say, okay, you know, there is a move here or there.

Michael:

And actually, that reminds me, when you say that: when I used to trade prop for a living, we had a squawk box, and it was just a guy who was sitting on the floor of the exchange and yelling out the orders that people were doing, back when the floor of the exchange was busy. And one of the rules that I had was that if it was quiet in the background, if no one was yelling behind him, I wouldn't take the trade. Only when you would start to hear him getting excited, people getting loud around him, would I. And that was kind of, I guess, the analog version of some of this data, because it would say, okay, yeah, people are getting really excited; that means there's probably something going on that's making them buy or sell.

Michael:

So it's interesting that when you brought that up, I thought of that right away, because, yeah, it's like old school alternative data.

Dave:

Well, you said it's analog. First thing I thought of: let's put a decibel meter up there, keep track of that, database that data over time, and see if it really makes a difference. That's interesting. I like that.

Michael:

It was fun. But yeah, so, okay, let's go to kind of an actual example, so we know what it is we're looking for. The main thing that I'm wondering with alternative data is: is there any edge? So my plan was to ingest this data and then run a simple strategy.

Michael:

I'm thinking mean reversion for this. And I'll tell the audience my whole thought process here, because I don't know if it's a good idea or not. So if it's a great idea, you're welcome. If it's a bad idea, I'm sorry I wasted your time. But the idea is simply this: quite often there's a strategy, when it comes to earnings reports and major news events, where the option sellers win. I always just meme on Twitter that the option sellers always win.

Michael:

Right? So you have a stock that's about to report earnings. Quite often, there's what's called the implied volatility move. Right? So the market makers are pricing in a move. Tesla, or Nvidia for the last earnings, was a great example.

Michael:

If you were buying a call or buying a put option, you needed a larger than 6% move to break even on that after the earnings date. Right? So what a lot of people will do, and you can read all about this if you're interested, there are big, big studies on it, is they sell both. They sell a call option and they sell a put option, and you're not making a bet on anything occurring in Nvidia other than that the move is going to be smaller than is currently priced in.

Michael:

And it's like 80 something percent of the time that occurs. Every now and then you get slapped. So it's a good equity curve over time, as long as you're, you know, managing risk. It's kind of like shorting low float penny stocks, right? You're gonna be right a lot of the time, and every now and then, if you're not careful, you're just gonna get smoked.

Michael:

So that is the primary idea behind it. What I was thinking of doing is building a strategy in which I ingest this implied move that the options market makers are putting in, and then apply that to normal price action. So what if there is a gap outside of that move? What if there's a move outside of that move? Is that a mean reversion idea?

Michael:

So you can see how the crux of it comes from this external data source, which I don't currently have access to. I've seen other people's backtests of it, and they look kind of interesting. And my plan was to get that from IB. So that's the idea behind the trade, and that's basically as far as I've gotten with it, because I haven't been able to ingest the data yet.

Dave:

Yeah. Okay. So the first thing that I want people to think about here is there's a difference between backtesting with data and then actually being able to use it in real time. So imagine you go down this rabbit hole, you get access to the data, you can include it in the backtest. It looks great.

Dave:

You're ready. You've got this rule that you apply. You're ready to kill it. And then you go to trade it in real time and, well, where do you get the data? Can you get it fast enough in a way that matches your backtest?

Dave:

That's a big question. A lot of times you can't do that. That's the first thing I would want to answer before I went down this rabbit hole of including it in my backtest at all: am I going to be able to get this data in time to actually make decisions with it in real time?

Michael:

And that's, you know, something I already knew I could do before I went in. But that's a good thing to bring up for the audience: there's the rate at which data comes in. One of the things I was thinking about with short float data, right, is that if you ever look at short float data, there are two different data points that come in: what is the short float now, and when was that reported?

Michael:

Because it's supposed to be reported every two weeks, but it's a notoriously awful data point. So in this case, the beauty of it is that the game plan is that I would have a list of what the implied moves for kind of every stock in the market were at the close of day one, and then you're looking at the next day in order to take the trade. So as long as I offset the data by that one day, it's: what was the closing implied move and the closing price, and then what's the opening price the next day, keeping that other one stagnant. That's kind of what I'm looking for, because, see, this is primarily a gap play, right?

Michael:

Again, is there any predictive value in saying, hey, if it's gapped outside of that range, there's more of a chance of a mean reversion than if it's gapped inside of that range? So yes, very important to know, but yeah, I've already thought about that one with this.
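
To make that one-day offset concrete, here is a minimal pandas sketch of the alignment Michael describes. The column names and numbers are hypothetical; the point is that shifting the implied move by one bar keeps the signal using only information known at the prior close:

```python
import pandas as pd

# Hypothetical daily bars for one symbol, plus the implied move (as a
# fraction of price) quoted by the options market at each day's close.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-02", periods=5, freq="B"),
    "open": [100.0, 108.0, 101.0, 99.0, 103.0],
    "close": [101.0, 104.0, 100.0, 102.0, 104.0],
    "implied_move": [0.05, 0.04, 0.06, 0.03, 0.05],
}).set_index("date")

# Shift by one bar so today's gap is judged against yesterday's close and
# yesterday's closing implied move -- the information known before the open.
prev_close = df["close"].shift(1)
prev_implied_move = df["implied_move"].shift(1)

df["gap_pct"] = df["open"] / prev_close - 1.0
df["outside_implied"] = df["gap_pct"].abs() > prev_implied_move
print(df[["open", "gap_pct", "outside_implied"]])
```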

Dave:

Okay. So even if you didn't have access to it in real time, or it was a question whether you could get access to it in real time, sometimes it is worth proceeding and including it in a backtest. Because when you look at a backtest and you see that this data improves it dramatically, it's often very motivating to jump through a bunch of hoops. Like, whatever hoops are there, I'm jumping through them to get access to that data and use it in real time.

Michael:

And for the right price, you can generally find it if the data point exists. The question is, right, how big of a hurdle is that to jump through? And if you look at it and you say, I could make $100,000 a year on this strategy, well, maybe it's going to cost you twenty or thirty grand to get the data, but it becomes a cost benefit analysis when you have some understanding of the dollar amount of edge you could get for that data.

Dave:

Yeah. And a lot of data vendors that are selling proprietary data like this will give you the data to test with for a period. I mean, it's totally in their best interest. Even if they're selling the data, they will often give you a snapshot of it, or even the current snapshot of the data, so you can include it and test it and see, for this exact reason.

Dave:

If it looks like it's going to work, all of a sudden you're willing to potentially pay a lot of money to get access to it, and it could be well worth it. It could be very expensive, but it still might be well worth it. So the other question I have here is: you might be able to get the current snapshot of the data, but depending on the nature of the data, it might not be that valuable to include in a backtest. Because depending on how quickly the data changes, if you have one day of data that you have access to, it might take days, weeks, months, years to gather enough of that data to include in a backtest in a way that really makes sense.

Dave:

It might make sense to start archiving this data now if that's the case. If you can't get historical access to it, you can start archiving it automatically each day so that the data starts piling up, and every day that goes by, your backtest potentially gets more and more valuable because you can include that. But, you know, that takes time to happen. Is this something that you do have access to historically?

Michael:

Yeah, well, a lot of what I'm going by right now, because I haven't done it yet, is articles I've read from people who do it. And I guess via the API, you can pull out historical data from Interactive Brokers now for, I believe, Russell 3,000 stocks. So not everything.

Michael:

But the thing is, not everything's gonna have options associated with it, or liquid options associated with it anyway. So this isn't a data point you could use for all securities, only, obviously, securities that are optionable. The problem that I've seen with the data, which I don't think matters much, because it's probably going to be a single day to multiple day strategy, is survivorship bias. You can go and ask Interactive Brokers, and they can pull in this closing basis data for you.

Michael:

But as soon as a company gets delisted in some way, then it's not going to be in there. Which, again, I think only becomes a problem the further back you go in time. And that's not my...

Dave:

Point taken. But also, with your RealTest data, you already have that problem solved. So I think if you are able to get that included in your backtest, the survivorship problem solves itself, because you've already sort of solved it with your Norgate data, correct?

Michael:

Well, no. So let's use Bed Bath and Beyond again as the example, right? If I'm doing this kind of mean reversion strategy on that, because I couldn't have the implied volatility data, it will just be left out of the data set, right?

Dave:

So this data point you're imagining is going to be crucial to creating the signal for the strategy.

Michael:

The idea is, yeah, it's the entire point of the strategy. Which I think is one of those things: if you're going to use alternative data points and you're going to go through the work to do it, you should at least think it's going to be useful. And again, I'm going in here with the idea that I don't know, but I want to test, which I think should be part of it as well. So it should be a fairly simple question to answer if I can get the data, because all I'll do is say: okay, what's the probability that a gap mean reverts over a number of days, and then what's the probability that a gap outside of its implied volatility mean reverts in a couple of days, and is there a difference there?

Dave:

So what I would do there is create a backtest with just the gap data, ignoring the implied volatility stuff, and use that as a baseline. And then figure out how to incorporate the implied volatility stuff as a column. Then your survivorship problem does go away, because you're using the base gap strategy as sort of your anchor. That's the basic foundation of the strategy. Then you're layering on this implied volatility data to see if it makes a difference.
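
A minimal sketch of that baseline-versus-layered comparison, assuming you already have per-trade backtest output; the frame and its columns here are made up for illustration:

```python
import pandas as pd

# Hypothetical per-trade backtest output: did each gap exceed the prior
# close's implied move, and did the gap fill within the holding window?
trades = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
    "outside_implied": [True, True, False, False, True, False],
    "gap_filled": [True, True, True, False, False, True],
})

# Baseline: fill rate across all gaps, ignoring the alternative data.
print("baseline fill rate:", trades["gap_filled"].mean())

# Layered: the same statistic conditioned on the implied-move column.
print(trades.groupby("outside_implied")["gap_filled"].mean())
```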

Michael:

So let's just say, okay, I'm able to get the data, I'm able to implement it live when it happens, and I'm able to get it for a far enough back point that is interesting to me. And again, that'd be, you know, five, ten years maybe, for a strategy that is either intraday or multiple days, right, that I want to test, holding for five days or something like that. What's the best way, like, what tools do you think would be best, to append one data set to another outside of trading?

Michael:

And this is, again, please tailor that answer for me, somebody who doesn't code. I know with, like, SQL or Python it'd probably be easy, but only to the extent ChatGPT can really hold my hand through it. That has been the big worry for me: okay, I get the data. So I have from Norgate high, low, open, close data for every stock going back to 1950, and then I have this data for five years. How best to marry those in a way that I'm confident they're correct?

Dave:

This is the critical step that I think people get confused by. There's the really hard way to do it, which most people, I think, are drawn to. But then there's an easier way to do it that is the way my brain thinks. Your first instinct might be, okay, let me modify RealTest or do some fancy stuff in RealTest to include the data there. Or in Amibroker, there's an ODBC interface, which is like a SQL interface that you can plug into.

Dave:

You could do all this fancy stuff to bring data in. All that, I think, is kind of the hard way. The way I would do this is create a backtest as you normally would, with no alternative data included. Create that basic backtest. You've got your CSV file.

Dave:

That's just a text file with columns and rows, basically a spreadsheet. It's in a very extensible format at that point, because it's beyond the point where you've done the backtest. Now you're at the stage where you can add alternative data to that CSV. So what I would do is get the data in a format that you can deal with, a text-format CSV.

Dave:

And then anybody can create a Python script with ChatGPT to do this very specific task that you want to do. You basically take the spreadsheet that comes from your backtest and say, okay, I want to add a column from this other file, lined up using these two fields as a key. You're saying, okay, symbol and...

Michael:

date, I guess would be.

Dave:

Symbol and date, yep. So ChatGPT can very easily write you a script for that. And it's very easy to test: make copies of these files so you can check, because you know exactly what you want to do. You want to create a column from this. It's very easy to see.

Dave:

Text files are so easy to deal with. It's easy to copy them and make sure things are working as expected. You can have ChatGPT write the little Python script to do it. You can run it, you can verify. It's a little bit tedious, but it's totally doable by somebody who's not a programmer.
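
Here is roughly what that script could look like, as a minimal sketch. The file names and the `implied_move` column are placeholders for whatever your backtest and vendor files actually contain; the only real requirement is that both files share the symbol and date columns used as the join key:

```python
import pandas as pd

# Backtest results exported as a CSV: one row per trade or signal.
backtest = pd.read_csv("backtest_results.csv", parse_dates=["date"])

# Alternative data: one row per symbol per day, e.g. the implied move.
alt_data = pd.read_csv("implied_moves.csv", parse_dates=["date"])

# Left join on (symbol, date): every backtest row is kept, and the
# alternative-data column is attached wherever a match exists.
merged = backtest.merge(
    alt_data, on=["symbol", "date"], how="left", validate="m:1"
)

# Sanity check before trusting the merge: how many rows got a value?
matched = merged["implied_move"].notna().mean()
print(f"rows with alternative data attached: {matched:.1%}")

merged.to_csv("backtest_with_implied_move.csv", index=False)
```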

Michael:

Okay. So that was my initial thought, and I actually didn't think of doing the backtest first and then merging afterward. I was thinking the other way: working with the raw file and then testing it. But I guess for that initial test, it would be fairly simple to go through and say, okay, just like I mentioned, what's the probability that a gap fills, and then what's the probability that a gap outside of this number fills?

Michael:

And then, you know, could you fairly simply create some sort of scheduled process that would do it periodically? Because part of it is getting the data in for the initial backtest and seeing if it's something that makes sense. But then the other thing is, in my case, generally a weekly to monthly reconciliation to say, these are the trades taken versus these are the trades signaled, to make sure everything is lining up correctly there.

Dave:

Yeah. So that first step is the easier step. Your basic backtesting functionality doesn't change. It's a step beyond that where you're merging the data. Then you would go to optimize it.

Dave:

You would run that data through the Strategy Cruncher to see: okay, does this alternate data that's in a column now, does it make a difference? Once you decide, okay, yeah, it makes a huge difference, you go, wow, okay, I've got to get this in real time. Then I would go and explore RealTest to see how I could include it there, at the step when you're generating your signals. Or if you're using Amibroker: okay, how can I get that data in real time so that I can generate the signal exactly when I want to, based on the data that I've optimized for?

Dave:

You're proving to yourself that, okay, yeah, this is actually worth it, I'm gonna take the steps now to make this happen in real time, because that's more complicated. You know, anytime you're working with real-time data, there are just more complications. It's not worth it if it's too delayed, or it takes too long, or you can't get the data in time.

Dave:

There's a lot of hoops to jump through there. But first, ask: okay, is this worth doing? And you can do that without much modification to your process.
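
Looping back to Michael's reconciliation question, here is a minimal sketch of the kind of periodic check he describes: comparing the trades a strategy signaled against the trades actually taken. The file names and columns are hypothetical stand-ins for whatever your platform and broker export:

```python
import pandas as pd

# Hypothetical exports: signals from the backtest engine, fills from the broker.
signals = pd.read_csv("signals.csv", parse_dates=["date"])  # symbol, date, side
fills = pd.read_csv("fills.csv", parse_dates=["date"])      # symbol, date, side

# Outer join on (symbol, date, side) flags rows present on only one side.
recon = signals.merge(
    fills, on=["symbol", "date", "side"], how="outer", indicator=True
)

missed = recon[recon["_merge"] == "left_only"]      # signaled but never traded
untracked = recon[recon["_merge"] == "right_only"]  # traded with no signal

print(f"signals not executed: {len(missed)}")
print(f"trades without a signal: {len(untracked)}")
```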

Michael:

So it's interesting. Yeah. And it makes perfect sense, right? Every step along the way you're asking: is this worth my time and energy?

Michael:

To track down and pursue. And then kind of the next question is: okay, so I've tested it, I say, yeah, it's worthwhile, and then go from there. And from there, getting the real-time data, that's all well and good. But I guess when it comes to ingesting data like this, and then holding it and all of that, part of my thing, like you mentioned, is I would want to start pulling it in regularly and then storing it some way alongside the Norgate file that updates every day. So ideally, some sort of automated process that, right, at the end of every day grabs the Interactive Brokers data, and grabs the Norgate data, and then pops the two of those together into some sort of database, so that going forward, now that I've done all the work to go back, I don't have to think about it. It's just part of what RealTest sees, it's part of what Norgate has, and then go from there.
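
A minimal sketch of that nightly step. The fetch function here is a stand-in, not a real Interactive Brokers call, and the single CSV archive is just one storage choice; the pattern is stamping each end-of-day snapshot with its date and appending it to a store the backtest can read later:

```python
import datetime as dt
from pathlib import Path

import pandas as pd

ARCHIVE = Path("implied_move_archive.csv")

def fetch_todays_implied_moves() -> pd.DataFrame:
    """Placeholder for the real end-of-day pull (e.g. via the IB API)."""
    return pd.DataFrame({
        "symbol": ["AAPL", "NVDA"],
        "implied_move": [0.031, 0.058],
    })

def archive_snapshot() -> None:
    snapshot = fetch_todays_implied_moves()
    snapshot["date"] = dt.date.today().isoformat()  # stamp the snapshot
    # Append to the archive, writing the header only when the file is new.
    snapshot.to_csv(ARCHIVE, mode="a", header=not ARCHIVE.exists(), index=False)

if __name__ == "__main__":
    # Run once per day after the close, e.g. from cron or Task Scheduler.
    archive_snapshot()
```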

Michael:

For that kind of back-end automation, what would be the best way to look at it?

Dave:

So the script we just talked about, the one ChatGPT could easily create for you, would do that merge after the backtest. Yep. What I would do there is make it a generally reusable script that you can plug into your workflow. Most people at that phase would maybe, you know, do some fancy footwork in Excel to sort of manually do it one time and see.

Dave:

I think it's so worth it to have ChatGPT create a script for you to do this, because it basically takes out all the manual steps that most people go through. They think, okay, this is sort of the quickest way, I know how to operate in Excel, let me do it manually.

Dave:

The problem is, if that works, you're gonna end up doing it manually multiple times. Whereas if you go ahead and create the script, you've got, like we've talked about with SOPs, standard operating procedures, this building block that you can reuse in your process and basically automate all the steps going forward. If it works out and you see that the data is good and you want to include it, then you've got this script that automatically does the merge. It's essentially in your column library at this point.

Dave:

And once you see that it works for this, you can include it in your column library and in your optimization process from that point forward. You could see, okay, well, it works well for this strategy, but it's very likely not going to be the only strategy it works for.

Michael:

Yeah, because that was going to be kind of my next question: if it ends up working, I would want to have it be part of everything I do. Because part of my theory with alternative data is that as competition in the markets in general increases, there is, you know, more unexplored edge in the things that are harder to get access to, and it is fairly easy to get access to high, low, open, close data for everything. So, especially with, you know, AI and computers getting smarter and all this kind of thing, the harder the data is to ingest, the more likely there could be some edge there. Is that something that you've found? That the further out on the curve you go, the more edge there is?

Dave:

Totally. I mean, yeah, you should always be thinking about these sorts of data sources. I think it's really important to figure out what your personal edge is. And a large part of what I think about for my personal edge is: okay, how can I systematically backtest something that is hard to do? And that takes a lot of resources.

Dave:

It's very easy to find a backtest; even freaking TradingView can do a backtest on one symbol. That's not very valuable. Anybody can do that. But to backtest across the entire market, one strategy across thousands of symbols, there are only a few people that can do that.

Dave:

Your competition is whittled down just by virtue of this being the world you're operating in. Taking that and having a good process for including alternative data is really valuable, and that whittles your competition down even more. I totally think that's a great way to think about it.

Michael:

And yeah, I was fairly sure you'd agree with me there, but that was for the audience too. Some people are out there thinking, well, you know, with AI coming and with access to all these tools, will the systematic space become more crowded than it is now? I think no, for a couple of reasons. One is I just don't think a lot of people are suited to leaving things alone, which is hilarious; I think that's the barrier to entry for a lot of people, being able to come up with a system and then just not intervene. Yeah. I think it's hard.

Michael:

And then the other thing is the number of data sources that are out there. I just pulled one from an article that I read, but, you know, there are tons of different kinds when it comes to sentiment data. One thing that's really interesting to me, though it needs to exist a little bit longer, is what they're calling, it's not gambling, it's prediction markets. That's the fancy name they're putting around being able to bet on the probability that, like, Jesus comes back this year, or bet on anything. Is there, you know, a predictive thing in the crowds when it comes to, I don't know, oil prices, or Fed decisions, or something like that? So there's not only a lot of different data sources out there, there are a lot of different data sources being created all the time.

Michael:

So definitely expand your thinking beyond, I mean, high, low, open, close, and RSIs, and MACDs, and all this, to what you can do that's just a little bit different from everyone else. Because even if it's not hugely different, and it's still a mean reversion strategy, if it's a mean reversion strategy based off something a little bit different than what other people are doing, well, right, there may be edge there to exploit.

Dave:

Totally. And with all these sources and, you know, your unique experience, your unique intuition about the market, it's not as hard as you think to be completely unique in the entire trading world by doing exactly this kind of thing. And when you have these new data sources coming online, typically the earlier you are, the more edge there's going to be in them. So that's another incentive to be aware of the landscape and have your process ready for testing these things. And the process we just described, adding a column to your backtest after the backtest occurs, merging the CSV, there's nothing unique about the alternative data point we're talking about here.

Dave:

That process, you can reuse for any alternative data you want. If you can distill it down to that, you've got this reusable process that can be used from now on, forever, for any alternative data.

Michael:

Yeah. And, you know, it's funny, not to date the podcast too much, but before this I sent Dave an article saying that the Nasdaq is getting into tokenization of all their securities by the end of next year. So that's gonna be a whole different source. And I kind of worry that some of the data sources that we have may be slow on the draw when it comes to, you know, these twenty-four-hour markets and things like that. Or, you know, look at whatever platform you're using and ask: do I think these guys are going to be really quick in becoming twenty-four-hour? Which, in my head, is just a different data source. Maybe you're not planning on trading it, because there might not be a lot of liquidity outside of those normal market hours for a while, but that to me is a different data source.

Michael:

You could almost parse: this is the range that the stock had, or the high, low, open, close overnight, when it's just, you know, degens and Robinhood traders trading it. And is there some predictive value there that I can apply to this other side? So, yeah, I appreciate the conversation. I think this was a powerful one, because it's kind of where I think I'm headed next. And I think there'll be a lot of listeners in the same spot, maybe stuck in the way we always talk about. I think, you know, the quote you give is perfect: the hardest thing is not your first strategy, it's your second one.

Michael:

It's like, if you're stuck, maybe as soon as you start to think about sources outside of everything else, you've opened up a whole other door that leads in a whole other direction. Yeah. Just like everything in trading, it seems, it's a rabbit hole with absolutely no end.

Dave:

Yeah. So there are a couple more points I want to mention here. One is that we did a whole episode on news. You should go back and look at that.

Dave:

News is an alternative data source. It's exactly the kind of thing we're talking about. And we talked about, I think, two specific ways to use news; it's exactly this sort of thing, bringing in an alternative data source. News is different because you've potentially got a whole bunch of different articles.

Dave:

So we talked about some good ways to distill that into a column in a way that is really useful. So that's one thing I want to say. The other thing I want to point out is that not all data is the same. One thing you want to think about is how often the data changes. If you can only get a current snapshot of the data, it still might be very valuable to backtest, even across many years potentially, if it doesn't change that much, because it's going to be valid right now, for sure. And if it doesn't change that much, it still may be valid for many years back.

Dave:

And the value sort of deteriorates the further you go back, because it's not going to be quite as accurate. But still, you could backtest something now and include something that doesn't change very often, and you don't have to historically database it or get a historical feed for it. It'd be perfect if you could, but sometimes you just can't get that data. But that doesn't mean it's not valuable, and that doesn't mean you can't include it in a very long backtest.

Michael:

Well, it's funny. I'm glad you brought up that podcast, because even though I was there, I had forgotten about that one as well. And there were some really good points there about taking things that are, generally speaking, not numeric or binary or expressible in any easy way, and hammering those down. So the example I gave will be very simple, because that expected move will just be a price value, and I can compare that to other price values. But yeah, if you're thinking of something while we're talking about this and you said, oh yeah, but it's, you know, too subjective, or not in a numeric form that I could test...

Michael:

Well, again, go back and listen to that episode, because there are ways you can take things that are more abstract and do your best to kind of force them into something numeric that you can then backtest. So the example being, you know, an earnings beat or miss or something like that. There are ways that you could transform that data into something that, again, could be tested and played with in creating an alternative dataset.
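
As a tiny illustration of forcing something abstract into a testable number, here is one hedged way to encode an earnings beat or miss as a numeric column; the mapping itself is an arbitrary choice made for the sketch:

```python
import pandas as pd

# Hypothetical earnings outcomes per symbol.
events = pd.DataFrame({
    "symbol": ["AAPL", "TSLA", "NFLX"],
    "earnings_result": ["beat", "miss", "inline"],
})

# Force the categorical outcome into a number so it can sit in a backtest
# column like any price-derived indicator: beat=+1, inline=0, miss=-1.
events["earnings_score"] = events["earnings_result"].map(
    {"beat": 1, "inline": 0, "miss": -1}
)
print(events)
```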

Dave:

Yeah. That's a bit of an art, but you'll get better at it over time. And, you know, the better you are at doing that, the better off you're going to be. And I get really excited about this stuff, because the bar is so low for improvement. If you can find a data source, if you can find another column for your column library, that just moves the needle a little bit, just makes your equity curve smoother, that can have huge ramifications for your trading.

Dave:

I mean, if you have a data point that doesn't improve the total profit or the profit factor of your strategy, but it smooths it a lot, that's something you could actually scale. Mhmm. So it doesn't have to move the needle a bunch to have a huge potential impact on your trading.

Michael:

Well, and that's kind of what I expect. And maybe, you know, when I'm done with this process, we'll chat about it in another episode. That's kind of what I expect with this data: it's not going to change everything. I don't think this is going to be a dataset that makes a strategy no one has ever heard of before that's more amazing than everything else. What I'm looking for is exactly what you mentioned: does it take a normal mean reversion strategy and just filter out some of the junk, in order to smooth returns a little bit?

Michael:

And then, yeah, you could either, you know, leverage it up or something, or maybe even loosen up other columns and allow more trades in, depending on how much it changes that side of things. And then, if I get good at this and have a process for adding this alternative data, it might be the combination of that dataset I'm going to start bringing in plus another dataset that I may not have even thought about yet, and combining those two ends up making a lot of sense. So, again, the main takeaway I want for listeners is that everyone is using high, low, open, close data, and the second you bring in one external data source, it increases the amount of opportunities and the number of things you can do exponentially. And once you start doing that and adding more and more data sources, it becomes an infinite number of possibilities that you could create strategies based off of.

Dave:

Yeah. And I can't emphasize enough, like, it's really important as a trader to be unique, to think of unique ideas. But it's also not as hard as people think. And this is exactly what you're talking about. What you described there would be a strategy that nobody else has thought about.

Dave:

Once you include your personal column library and your rule set, you would be unique and you would have this edge that literally nobody has. Yeah. So it's exciting.

Michael:

Yeah. So if you're interested, again, I'm sure I'll follow up on it at some point. But this was something that was on the docket for a while, and I'm excited to get going with it. And I hope it helped kind of open up your guys' minds too.

Michael:

There's way more out there than just, again, the standard RSI, MACD, moving average crossover, that type of thing. And I just want to say thank you to Dave, because I know I've been hounding you about this for a little bit. So I'm glad we finally were able to chat through it.

Dave:

Yeah, I think listeners will pick up some good stuff from this. And if you have any specific questions, feel free to reach out to me over email, dave@davemabe.com. You can tell I get excited about this kind of stuff. I mean, this is cool stuff.

Michael:

Awesome. Well, thank you guys all for tuning in. As always, I'm Michael Nauss.

Dave:

And I'm Dave Mabe. Thanks for joining us on Line Your Own Pockets.
