Line Your Own Pockets | Transcript: A Better Way to Get Market Data

May 18, 2026 / 50:52/E93

Michael: 00:00

Alright, everyone. Welcome back to another episode of Line Your Own Pockets. I'm just gonna kick this one off to Dave right away because he's been hard at work building something, and I think he built something pretty cool. We're gonna talk. This one's gonna be a nerds-only one, I think.

Michael: 00:16

We're gonna go a little bit into the weeds, but I know if you're listening to a podcast about systematic trading, you're probably a nerd anyway. So I'm sure you'll have a great time.

Dave: 00:26

Yeah. So everybody loves market data, right, and dealing with market data and pipelines for market data. So that's what we're gonna talk about today. So the first thing that you need to do when you use Amibroker or other software, you know, Python, is get market data. And the question is how do you do that?

Dave: 00:54

And so when you start back testing, that's fundamentally different than real time trading. So you have a few more options to be able to download market data and import it into whatever tool you're going to use for back testing. So I've been doing this for, I don't know, twenty years at least. And I know that there's a couple things that I run into that are annoying as you do this over time. So there's the initial part of the backfilling, figuring out how far are you going to go back, doing the initial backfill.

Dave: 01:34

As you start back testing with the data you've backfilled, over time that data becomes a little bit older, a little bit older, and a gap forms between when you did your initial backfill and the data that's been collected since then. And it sort of sneaks up on you because it's harder and harder to do chart reviews as the data gets older and older. You're sort of more out of touch with it as the most recent day gets older and older. So it's important to keep that in sync. And it's kind of the perfect thing where you really need a process for it, because it sneaks up on you that the data is older and older, and you kind of put it off.

Dave: 02:21

And then by the time you get around to actually refreshing the data, you've forgotten how to do it. Enough time has gone by that it's, okay, well, what did I do again? How did I do it? I can't remember exactly. So I've got a very good process for dealing with that that I want to go over with you today.

Dave: 02:45

So give me your thoughts, Michael.

Michael: 02:47

Well, yeah, first of all, this is something that's very important, I think, for people who are maybe switching to Amibroker to figure out. We've talked about this before. But when I switched over to Amibroker, I just didn't know that this would be a thing at all. Norgate and RealTest just update every night, and I'm not thinking about it. And then it just kinda goes back in.

Michael: 03:11

And when I picked up Amibroker, I just figured, oh, there'd be something where I'd hit a button and then the data is there. But, like we talked about, when you go intraday, you're amplifying the data by 360 times for every single day. It gets nuts. Oh, yeah. Pre market, post market.

Michael: 03:30

Yeah. You know, things absolutely get insane. So this process is just way more important than you think. And this was probably the first while you were there, I was sending you all-caps Slack messages all the time. But Yeah.

Michael: 03:46

Probably, like, the first three weeks of my Amibroker journey was getting the data, getting the right kind of data in there. Different databases: some that are small, that you can run things quickly on, and some that are large, for when you're feeling confident enough in your code that you wanna go further back. And yeah, it's a lot to manage. And the only way I've been keeping up on this, which is probably the worst way ever, is just redoing the whole database from scratch. Right?

Michael: 04:18

Just periodically clear it and go and re-backfill the whole five years or so worth of data from the get-go, and just kinda turn it on on Saturday and hope it's ready to go the next time I need to use it.

Dave: 04:33

Yeah, that's the hard way.

Michael: 04:35

And

Dave: 04:35

I've gone through so many years of doing this. And so when I started doing this, the first thing I did was backfill with the IQ Feed plugin in Amibroker. And I used that for some time. I remember in my daily review, going back through some trades and then comparing to the backtest, I saw some data issues. And this was years ago, and I went back and looked, and I realized that my backtesting database was missing data.

Dave: 05:17

Like, there were gaps in there that were enough to where, okay, this became a problem. It became a really high priority to fix this. And because you're backfilling thousands of symbols, the way the backfill works in Amibroker is it's doing a historical request for every symbol, but it also adds a level one feed that it's pulling data from. So it's getting a quote feed also. So the idea is you backfill...

Michael: 05:53

So it's just last bid and ask that it's pulling in?

Dave: 05:57

Yes. Okay. So the idea is it's backfilling once, starting a level one feed where it's getting a trades list as they occur, and then it's building the bars in memory and sort of backfilling in real time as you go. So that's the way it's designed to work. Now, if you're only backfilling and you don't care about real time data, you're kind of forced still to do those level one messages.

Dave: 06:31

And you can only have up to 500. You can buy more with IQ Feed, but by default 500 is the limit. So once you reach that limit, your data basically becomes corrupted. It does some very strange things, and I can't figure out who is to blame, Amibroker or IQ Feed.

Dave: 06:54

They can't figure out who is to blame; of course, the other guy's the one that's doing the wrong thing. So as I was really scaling up my trading, this was a very important issue to me. And I was like, okay, I cannot have my back testing database be corrupted, or able to be corrupted at any point if I don't push the right buttons at the exact right moment. So I knew I needed a process that worked outside of Amibroker to pull the data and import it into Amibroker without a plugin connected to that database. So that's what I've set up, and I have a utility that allows you to do that very, very efficiently.

Michael: 07:46

Well, and, you know, the importance of that, because some people might say, oh, well, you know, if it's just a back testing database, who cares if, you know, stuff's a little bit messed up. But, you know, that could be the difference between a great system and something that's broken, or, more importantly, when it comes to optimization, you might be optimizing for completely different standards. Right? It's saying, oh, look at all these wicked trades that happened under this condition. That condition is just a phantom ghost inside the data and never actually happened.

Michael: 08:18

And then, yeah, you're back testing for that thing. And even, say the break happened somewhere in the middle, well, all indicators are calculated off kind of a stream of data. So if you're missing two, three days here or something there, any moving average, RSI, MACD, ATR, whatever it is, that's just kinda completely bunk at that point. And so you're breaking more than just, oh, I'm missing a little bit of data, that'll be fine. I'll kinda estimate without it.

Michael: 08:47

But anything that flows through that data is busted as well.
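To make Michael's point concrete, here is a minimal Python sketch, with made-up prices, of how a hole in the bars silently shifts an indicator. The `sma` helper and the sample values are illustrative assumptions, not anything from Amibroker or RealTest.

```python
def sma(values, window):
    """Simple moving average; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical closing prices for eight consecutive bars.
closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]

# The same series with three bars silently missing from the middle.
closes_with_hole = closes[:3] + closes[6:]

full = sma(closes, 3)
holed = sma(closes_with_hole, 3)

# The latest SMA(3) now averages bars that were never adjacent in reality,
# so the two values disagree even though every remaining price is correct.
print(full[-1], holed[-1])
```

Nothing in the holed series looks wrong on its own, which is why this kind of damage tends to show up only when results are compared against a second data source.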

Dave: 08:50

Yeah. So it's a really important thing, that data integrity, and I knew I needed some process to do it well. Now, the first tool I started using was a tool called QCollector, and I was actually using eSignal at the time. So this tool would allow you to download all the historical data into CSV files for every symbol. So, I mean, the other pain-in-the-butt part of the backfill in Amibroker using the plugin is, yeah, you see it go from zero to 100%, but you don't really know that it's complete.

Dave: 09:33

Like, you assume it's complete, but like I said, I went and found some holes in the data. So all of a sudden I realized I couldn't really trust that number when doing this.

Michael: 09:45

Well, and how did you, because this is one of those things where it's not like, you know, you're working on a spreadsheet of your household business expenses. Right? And, you know, you can even kinda wrap your brain around the data. Like, how did you do the reconciliation to find it? Or were you just kind of going about your day to day operations?

Michael: 10:05

And that's how you noticed it? Because I just realized I don't really go back and comb through the data. I don't think you really could manually in any sort of meaningful way.

Dave: 10:17

So the only reason that I found out about it is because I do a daily backtest connected to my real time database. But every once in a while, I'll go do a larger backtest based on my back testing database. This is, like, the true source of truth. And I do a comparison on that. And that's where I discovered that the back testing database had some holes in it that weren't holes in real time.

Dave: 10:51

So there were enough differences there. I was like, holy crap, I've got to figure this out.

Michael: 10:56

So let's drill in on that a little, let's double tap on it, whatever the business podcast term is, because some people might not know what this is, or some people might not even do it. But with Amibroker and with these databases, you gotta think of it as: a short term database is gonna be relatively quick. And you can run tests on it relatively quickly, and you can even use that as kind of your real time database or whatever. And, you know, the process that Dave's describing is that, at the end of every day, you could say, here's the trades I actually took, and then run a little back test on the smallest database that will allow all of your indicators and everything to calculate correctly, and say this is what actually took place. And the reason you do that on a different database, not like a master database, is the speed of it.

Michael: 11:44

Right?

Dave: 11:44

It's Yeah.

Michael: 11:45

Less data to compute. And then you have kind of the big boy over here, which you called the source of truth, and that's because it doesn't connect to the market very often. It is a huge amount of data that's usually left dormant. And then when you are really interested in learning more, you go and you tap that database, because that might have five, ten years of data in it and is just a longer term look back. But because it's that big of a database, you're not gonna, like, have it update every bar, every candle or something like that, or you're using this insane amount of compute.

Michael: 12:21

So splitting these, you're saying, you know, small fast tasks I'm gonna do on this database, and then big long ones I'm gonna do on this database here. See what happened to my... you keep going, I'll figure out what happened here.

Dave: 12:35

Yeah. So that's a a fundamental thing you need to be able to do because back testing, you know, you're not doing that in real time. It doesn't matter. You can do that on the weekend. You can do that after hours.

Dave: 12:47

And that's where you come up with your strategies. That's where you do your optimizations. You're preparing for the real time environment, and in real time, it needs to be fast. You need to generate the signals fast enough to actually be able to take them. So yeah, they're fundamentally different environments, and they're separate for precisely that reason, just because you've got limited resources, you need to generate these fast, and it's just different environments.

Michael: 13:17

Well, even for RealTest, I do kind of the same thing, even though the database is the same, because Norgate kind of manages the database and there's much less data in it. So it's not as big of a deal, but I still have a file which I run to generate the orders and then a file in which I do the back test. And the only difference is the amount of data it imports, which is a year versus thirty years for the other one. So this is common for, I think, not just Amibroker, but for everyone. Do you wanna go fast or do you wanna have a really long runway of data?

Michael: 13:53

Because you can't do both. Right? So you gotta pick one.

Dave: 13:55

Yeah. Yeah. So I was using this QCollector tool. It was kind of a big GUI interface where you could say, okay, set up the symbol list, go backfill this symbol list. It would pull all the data down into CSV files.

Dave: 14:15

And then I would use those to generate a big old file that I would import into Amibroker. Now I did that for many years. It had a command line interface, you could initiate it automatically. There were things I didn't quite like about it. Then, so I would email their support.

Dave: 14:37

I think one time they wrote me back, but then they went dark. And it's like, okay, nobody's responding to this. They're not open to making any improvements to the software. Years and years went by and I was like, well, okay, I've still got access to it. The website's still there.

Dave: 14:53

Then the website disappeared. Now I don't have access to this tool. Then I was like, well, I've still got a local copy of it. I can still install it on a new machine. Then it stopped working with the latest version of Windows.

Dave: 15:09

So you literally can't bring the tool up anymore. So I was like, wow, okay.

Michael: 15:14

So this is Windows 11 versus 10? Is that, like, when they retired 10?

Dave: 15:19

I think so. Yeah. So now, I'm like, wow, what do I do? And so that's when I talked to Trent about exploring their API because I wanted to recreate this tool and just release it.

Michael: 15:38

Well, and for the people who don't know, Trent works with IQ Feed, I forget his official title, and we actually had him on the podcast. So you guys can go back and listen to, you know, more of why these data feeds can be so complex. And we did a great job of kind of looking behind the curtain. You know, we only see the front-facing thing, which is the data, and that data better work. But we did a really good job with him of diving deep back there and understanding all this, and saying, yeah, there's a lot of stuff that you don't realize has to go on in order for you guys to get the data that you use every day.

Dave: 16:16

Yeah. That was a fun episode. You should definitely go back and listen to that. So I was like, okay, I'm going to try to recreate this tool. Now, at first I was thinking, well, I'm going to have to create this GUI.

Dave: 16:34

I don't really like creating GUIs. I'm more of a command line person. You know, I've been doing that for years. But I was thinking, well, I'll have to create, you know, a GUI for it. But then the more I started using Claude, and the more everybody else starts using Claude, the more these command line interfaces are back in vogue.

Dave: 16:57

Like, they're the cool thing now.

Michael: 16:59

Yep.

Dave: 16:59

They were cool for me back in, you know, 1997, but now they're cool for everybody else. This is great. So I was like, I'll just make this a command line utility and have all the documentation in the standard way so Claude can look at the help doc and see exactly how to use it. So I was like, gosh, it's 2026 and I'm thinking about releasing a command line interface utility? That's kind of crazy, but it seems like the right move.

Michael: 17:35

Well, you know, when we were talking offline there, I myself haven't, I don't think, opened up RealTest or Amibroker in, like, weeks. Because everything I'm doing is a scheduled task that the bot's doing, and it's all kind of command line. That might be a bit of an exaggeration, but I just don't do it as much anymore. And I'm becoming more and more software agnostic, which is very interesting as well. Yeah.

Michael: 18:05

For example, I have these day trading strategies that I've been running using daily bars and RealTest for a long time. They work and they're great. But I wanted to do a little bit of a tweak to them for, like, profit targets. With RealTest, when you only have daily bars, you can still do day trading with limit orders and things like that. But I wanna get more granular on the data.

Michael: 18:27

So, to speed up the process, I had Claude go and grab the list of the names that were traded in that backtest and put them in Amibroker, and then backtest there and start to play a little bit more with the intraday data. And that's just showing that it's the way to do it now. It's to the point where I would look at a piece of software and be way more interested in buying it if it was run by the command line so I don't have to do it.

Dave: 18:56

Just think how big of a change that is. I mean, can you imagine, two years ago or when we started this podcast, you telling me that?

Michael: 19:06

I still don't know how it works. But, yeah, it's a complete one-eighty, whereas before, if it didn't have a GUI interface... I started dipping my toe into RealTest because, when I looked it up, it was an easier scripting language to understand. So I went from GUI to kind of easier scripting, and now it's to the point where I don't care if you have a GUI interface. I'd like some sort of feedback, I guess, as a user, to see kinda what's happening. But the new GUI interface is just your voice.

Michael: 19:40

It's just, right, typing. That's the new GUI interface. Yeah.

Dave: 19:44

Yeah. So I've been working on this command line utility. I've got it working in my pipeline and the way the data updates for my database. So here's how it works. It's one EXE file.

Dave: 20:01

You download it. You know, there's no install program. It's just a file. You put it where you wanna run it.

Michael: 20:11

That's pretty much macOS, just FYI. I don't know if you've ever had a Mac, but that's just how apps work on a Mac. You just open them up, and they work, so very app-like.

Dave: 20:21

So you create a folder, put this in there. You create a folder for your data. And that's where all the CSV files are gonna be downloaded. So there's two modes for the IQ Feed version of this utility. One is the default, which is backfill, that's going to connect to IQ Feed, which is installed locally on your machine, and it pulls down all the CSV files for all the symbols that you want to download.

Dave: 20:58

And you tell it the bar size that you want. The default is one minute, but you can use whatever bar size you want, including daily. And then you say how far back you want to go.

Michael: 21:10

Well, did you add that, like, very reluctantly when you added the daily? Did you go, oh, daily.

Dave: 21:16

No, because, I mean, there are reasons you might want to use that, even in, you know, an intraday database. You can actually insert end of day data in an intraday database in Amibroker. But it's also a different endpoint in the IQ Feed API to get daily bars versus intraday bars. Alright. So you specify how far back you wanna go, and then you just hit enter, and it runs and downloads all those CSV files into this folder.

Dave: 21:57

So you've got, you know, spy_1.csv, which means the SPY one minute bars. That's what that represents. It's a CSV file. It's just a text file. So you can go look and see what it downloaded.

Michael: 22:14

Oh, okay.

Dave: 22:16

Which was another really important part of this process, because there's no way to go back and see what happened when you run a backfill in Amibroker. You just can't see exactly what it's done without going through every symbol in Amibroker. Right? Like, this is just a much better way to audit what it's doing.

Michael: 22:35

That was what stood out to me right away, because, you know, yeah, with Amibroker, you just kind of trust it when the progress bar is done and it says that it's backfilled everything. You go, cool. Hope that worked. And then you're gonna move on with your life. So that's good.

Dave: 22:53

So alright. So now you've got a big old list of CSV files. It's gonna be an immense amount of data. I mean, it's a lot of data that gets downloaded. So that's the backfill phase.

Dave: 23:05

So nothing is in Amibroker yet. It's just data in your CSV files that you can go look at and see exactly what happened. It keeps a log file. Then the next phase is the import mode. Same utility, you're going to run --import. You tell it the data directory that you created that has the CSV files, and it's going to go through and create one big import file from all that data that you can import in one fell swoop into a local database in Amibroker, which is the important piece.
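As a rough illustration of the import phase Dave describes, the following Python sketch concatenates per-symbol CSV files into one combined file. It is not the utility's code: the `build_import_file` function is hypothetical, and the row layout (timestamp first, no header row, symbol prepended in the output) is an assumption. Only the `<symbol>_<bar size>.csv` naming comes from the conversation.

```python
import csv
from pathlib import Path


def build_import_file(data_dir, out_path, bar_size="1"):
    """Merge every <symbol>_<bar_size>.csv in data_dir into one file.

    Each output row is the original row with the symbol prepended.
    Returns the number of rows written.
    """
    suffix = f"_{bar_size}"
    rows_written = 0
    with Path(out_path).open("w", newline="") as out:
        writer = csv.writer(out)
        # Sorted so the combined file is reproducible from run to run.
        for path in sorted(Path(data_dir).glob(f"*{suffix}.csv")):
            symbol = path.stem[: -len(suffix)].upper()
            with path.open(newline="") as f:
                for row in csv.reader(f):
                    if row:  # skip blank lines
                        writer.writerow([symbol, *row])
                        rows_written += 1
    return rows_written
```

Because the intermediate files and the combined file are plain text, each step can be inspected before anything reaches the backtesting database, which is the auditing benefit being described.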

Dave: 23:45

You have full control over what data gets in there. And you can go back and see exactly what data goes in when you do it. You have full control over that. You don't have to worry about, is the plugin disconnected? Is "wait for backfill" selected?

Dave: 24:02

And is that going to take a whole bunch of time to complete? So you get all your data in there in one fell swoop. Now, that's really good, but where it gets really excellent is the next time you run it. You run the same commands and it automatically figures out what it already has and only downloads the new stuff. So that's the backfill part.

Dave: 24:37

It's only going to request the new stuff. So if you ran a backfill last weekend and you're running it again, take our SPY example. Yeah. You've got data up until last weekend. You imported your data.

Dave: 24:53

Now you want to update it for the data since then. You go to run the backfill command. It's not requesting data back from 2017 or whenever you went back to the first time. It's going to look at the CSV file and see, okay, I've got data up to here. It's going to look at the last line in that file and only request the new stuff. So that's much, much faster.
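The "look at the last line and only request the new stuff" idea can be sketched in Python as below. This is an illustration under stated assumptions, not the utility's implementation: the function names are hypothetical, the timestamp is assumed to be the first column in `YYYY-MM-DD HH:MM:SS` form, and the files are assumed to have no header row. The actual request to the data feed is left out entirely.

```python
import csv
from datetime import datetime
from pathlib import Path

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def last_timestamp(csv_path):
    """Timestamp of the last non-empty row, or None if there is no data yet."""
    path = Path(csv_path)
    if not path.exists():
        return None
    last_row = None
    with path.open(newline="") as f:
        for row in csv.reader(f):
            if row:
                last_row = row
    if last_row is None:
        return None
    return datetime.strptime(last_row[0], TS_FORMAT)


def request_start(csv_path, initial_start):
    """Where the next backfill request should begin for this symbol."""
    last = last_timestamp(csv_path)
    return initial_start if last is None else last


def append_new_bars(csv_path, bars):
    """Append only bars strictly newer than what the file already holds."""
    last = last_timestamp(csv_path)
    fresh = [
        bar for bar in bars
        if last is None or datetime.strptime(bar[0], TS_FORMAT) > last
    ]
    with Path(csv_path).open("a", newline="") as f:
        csv.writer(f).writerows(fresh)
    return len(fresh)
```

On the first run `request_start` falls back to the chosen start date; on later runs it returns the newest bar already on disk, so the request window shrinks from years to days.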

Michael: 25:15

Yeah.

Dave: 25:16

And then when you do the import part, it knows the last thing you imported. So it's gonna create a file with only the new stuff in it. So when you go to import it into Amibroker, it's super fast.
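One common way to "know the last thing you imported" is a small state file holding a per-symbol watermark. The Python sketch below shows that pattern; it is an assumption about how such a step could work, not a description of the utility. The function names and the JSON state file are hypothetical, and timestamps are assumed to be `YYYY-MM-DD HH:MM:SS` strings, which sort chronologically when compared as text.

```python
import json
from pathlib import Path


def load_state(path):
    """Load the symbol -> last imported timestamp map (empty on first run)."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_state(path, state):
    """Persist the watermarks so the next run can pick up where this one ended."""
    Path(path).write_text(json.dumps(state, indent=2))


def new_rows_since(rows, symbol, state):
    """Return rows newer than the symbol's watermark and advance the watermark."""
    last = state.get(symbol, "")
    fresh = [row for row in rows if row[0] > last]
    if fresh:
        state[symbol] = max(row[0] for row in fresh)
    return fresh
```

With this in place the import file for a weekly refresh contains only a week of bars per symbol, which is why the second and later imports are so much faster than the first.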

Michael: 25:36

That might be a dumb question, but why doesn't it work like that anyway? You know, it just seems like, which means you built the right thing when the obvious question is why isn't that the way it works anyway. But I just wonder why, and I think this is how Norgate does it, where it just keeps adding on to the database, but that just seems like the natural solution. So why wouldn't that be the default, the way Amibroker or IQ Feed or whatever they do work that way anyway?

Dave: 26:11

It is the default in Amibroker. So when you do a backfill, it is doing an incremental backfill. The problem is you can't really verify that it's complete. You run up against this 500 symbol limit because it's requesting a level one quote feed for every one of these when you don't really need it.

Dave: 26:33

And it's going up against your limit. And when it reaches that limit, your data is basically going to be corrupted. So you get out of that loop. Now, the other thing I realized as part of doing this is, all right, when you set up a database in Amibroker connected to IQ Feed, do you remember the important number that you're putting in there?

Dave: 27:01

Like, do you remember the setting that you have to put in? No. So basically you tell it how many bars are gonna be in this database.

Michael: 27:12

Yeah. And, yeah, I screwed that up a few times where I intentionally made, like, a really tiny database. And regardless of what you put in, it always seems to pop up with a warning saying this is gonna be a big database. Are you okay?

Dave: 27:26

Yeah. Right. Yeah. And there's even this registry setting to allow you to put a bigger number in there. Now, think about that setting.

Dave: 27:39

So here's how it works. You put in, like, say, a million bars, a million one minute bars. It sounds like a lot, but it's not gonna go back super long, what, ten years or something? I don't remember what the number is, but it's some amount of time. Think about how the IQ Feed works.

Dave: 27:58

You're requesting the most recent million bars per symbol. Now what does that mean? Symbols trade a whole variety of different volumes. Right? Some don't even trade in a day.

Dave: 28:13

Yep. So what you get in your Amibroker database, if you do it this way, and it happens to work, is some of them, like Apple or SPY, going back the least amount of time, but others going back way further. Like, you've got some going back to 2007. So the start times for each symbol vary widely. So if you do a back test across all quotes, you get, you know, maybe it starts back in 2007.

Dave: 28:46

There's only a few symbols in there, but over time it gets more and more and more symbols. At some point all the symbols are in there, so it becomes complete, but before that, it's kind of sporadic. It gets more and more sporadic the further back you go. This process says, okay, every symbol, back to this date. So the entire database is complete as of a specific date and time. So it's just a way better solution for this.
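The arithmetic behind this is easy to check with a short Python sketch. The function name and the bars-per-day figures are illustrative assumptions: 390 is the number of minutes in a regular 6.5 hour US session, and 40 stands in for a thinly traded symbol that only prints a bar in some minutes. It is back-of-the-envelope math only; the real start date also depends on how much history the vendor actually holds.

```python
from datetime import date, timedelta


def approx_start_date(end, bar_limit, bars_per_day, trading_days_per_year=252):
    """Roughly how far back the most recent `bar_limit` bars reach
    for a symbol that prints `bars_per_day` bars per trading day."""
    trading_days = bar_limit / bars_per_day
    years = trading_days / trading_days_per_year
    return end - timedelta(days=round(years * 365.25))


end = date(2026, 5, 18)
limit = 1_000_000

liquid = approx_start_date(end, limit, bars_per_day=390)  # a bar every minute
thin = approx_start_date(end, limit, bars_per_day=40)     # sparse trading

# Same bar limit, very different reach: the thin symbol's window starts
# decades earlier than the liquid one's.
print(liquid, thin)
```

A fixed bar count therefore gives every symbol its own start date, while a fixed start date gives every symbol the same window, which is the difference Dave is pointing at.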

Michael: 29:15

Geez, thunder. But yeah. And I never thought of that at all, because you're right. There's just a handful of symbols, and you can find them periodically, where if you look at the chart, it's just like a couple dashes every day. Even if you're on, like, a five minute chart, it's just a dot, because there's just one trade or two trades that happen throughout the day.

Michael: 29:36

So I didn't know that. I thought when you were requesting a time, it would be a request of that block, if I want certain years. But you're saying that if there is no data, it will just keep going back until it finds something or, I guess, until the symbol didn't exist. Is that kind of the

Dave: 29:53

So you're requesting a certain number of bars. You're doing that across all the symbols, but what you're getting back, maybe you get a million bars for a lot of stuff, but the start date to get a million bars is different for pretty much all the symbols.

Michael: 30:10

And then you're kind of wasting that, usually, because when you're going into Amibroker to do a test, you're setting a window anyway. So you've just loaded in a whole bunch of data that's gonna be outside it. Like, I've got some databases that are called IQ five year and IQ one year and, I think, IQ six months. So when I'm using IQ five year, I use from and to dates of around five years, but you're saying that there's useless data in that database that probably goes back ten years or longer.

Dave: 30:40

Exactly. So I've got what I call an IQ Feed diagnostics, or a database diagnostics, script that's in AFL, and you can download it off the website. When you run that in your database, it's basically designed just for this: give me a snapshot of the database, what does it look like? It'll go through every symbol, you get one line per symbol, and it says, what's the first date in the database for this symbol? Are there any gaps?

Dave: 31:14

So it's a pretty interesting report, but the first thing you notice is, wow, okay, the first date for all these symbols is completely different. Like, it's just all over the place.
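Dave's diagnostics script is written in AFL and is not shown in the episode. As a stand-in, here is a minimal Python sketch of the same kind of report: one line per symbol with the first date present and any expected trading days missing after it. The `summarize` function, the input shapes, and the sample symbols are all assumptions for illustration. A missing day is only a candidate gap, since a thin symbol may simply not have traded that day.

```python
from datetime import date


def summarize(bars_by_symbol, expected_days):
    """One entry per symbol: (symbol, first date present, missing days).

    bars_by_symbol maps a symbol to the dates on which it has at least one bar.
    expected_days is the trading calendar to check against.
    """
    report = []
    for symbol in sorted(bars_by_symbol):
        days = set(bars_by_symbol[symbol])
        if not days:
            report.append((symbol, None, []))
            continue
        first = min(days)
        # Only days on or after the symbol's own first date can count as gaps.
        missing = [d for d in expected_days if d >= first and d not in days]
        report.append((symbol, first, missing))
    return report


calendar = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4), date(2024, 1, 5)]
sample = {
    "SPY": calendar,                                # complete
    "XYZ": [date(2024, 1, 3), date(2024, 1, 5)],    # starts late, one hole
}
for line in summarize(sample, calendar):
    print(line)
```

Sorting the report by first date makes the scatter Dave describes visible at a glance.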

Michael: 31:25

And is that just affecting storage, or is that also affecting, like, processing power when I run a backtest? Like, is it doing that part of it or is it just extra useless storage?

Dave: 31:39

Well, the way I think about it, it's kind of a sloppy way to set up the database because ideally you'd have one date that, you know, you can go back to and it's complete at that date. But the way it is now, you kind of pick a date that you think might be the right first date, or you just do it across all quotes and know that some symbols are going to come in in 2007. But at some point, it becomes complete, so you can't really rely on any decisions you make before a certain date because, you know, the data is just so scattered at that point.

Michael: 32:23

This reminds me of old-school computers. Remember when you had to run, like, a disk defrag periodically and your computer would get twice as fast? And for you young'uns, disks used to actually spin inside a computer. And every now and then, what would happen is you would have data fragmented on different parts of the disk. So every time you wanted to retrieve a file, it would have to actually spin a thing and go there and then spin a thing to go back. And so you'd run this utility periodically when your computer got slow, a disk defrag, and it would just, like, push all the data together so it wouldn't have to spin as much.

Michael: 32:58

And it was like all the data was there, but it was just arranged in such a garbage way that it just made, you know... and these were the things back in the day that we had to worry about when your video game was running a little slow. The first thing you'd bring up is the defrag tool and see if that fixed anything and made it go a little faster.

Dave: 33:16

That is a good analogy. I could totally see why that reminds you of that. And you're dating us here, Michael. I mean

Michael: 33:22

I was afraid to say.

Dave: 33:23

When's the last time you actually ran a defrag utility?

Michael: 33:27

Well, and it's funny, the save icon. I'm like, there's a whole generation of people that just see that and don't know that there's a little disk there. And that's why it was the save icon, but it's the save icon forever.

Dave: 33:41

Yeah. And you used to have to make sure you went back and clicked it. And now most stuff is just auto saved. And

Michael: 33:47

then you, yeah, you install a program and you get a stack of floppy disks and you just put in one at a time. But so it kinda just seems like that. Like, again, the thing that interests me is when you were explaining it to me there, I'm just like, it just seems like all this should just be built in and kind of happen naturally, and it's just not the way it works. So if you're like me and you didn't know that that's not the way it works, then this would be kind of a great tool to go through and do it. And the best part of this for me is that it's all command line interface.

Michael: 34:22

So, like, before we got on, I was talking to Dave. I'm like, it's great, because I need to understand how it works, because I don't ever wanna be that guy that just lets AI do everything. But I don't need to understand how to actually use it, because I'm gonna get Claude to do that.

Dave: 34:40

I'm

Michael: 34:40

gonna go, okay, Claude, you go and figure this out for me, but this is the tool, and explain to it what you want it to do. And then I set up a scheduled task. That's one of the hugest things that I think has been released now. So now you have this ability to do it, and you have a scheduled task that can just wake up periodically while you're sleeping or on the weekends, or find a time you're not using your computer, and just take care of it for you.

Dave: 35:08

Yeah. So that's exactly how I designed it, for scheduled tasks, for running in a batch file automatically. You don't have to go click a button or configure a whole lot. So yeah, I just love when I create a really tight process that just works well. I really love that.

Dave: 35:35

So, yeah, I haven't quite decided how to give it away on the site yet. I've got some Mabe Kit users that are testing it and are like, Oh yeah, this is way better than what we were doing before. This is the right way to do it. So the other thing, I started thinking about this after our last episode, when we were talking about delisting, and one of the traders I'm coaching saw that IQ Feed doesn't split adjust their intraday bars.

Dave: 36:11

So as I got to thinking about this, I was like, well, you know, from our conversation last time, let me see if I can figure out a way to get split adjusted data. Yeah. Like we talked about, it's not absolutely necessary, but you know, it's a cost benefit analysis. Maybe the trade off now is worth doing. Like it's probably,

Michael: 36:36

Yeah. Again, if you missed the last episode, we just had a debate about whether you should include delisted stocks or not. And I never thought about that. Because that to me is a way more compelling argument than delisting, because more of the things that you could potentially day trade are gonna be those chunky names that did reverse splits, and now all of a sudden they're in play again before they drift their way back down to zero and delist again. I was actually on a stream earlier and we found one company where, if you look at the split adjusted price, the all time high was $160,000,000,000 a share.

Michael: 37:15

And that's because it's done so many reverse splits. So, yeah, it's very common for these lower float stocks that day traders trade. Not anything that I would swing trade or hold overnight, but they become very, very active for a short period of time because the float shrinks. So all of a sudden, something that may not have been a low float yesterday is a low float now because of these reverse splits. Yeah, I didn't think of that.

Dave: 37:39

Yeah. So I've done a little bit of work with Polygon, now Massive. I really like the idea of it. So I was like, well, let me just go explore and see if I can modify this utility to work with Polygon data.

Dave: 37:59

And let me just look at the splits. Like, what would it take? So I went and looked at it, and there's a very nice splits API that I was like, okay, yeah, this is perfect. I could put a start date.

Dave: 38:13

I can put an end date. I can get all the splits. So I've got another utility now that works with Polygon. And I actually like this. I like this one better.

Dave: 38:27

I've gotta be honest. And there's a very good reason why. There's several good reasons. So with this one, there's another mode. There's the backfill mode, the default, there's the import mode, but with this one, there's the adjust mode, and it will automatically split adjust the data perfectly.

Dave: 38:50

So let's say you backfilled your data last weekend, right? Now you're running it again. There's been some splits since then. It's going to automatically figure that out and automatically adjust the splits in the CSV file that you downloaded. It's automatically gonna know that it needs to import the entire history for that specific symbol now that it's split, because you have to adjust the historical data.

Dave: 39:28

So when you run your import, it automatically includes all the data for the split symbols, but just the most recent data for the ones that haven't split. So it's the same workflow, the same process, the same automation, and you get split adjusted data. It's perfect. So there's one other thing I wanted to say about Polygon, but you talk first. I wanna hear your thoughts on this.
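The adjust-then-import flow Dave describes can be sketched roughly as follows. This is a minimal illustration, not his utility: the `Split` and `Bar` records, their field names, and the `adjust_bars` and `plan_import` helpers are all hypothetical, and it assumes the stored bars have not yet been adjusted for the splits passed in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Split:
    symbol: str
    execution_date: date  # first day the stock trades on the new basis (assumed meaning)
    split_from: float     # shares held before the split
    split_to: float       # shares held after the split


@dataclass(frozen=True)
class Bar:
    symbol: str
    day: date
    close: float
    volume: float


def adjust_bars(bars, splits):
    """Rescale every bar dated before a split onto the post-split basis."""
    adjusted = []
    for bar in bars:
        factor = 1.0
        for s in splits:
            if s.symbol == bar.symbol and bar.day < s.execution_date:
                factor *= s.split_from / s.split_to
        # Prices scale by the factor, volume by its inverse.
        adjusted.append(Bar(bar.symbol, bar.day, bar.close * factor, bar.volume / factor))
    return adjusted


def plan_import(symbols, new_splits):
    """Full history for symbols that split since the last run, recent data otherwise."""
    split_symbols = {s.symbol for s in new_splits}
    return {sym: ("full" if sym in split_symbols else "recent") for sym in symbols}


# Made-up example: a 1-for-10 reverse split (10 old shares become 1 new share).
new_splits = [Split("XYZ", date(2026, 5, 11), split_from=10, split_to=1)]
bars = [
    Bar("XYZ", date(2026, 5, 1), close=2.0, volume=1000.0),
    Bar("XYZ", date(2026, 5, 11), close=20.0, volume=100.0),
]
adjusted = adjust_bars(bars, new_splits)
plan = plan_import(["XYZ", "ABC"], new_splits)
```

Only the bar dated before the split is rescaled, and only the symbol that split is flagged for a full re-import, which is why the weekly run stays cheap for everything else.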

Michael: 39:58

Well, yeah. First of all, again, I think that the argument for needing split adjusted data is 10 times the argument for delisted data for day traders, for the reasons I described there. But if you can do it for Polygon, can you also do it for IQ Feed or is it just

Dave: 40:19

I've asked them. So you can get some split data, but not all split data with IQ Feed, unfortunately. And I want to talk to Trent about this because I think it's an issue. So you can get the most recent two splits. Well, one thing I noticed when I did this with Polygon was just how many freaking splits there are.

Michael: 40:48

I was about to say, if you can get the most recent two, but like we joked about that company that was worth $160,000,000,000 a share. If you're a company that has to reverse split, you're probably a company that has to do a lot of reverse splits.
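To see how a split adjusted all time high can get that large, it helps to remember that adjustment factors multiply. The sketch below uses made-up ratios and a made-up raw price, not the company from the stream, and the `split_adjusted_price` helper is hypothetical.

```python
from math import prod


def split_adjusted_price(raw_price, reverse_split_ratios):
    """Scale an old raw price by every reverse split that happened after it.

    Each ratio is the number of old shares consolidated into one new share.
    """
    return raw_price * prod(reverse_split_ratios)


# Hypothetical history of five reverse splits after the old high was set.
ratios = [50, 40, 25, 20, 10]
cumulative_factor = prod(ratios)
adjusted_high = split_adjusted_price(16.0, ratios)
```

With these invented numbers the cumulative factor is 10,000,000, so a raw high of $16 shows up as $160,000,000 on a split adjusted chart. A feed that only exposes the two most recent splits would miss most of that factor.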

Dave: 41:04

Yeah.

Michael: 41:04

You know, if you're a company that's doing normal splits, like Apple or Chipotle, you're doing one every couple years, ten years even maybe. But with these garbage companies that are doing reverse splits, it's usually reverse split, crash, reverse split, crash, reverse split, crash. Now that I'm thinking about it, that could be the birth of a trading strategy once you get that data in. You could say, okay, what happens the day of a reverse split or the day after a reverse split, or, you know, gaps happening because of reverse splits, things like this. So selfishly, I was asking that because I'm like, you just pulled me away from Polygon to IQ Feed. But now this is interesting to me, so do I go back to Polygon?

Dave: 41:57

Here's

Michael: 41:58

I hope not.

Dave: 41:58

Here's another thing that I discovered, and it didn't even dawn on me about the way Polygon does this. So when I first looked at Polygon for this, I tried to match the way that it works in IQ Feed, where you go through each symbol and you request the historical data for every symbol. And you pull the data back and put it in the CSV file. I started doing that with Polygon. I was matching the same way, but I realized it was gonna take a long time to complete, like over thirty hours to do that first one, and I was like, man, there's probably a better way to do this.

Dave: 42:38

Yeah. And it's because they're rate limiting you. Once you make a certain number of requests, they rate limit you, so you can't make that many in a certain amount of time, which makes sense. I mean, they're scaling their business.

Michael: 42:50

Well, yeah, they don't want one guy to crush everything for everyone else. Yeah.

Dave: 42:55

So, but the way they have this set up, they have these flat files, where a flat file is created each day that they make available to you. And it has every single minute bar for every single symbol that traded that day. So what that allows you to do is download one file. It's a pretty big file, but it's one file, and you can see everything that traded in there. And here's the thing that, like I said, didn't dawn on me till I started doing this.

Dave: 43:33

You get all the delisted stocks automatically with this process. You don't have to do anything else to get delisted data because it's capturing every trade for that particular day. When you go back in time, and you can go back years, you're seeing a file that has every stock that traded that day. So you don't even have to give it a symbol list. You just say, go get me all the trades.

Dave: 44:02

It figures out the symbol list. It imports them automatically to Amibroker, and the delisted data is all in there. It's all split adjusted. It's perfect.
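The idea of deriving the symbol list from the daily file, rather than supplying one, can be sketched like this. The column names and the tiny inline sample are assumptions for illustration only; a real flat file is far larger and its exact layout should be checked against the provider's documentation.

```python
import csv
import io
from collections import defaultdict


def bars_by_symbol(flat_file_text):
    """Group one day's minute bars by ticker; the keys are the day's symbol list."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(flat_file_text)):
        grouped[row["ticker"]].append(row)
    return dict(grouped)


# Hypothetical three-row sample standing in for one daily file.
sample = """ticker,window_start,open,high,low,close,volume
AAA,0930,10.0,10.5,9.9,10.2,1200
BBB,0930,1.00,1.10,0.95,1.05,50000
AAA,0931,10.2,10.3,10.1,10.1,800
"""

grouped = bars_by_symbol(sample)
symbols = sorted(grouped)  # every ticker that traded that day, listed or since delisted
```

Because the file records whatever traded on that date, a symbol that was later delisted still appears in the older files, which is where the delisted coverage comes from.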

Michael: 44:14

Sweet. Now you're gonna make me switch back. So what you're tasked with, now that you're making me switch back, is to make this process insanely simple for boneheads like me. Not just getting the data down and put together, but getting it up in Amibroker and how to use that. And it's kinda cool for you people that have listened, like, forty four minutes into the podcast.

Michael: 44:45

This is the kind of thing that births interesting trading strategies, where it's just a random conversation about whatever. And now I'm really interested to get into something which might be completely nothing: just playing around with reverse splits. Is there a strategy to create just around reverse splits and, I guess, uplisting and downlisting? I guess you could look at things like that too.

Michael: 45:10

But that's something that wasn't open until just now. And if you're hearing how hard it is to get all of this kind of data, then it may not be an edge that is super exploited yet.

Dave: 45:26

Yeah.

Michael: 45:27

Because who has access to that kind of thing. Right?

Dave: 45:31

Yeah. You know, I hadn't thought about that angle actually because I was just thinking about, okay, how do I get this data so it's adjusted, and it makes sense?

Michael: 45:40

Mhmm.

Dave: 45:41

But, yeah, it's keeping track of all the splits, so there's a file in there that has all the splits, and it's got, you know, the ratios. And so here's what I would do. You would run a backtest like you normally would. You can have all your Mabe Kit columns there. Before you run it up to the cruncher, you can merge with this splits file and say, okay, add another column for, was today a split?

Dave: 46:12

And what was it? Like, have another one for the ratio. Yep. And then you could totally optimize on that. I hadn't thought about that, but you could totally do that with this process.

Dave: 46:22

That's cool.
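The merge Dave describes, adding a "was today a split" column and a ratio column to backtest output, could look something like the sketch below. The row layout, the column names, and the `add_split_columns` helper are hypothetical; they are not the Mabe Kit or splits file format.

```python
def add_split_columns(trades, splits):
    """Tag each backtest row with whether its symbol split that day, and at what ratio."""
    lookup = {(s["symbol"], s["date"]): s["ratio"] for s in splits}
    tagged = []
    for trade in trades:
        ratio = lookup.get((trade["symbol"], trade["date"]))
        tagged.append({**trade, "is_split_day": ratio is not None, "split_ratio": ratio})
    return tagged


# Made-up backtest rows and a made-up splits file entry.
trades = [
    {"symbol": "XYZ", "date": "2026-05-11", "profit": 125.0},
    {"symbol": "ABC", "date": "2026-05-11", "profit": -40.0},
]
splits = [{"symbol": "XYZ", "date": "2026-05-11", "ratio": "1:10"}]

tagged = add_split_columns(trades, splits)
```

With those two extra columns in place, the split day and ratio become ordinary fields to filter or optimize on, like any other column in the backtest output.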

Michael: 46:24

That's interesting. So yeah. So now I want to go back, and Polygon's cheaper too, if you don't need the real time data. So you're saving a bit there. Not that that's, I went down that road before.

Michael: 46:34

It's not worth it to save a couple bucks. But the reason I did switch over was just the pain of the backfill. Now, the other elephant in the room is that if you're doing it with Polygon, the other thing that we've kind of warned people against is having a different provider for real time and historical data. Is that still an issue you worry about?

Dave: 46:59

Mike, okay. I promise I did not tee up this question for Michael. I did not tell him this before this. But I've been working on an Amibroker plugin for Polygon. And just this morning, I got it working in like a very alpha test, but I've got it working.

Dave: 47:21

And right now, I'm gonna sort of stress test it, see how many symbols I can add. But yeah, I've got that working now, so that solves that problem. And I kind of like, I mean, the one thing about it is you don't have to install any software. You don't have to have a data manager running. It's all online.

Dave: 47:42

So I'm liking where this is going. And, like I said, I promise I did not tee that up for Michael.

Michael: 47:51

And we are not sponsored. If you do wanna, you know, back up a truck's worth of cash, maybe, but, yeah, this is all just. And it's funny because, you know, I can't stress this enough: most traders out there are not thinking about stuff like this. Most traders out there that you see online or that you talk to or whatever, they're just scanning through charts to try to figure out the next stock they wanna buy.

Michael: 48:17

And that's what they're spending all of their day doing. And looking back on how much I did that throughout my career versus what I'm doing now, it just seems like this is such a more important question that will yield way more future results. Because who cares? I don't know what stocks I bought today. I think I've taken, like, ten day trades today. I have no idea what any of them are or what they do.

Michael: 48:45

Because you're starting to think about stuff like this instead. You're not thinking about that day to day. So, you know, I guess we'll cycle back when you get real time data, maybe. In the meantime, I might be going back and putting my credit card back in Polygon for a bit.

Dave: 49:02

Yeah. I mean, the whole exercise has been pretty eye opening for me. One thing I wanna do, like I said, I was shocked how many splits there were, and I was shocked how many crazy ratios are out there that have happened. I mean, I could not believe it. So I'm gonna post something to my mailing list soon, just with some summary information, like how many symbols have split over that period.

Dave: 49:26

It's a huge number, like way more than you would guess. And just some interesting statistics that I've come across about, you know, what's the most common split ratio? What's the most common reverse split ratio? How often do the reverse splits happen? There's all sorts of stuff that I think would be interesting to look at, just because, you know, that's the other thing this utility does.

Dave: 49:53

It writes a log file, so you can go see exactly what it does. You can go back and piece stuff together. So, yeah, I'm deep into this, and while I'm in here, I'm gonna create some stats, some summary stats for people to look at that I think would be interesting.

Michael: 50:10

Cool. Well, yeah, that's awesome. So, listen, if you're listening to this fifty minutes in, you definitely are a nerd. Congratulations. If you didn't know before today, you just listened to a fifty minute podcast about data backfilling for systematic trading.

Michael: 50:29

So Gosh.

Dave: 50:30

We're just getting started. We could go two or three times as long, I think.

Michael: 50:34

Good. Well, we'll circle back and see how this update goes, because I'll definitely be diving into it, you know, this weekend and things like that. But as always, I'm Michael Nauss.

Dave: 50:47

And I'm Dave Mabe. Talk to you next week on Line Your Own Pockets.

Creators and Guests

Host

Dave Mabe

Host

Michael Nauss
