How can I build up a database of US securities financial data?

meisel

New Member
Hi, I'm looking to create a database of basic financial data for all US securities (or at least, US stocks). It should have at least 5 years of continuous recent data (e.g., from start of 2018 to start of 2023) which for all tickers should include daily data for all trading days (open/close price, weighted shares outstanding, volume, etc.) and quarterly filing data (earnings-per-share, date of quarterly report filing, etc.). I'm using polygon.io 's API for this right now, but it has major limitations (e.g. to get all the necessary daily data, I'd have to send a request for each ticker/date tuple, and there are millions of them). I can spend up to $2,000 on this, and am happy to pull using repeated API requests or to just purchase a database. Any recommendations?
 
You can use PostgreSQL, which is free and widely used at the enterprise level, to store stock quotes. However, you will need a substantial amount of money to cover the storage space required for all this data. Take that into account, how much you will have to spend on a recurrent basis to maintain your system.

When hiring a programmer, you need to be specific. This means that if you request a database, you will receive just the database, without a user interface to retrieve data from it, forms, or spreadsheets – only the database and the synchronization tool. Among other details, you will need to provide the type and frequency of data (tick, ohlc, h1, d1???), for instance.

If you require anything beyond a database being fed with data from your API, you will need to provide an explanation, otherwise you will receive only the db.

In my humble opinion, $2,000 is quite low for an IT project; it's equivalent to 2-5 days of work for a professional engineer. Your project, however, might take a minimum of 2-3 weeks, which translates to a minimum cost of approximately $10,000 for a reputable IT company.

You may want to explore websites where you can request estimates, such as Freelancer or others, as you may receive cheaper quotes that align with your budget and expected outcomes. There you will find individuals willing to work for lower fees that will send you proposals.

A tip about PostgresSQL. In case someone recommends you using the PostgreSQL extension Timescale (specific for indexing date-time based records), you should be aware that it is a nightmare for maintaining it as it creates hypertables that hide the actual data layer, generating a lot of drama and pain for the IT teams.

Cheers.
 
Last edited:
I am a programmer, so I'm happy doing all the IT and programming work here. Picking a database etc. is straightforward for me. The $2,000 need only go towards paying for API access or buying database files. My question is just, *which* API or database should I use? polygon.io is a good example of the sort of thing I'm looking for, but as I mentioned, requires too many API calls to get things like daily weighted shares outstanding. Your suggestion of using an exchange directly is interesting, although I'm curious how many exchanges I would need to go to to get a fairly comprehensive list of US securities, if there's any data cleaning that would be required, and how much work of these would be to ingest (do they too have issues with needing many API calls for the data I want?). In other words, I'm curious how much work that would be and if there's a third-party data provider whose aggregated data from the exchanges is easier to consume. I do only need fairly basic data, with rows once per day per ticker for daily data and once per quarter per ticker for quarterly reports.
 
  • Like
Reactions: gon
Hi, Meisel.

Sorry for the confusion, my mistake.


For EOD data I think there may be a lot of free resources. I would investigate further.


See this for instance. They or any of their associates I am sure they will be able to provide free EOD quotes on daily data.


Then, for intance (it is just I do not trade stocks and dont know where the data is offered), maybe using Tradestation (IB Brokers) or something like that you can fetch directly, with C# or through sockets, what you need.

See also this (I googled end of day stock data):

One thing that is important is the timeframe and the daily open and close hours. Be careful with it. Using LTF (H1) would allow you to resample the data and shift back and forward the open time for instance, as needed, etc...

I cannot help too much further.

Regards,

Gonzalo.
 
Thanks for those suggestions. I'm curious if there are specific data sets that are known to be particularly clean, as I've heard that data cleanliness is a major issue. In fact I see that with polygon.io, there are a number of companies with multiple quarterly reports for a given quarter, with substantially different numbers between the reports. So I want a data set that ideally doesn't have that issue and others.
 
Back
Top