
multiple averages rolling over the same numbers



I am writing a stock program and I am going to be computing several averages of volume over different periods: 10, 21, 50, 100, 120, 150, 200, and 240 days.

 

I would like to make this fairly efficient, seeing as it's going to be a ton of data: I have over 7,700 stocks in the database and each of them will have all of those averages.

 

My question is how to do this most efficiently. I thought about computing the 10-day average, then adding the next days onto it to get the 21-day, then more to get the 50-day, and so on. Or is that too much work, and should I just do each average individually?

select avg(volume) from _AAPL where date between '2014-02-03' and '2014-05-05';

The above is my current, very generic query, and it does work. Now I just need to write a PHP script around it, which I can do; I am just curious how different people would approach this.

 

Thanks for the help.

Edited by illcrx

Off the top of my head,

SELECT
	AVG(IF(/* date in range 10 */, volume, NULL)) AS range10,
	AVG(IF(/* date in range 21 */, volume, NULL)) AS range21,
	AVG(IF(/* date in range 50 */, volume, NULL)) AS range50,
	(etc)
FROM _AAPL
WHERE /* date within the widest range you're counting */
That works because AVG() will completely ignore NULL values.
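As a rough sketch of how those placeholders might be filled in, assuming the windows are anchored to 2014-05-05 as the most recent date (the boundary dates below are only illustrative, not exact trading-day offsets):

SELECT
    AVG(IF(date >= '2014-04-21', volume, NULL)) AS range10,
    AVG(IF(date >= '2014-04-03', volume, NULL)) AS range21,
    AVG(IF(date >= '2014-02-21', volume, NULL)) AS range50,
    AVG(volume) AS range240    -- the WHERE already limits rows to the widest window
FROM _AAPL
WHERE date BETWEEN '2013-05-17' AND '2014-05-05';

With an index on the date column, MySQL only has to scan the rows inside the widest window, and all the averages come back in one query instead of eight.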

 

 

But... are you using a table for each stock quote? That's not good.

Edited by requinix

I need to hold the _AAPL data by date, every day, so I have the dates as the columns. There are many, many variables that are going to go in each row, so that's what I came up with. I already have a script to loop over every stock, so getting all the data for all of the stocks isn't labor intensive.

 

Am I missing something? Is there a better way to do that?

Edited by illcrx

I'm positive there is.

 

Can you post the CREATE TABLE statement for the table? You can use a SHOW CREATE TABLE _AAPL to get it. Maybe with a couple rows of sample data to see how it's being used? It'd also help to know some of the kinds of queries you're running against it.

CREATE TABLE `_AAPL` (
  `date` date DEFAULT NULL,
  `open` decimal(16,4) DEFAULT NULL,
  `high` decimal(16,4) DEFAULT NULL,
  `low` decimal(16,4) DEFAULT NULL,
  `close` decimal(16,4) DEFAULT NULL,
  `volume` double(15,0) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

I have 27 more rows to put in here; the dates are the columns. I will be querying them to do averages, comparing against those averages, and using them to output signals, so I will be doing a lot with it. I am new to databases and it's a fun experience, but I have a lot to learn! Obviously, which is why there is no primary key or any other more definitive characteristics about the data. I am still at a very early stage.

| date       | open     | high     | low      | close    | volume    |
+------------+----------+----------+----------+----------+-----------+
| 2014-01-01 |  80.1457 |  80.1457 |  80.1457 |  80.1457 |         0 |
| 2014-01-02 |  79.3829 |  79.5757 |  78.8600 |  79.0186 |   8398800 |
| 2014-01-03 |  78.9800 |  79.1000 |  77.2043 |  77.2829 |  14036100 |
| 2014-01-06 |  76.7786 |  78.1143 |  76.2286 |  77.7043 |  14765500 |
| 2014-01-07 |  77.7600 |  77.9943 |  76.8457 |  77.1486 |  11347500 |
| 2014-01-08 |  76.9729 |  77.9371 |  76.9557 |  77.6371 |   9240900 |
| 2014-01-09 |  78.1143 |  78.1229 |  76.4786 |  76.6457 |   9986400 |

So next to volume I have 

ALTER TABLE stocks ADD (
    `day` TINYINT(1),
    `10day` INT,                    -- these are averages of prices over x days
    `21day` INT,
    `50day` INT,
    `100day` INT,
    `120day` INT,
    `150day` INT,
    `200day` INT,
    `240day` INT,
    `20day_vol` INT,
    `50day_vol` INT,
    `pp_limit` INT,                 -- PP are volume signatures
    `pp_day` TINYINT(1),
    `volPercentAvg` DOUBLE(5,2),
    `quiet_volume` TINYINT(1),      -- signal derived in PHP
    `up_day` TINYINT(1),
    `down_day` TINYINT(1),
    `percent_close` DOUBLE(5,2),
    `percent_in_day` DOUBLE(5,2),   -- where the close was in relation to the high/low, as a percentage
    `support_day` TINYINT(1),       -- derived from the above variable
    `8dayPP_limit` INT,
    `8dayPP_day` TINYINT(1),
    `distribution_day` TINYINT(1),
    `50day_test` TINYINT(1),        -- test of the 50-day MA
    `50day_break` TINYINT(1),
    `continuationPP` TINYINT(1)     -- PP based on things
);
So as you can see, I have a lot going on. I thought about this from the bottom up, considering there are so many stocks and I need to do so much with each of them. My architecture goes like this:
 
DB = Stocks_Daily
Table = _AAPL, or the appropriate stock symbol
Columns = dates
Rows = data for each date

Every day all of this will be derived and updated.

I don't even have an architecture yet for the real-time side; this is all historical, but it should be pretty similar, I would guess.
 
Thanks for checking it out! 
Edited by illcrx

That's 7,700 tables, isn't it? What you should have is

CREATE TABLE `stock_price` (
  `stock` varchar(6) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `open` decimal(16,4) DEFAULT NULL,
  `high` decimal(16,4) DEFAULT NULL,
  `low` decimal(16,4) DEFAULT NULL,
  `close` decimal(16,4) DEFAULT NULL,
  `volume` double(15,0) DEFAULT NULL,
  PRIMARY KEY (`stock`, `date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

where the added stock column contains the "AAPL" etc.

 

One of the rules when constructing a db is "no derived data", so all those averages, percentages, etc. are not required. You calculate them in your queries when required.
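For instance (a sketch only; the boundary dates are placeholders you would compute for the period you care about), the rolling volume averages for every stock could come out of one pass over this table, using the AVG(IF(...)) approach from earlier:

SELECT stock,
       AVG(IF(date >= '2014-04-21', volume, NULL)) AS vol_avg_10,
       AVG(IF(date >= '2014-04-03', volume, NULL)) AS vol_avg_21,
       AVG(IF(date >= '2014-02-21', volume, NULL)) AS vol_avg_50   -- widest window matches the WHERE below
FROM stock_price
WHERE date BETWEEN '2014-02-21' AND '2014-05-05'
GROUP BY stock;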

 

edit: if you have a separate "stock" table with details of the quoted stock (exchange, industry type etc) then the stock column above should be the id of the "stock" record.
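A rough sketch of what that two-table layout might look like (the column names and sizes are illustrative; InnoDB is used here only so the foreign key is actually enforced, which MyISAM would not do):

CREATE TABLE stock (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    symbol VARCHAR(6) NOT NULL,
    exchange VARCHAR(16),              -- details about the quoted stock live here, once
    industry VARCHAR(64),
    PRIMARY KEY (id),
    UNIQUE KEY (symbol)
) ENGINE=InnoDB;

CREATE TABLE stock_price (
    stock_id INT UNSIGNED NOT NULL,    -- references stock.id instead of repeating the symbol
    date DATE NOT NULL,
    open DECIMAL(16,4) DEFAULT NULL,
    high DECIMAL(16,4) DEFAULT NULL,
    low DECIMAL(16,4) DEFAULT NULL,
    close DECIMAL(16,4) DEFAULT NULL,
    volume DOUBLE(15,0) DEFAULT NULL,
    PRIMARY KEY (stock_id, date),
    FOREIGN KEY (stock_id) REFERENCES stock (id)
) ENGINE=InnoDB;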

Edited by Barand


So under this solution, instead of 7,700 tables with thousands of columns, one for each date, I have one supermassive table?

Wouldn't the symbols be duplicated for every single day? Is that more efficient, or would that just take up a bunch of room duplicating data?

With a "stock" table, only the id of the stock record would be duplicated. That's how relational databases work - the records are related by the keys and foreign keys.

 

What is the highest priced stock?

Which stock prices moved most in a particular time period?

 

How would you query those with 7700 tables?
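With everything in one stock_price table, those questions become single queries, for example (dates are placeholders, and MAX(close) - MIN(close) is just one way to measure movement):

-- Highest closing price on a given day
SELECT stock, close
FROM stock_price
WHERE date = '2014-05-05'
ORDER BY close DESC
LIMIT 1;

-- Biggest movers over a period
SELECT stock, MAX(close) - MIN(close) AS price_range
FROM stock_price
WHERE date BETWEEN '2014-04-01' AND '2014-05-05'
GROUP BY stock
ORDER BY price_range DESC
LIMIT 10;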

Right now I have the columns as the dates, and I would just query by the date, with the table as the $symbol:

mysql_query("UPDATE _$symbol SET volPercentAvg = $volPercentAvg WHERE date = '$todaysdate'");
mysql_query("UPDATE _$symbol SET continuationPP = $continuationPP WHERE date = '$todaysdate'");

That is what I have now: _$symbol is the table, with $continuationPP or whatever column as the place I'm entering data.
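For comparison, a sketch of what that same update might look like against the single stock_price table, if such a derived column were kept at all (Barand's advice is to compute it in queries instead). The values are shown literally here; in the PHP script they would be bound as query parameters rather than interpolated into the string:

UPDATE stock_price
SET volPercentAvg = 52.30          -- hypothetical value computed in PHP
WHERE stock = 'AAPL'
  AND date = '2014-05-05';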

Edited by illcrx