
modify and move table ...


dd_gamer

Recommended Posts

I'm hoping someone can help me or at least point me in the right direction. I have a MySQL database, and one table has over 800,000 records. Most of the other tables are manageable. What I need to do is move a selected range of records (by date range) from that one table into another table, so I can improve my search capabilities within a report query.

The majority of the 800,000 records are for reference only; only the active records (approx. 20,000) need to be searched. I need all the records (800,000) for report history, but not for the active query, and the table will keep growing.

 

Not sure if I should move the table to another database!

 

It would be best if I could automate the process...

 

 

Link to comment
Share on other sites

Using partitioning and specifically partition pruning might help - http://dev.mysql.com/doc/refman/5.1/en/partitioning-pruning.html

 

By setting up one partition for the current/active records and one (or more) for the inactive records, and writing a WHERE clause that tells the optimizer the query can only find matching data in the current/active partition, your query will only search that partition. The entire set of records will still be available in the table.
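
For example (a rough sketch only; the table and column names are borrowed from the query posted later in this thread, the cutoff date is illustrative, and in MySQL the partitioning column has to be part of every unique key on the table):

CREATE TABLE studentrecord_daily (
id INT NOT NULL,
student_id INT NOT NULL,
record_date DATETIME NOT NULL,
PRIMARY KEY (id, record_date)
)
PARTITION BY RANGE ( TO_DAYS(record_date) ) (
PARTITION p_archive VALUES LESS THAN ( TO_DAYS('2011-09-01') ),
PARTITION p_active VALUES LESS THAN MAXVALUE
);

A query that restricts record_date in the WHERE clause, e.g.

SELECT * FROM studentrecord_daily
WHERE record_date >= '2011-09-06'
AND record_date <= '2011-11-04'

should then be pruned to the p_active partition, so the archived rows are never touched.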

Link to comment
Share on other sites

I've never seen partitioning before. That's very useful to know, and it would work well in a stored routine that is triggered, perhaps? Or, like I said before, through a PHP script set off by a cron job.

Link to comment
Share on other sites

You should really think about moving the old records to an archive table, rather than moving newer records to a new table. This could be done with a PHP script run on a cron.
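
A rough sketch of the SQL such a script could run (the archive table name and the cutoff date here are only illustrative, and it assumes the archive table has the same structure as the live one):

CREATE TABLE IF NOT EXISTS studentrecord_daily_archive LIKE studentrecord_daily;

INSERT INTO studentrecord_daily_archive
SELECT * FROM studentrecord_daily
WHERE record_date < '2011-09-01';

DELETE FROM studentrecord_daily
WHERE record_date < '2011-09-01';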

 

Can you give an example of PHP code? I've never used cron and I'm not sure if I have access...

Link to comment
Share on other sites

If partitioning as PFMAbismad suggested is not an option for you (most likely it should be), then could you explain exactly why you are worried about this "record separation"? 800,000 records is nothing... are your queries slow? (show them if that is the case)... Assuming that you have the right indexes on your table, you shouldn't have problems with it if your queries are written correctly... What defines whether your records are active or not?

 

If you provide more information about your issue and goals, people here can give you more targeted/accurate advice.

Link to comment
Share on other sites

mikosiko

 

The query only takes approx. 2 seconds, improved from 90 seconds with help from another post a few weeks ago. I have to run that query over 600 times (one per person) while the report runs. I don't need the other (800,000) records during the report run, and the table will be growing at a rate of about 250,000 records every 3 months.

 

I hope this helps to explain my issue.

Link to comment
Share on other sites

run that query over 600 times (one per person)

 

Nooooooooo..... You would use one query that gets the rows you want in the order that you want them. You would likely need to use GROUP BY user_id to consolidate the rows for each user.

 

If you post your actual queries and code showing what you are trying to do, someone can probably help.

Link to comment
Share on other sites

Post your current query and the relevant code that uses it here... are you running that query in a loop?

 

I don't have the code to get all the students yet! The code will look for "all students" that are active and run... If you look at the link you will see the code for one student only. If you do the math (600 x 2 seconds) the query will take too long, and as time goes on it will take longer and longer as the table keeps growing.

 

That's why I'm asking for other methods, ideas and possibilities.

Link to comment
Share on other sites

Just to clear up any query issues ... here is the query:

 

SELECT DISTINCT DATE(studentrecord_daily.record_date),
studentrecord_daily.student_id,
schedule_room.scheduledate,
schedule_room.scheduleid, 
schedules.id,
schedules.gf, 
schedules.schedule_desc 

FROM schedule_room

JOIN schedules 
ON schedules.id = schedule_room.scheduleid
JOIN studentrecord_daily 
ON DATE(studentrecord_daily.record_date ) = schedule_room.scheduledate

WHERE studentrecord_daily.student_id = '3'
AND studentrecord_daily.record_date >= '2011-09-06'
AND studentrecord_daily.record_date <= '2011-11-04'

 

And here is the loop:

 

$factorTotal = 0; // reset the total for this student before accumulating

do {
    // the first row of $rs_factor is assumed to have been fetched before this loop runs
    // (the usual recordset pattern); add each row's gf value to the running total
    $factorSum = $row_rs_factor['schedules.gf'];
    $factorTotal = $factorTotal + $factorSum;
} while ($row_rs_factor = mysql_fetch_assoc($rs_factor));

 

Then the update:

 

mysql_query("UPDATE tcstudent SET data = '$factorTotal' WHERE id = $student_id ");

Link to comment
Share on other sites

a) The query for all students (or a specific subset of students) won't take 600 times as long as it takes for one student. Have you tried it without the student_id in the WHERE condition? The query is not executed once for each student; it is executed once for all the rows it matches.

 

b) "And here is the loop:" Are you aware that you can use aggregate functions in the query, i.e. SUM(), to get the query to return the value you want for each student (provided you use GROUP BY student_id in the query.)

 

 

 

 

Link to comment
Share on other sites

b) "And here is the loop:" Are you aware that you can use aggregate functions in the query, i.e. SUM(), to get the query to return the value you want for each student (provided you use GROUP BY student_id in the query.)

 

Just to reinforce this suggestion. Aggregation is, as far as I know, the best way to handle this kind of repeated querying. There's a method known as pre-aggregation which can be used to maximize the efficiency of your queries; however, I'm not convinced there's much point in going into that level of detail here.

 

But SUM() and GROUP BY are definitely worth looking into as already stated!

Link to comment
Share on other sites

I changed the

schedules.gf

to

SUM(schedules.gf)

but the aggregate counts every value within the date range and doesn't seem to recognize the "DISTINCT" in my code

DISTINCT DATE(studentrecord_daily.record_date)

so the count is too high (not correct). Also, when I removed the student_id from the WHERE clause and used it in the GROUP BY, the query timed out with an error.
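
(One untested idea for getting around the double counting, sketched here only for reference: wrap the original DISTINCT query in a derived table and sum over its rows, so the DISTINCT is applied before the SUM, e.g.

SELECT t.student_id, SUM(t.gf) AS factorTotal
FROM (
SELECT DISTINCT DATE(studentrecord_daily.record_date) AS record_day,
studentrecord_daily.student_id,
schedule_room.scheduledate,
schedule_room.scheduleid,
schedules.id AS schedule_id,
schedules.gf
FROM schedule_room
JOIN schedules
ON schedules.id = schedule_room.scheduleid
JOIN studentrecord_daily
ON DATE(studentrecord_daily.record_date) = schedule_room.scheduledate
WHERE studentrecord_daily.record_date >= '2011-09-06'
AND studentrecord_daily.record_date <= '2011-11-04'
) AS t
GROUP BY t.student_id

Whether that also avoids the time-out would depend on the indexes on studentrecord_daily, e.g. one on record_date.)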

Link to comment
Share on other sites

I made the changes to my code using the GROUP BY with the student_id, and that makes the query run for "all" students (600 plus). The server timed out after 2 minutes (see above)... So, back to my original question about table reduction methods. Does anyone have any ideas? I'm stuck on this at this time!

Link to comment
Share on other sites

There's nothing to show in your code. After you partition the data table, the WHERE clause in a query tells the database engine to access only the partition(s) that match the WHERE condition. If you are doing this based on dates, your WHERE clause would include a date comparison that only matches the partition holding the records you want to query.

Link to comment
Share on other sites
