php_help

New Members
  • Posts

    4
  • Joined

  • Last visited

    Never

Profile Information

  • Gender
    Not Telling

  1. So the conversion between forums is not necessarily as advertised (or this was simply not mentioned)? As far as I can tell, the IDs for posts or topics did not get converted. It could be that the tables are entirely different between the two forum software packages.
  2. For example, this link http://www.phpfreaks.com/forums/index.php?showtopic=32746&hl= from http://www.phpfreaks.com/forums/index.php/topic,31047.0.html does not work; it goes to the forum index.
  3. I tried browsing the links to the posts referenced, but they only take me to the forum index page. Other forums probably have this problem too. I think the errors are due to MySQL auto-increment numbers being reset or having gone out of sync for some reason. How do you generally deal with this problem, that is, auto-increment numbers for topics or posts getting out of sync?
  4. I hope you have found a solution to your data problem. I have a similar issue myself. What you are looking for is not specific to PHP or web applications. It has to do with the database and data quality. You need to clean the data before using it, and a good database design will help with future data quality issues.

     Cleaning data: Depending on how much cleaning and scrubbing is involved, you can use a data integration tool or clean the data manually. Some data integration tools have fuzzy search components/modules to handle this type of issue. You will most likely need one or more lookup tables. Once your data is cleaned and in use in your application, you need to ensure that the data sources also follow certain rules when inputting data, or use only a single point of data entry. If you are not going to import data from these sources again, you probably do not need to worry about this.

     To clean the data, I would follow something like this:

     1.) Make a lookup table for the data. The table contains the data in the correct format.
     2.) Add several columns to this table that hold aliases: alias1, alias2, ...
     3.) For many situations, using this de-normalized table structure might be OK.
     4.) Clean by comparing each row of the table you need cleaned against the lookup table (using the main column or the aliases). Filter the rows into 3 outputs: one for exact matches, another for rows that are similar but not quite matching (say, the word order is different, or words are abbreviated), and a third for non-matches.
     5.) This should greatly reduce the number of rows that need cleaning. Manually clean the data in the two remaining outputs: partial matches and non-matches.
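
The steps in the last post could be sketched roughly as follows. This is only a minimal illustration in Python, not the poster's actual method: the `LOOKUP` contents, the function names (`normalize`, `classify`, `split_rows`) and the 0.8 similarity threshold are all assumptions made up for the example, and the standard library's `difflib.SequenceMatcher` stands in for whatever fuzzy search component a data integration tool would provide.

```python
from difflib import SequenceMatcher

# Hypothetical lookup table: correct value -> known aliases (alias1, alias2, ...).
LOOKUP = {
    "International Business Machines": ["IBM", "I.B.M."],
    "General Electric": ["GE", "General Electric Co"],
}


def normalize(text):
    """Lowercase, drop punctuation, and sort the words so word order is ignored."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(sorted(kept.split()))


def classify(value, lookup=LOOKUP, threshold=0.8):
    """Return (bucket, canonical), where bucket is 'exact', 'partial' or 'none'."""
    raw = value.strip()
    best_score, best_name = 0.0, None
    for canonical, aliases in lookup.items():
        for candidate in [canonical, *aliases]:
            # Exact match against the main column or any alias column.
            if raw == candidate:
                return "exact", canonical
            # Otherwise score similarity on the normalized forms.
            score = SequenceMatcher(None, normalize(raw), normalize(candidate)).ratio()
            if score > best_score:
                best_score, best_name = score, canonical
    if best_score >= threshold:
        return "partial", best_name
    return "none", None


def split_rows(rows, lookup=LOOKUP):
    """Filter rows into the three outputs: exact, partial, and non-matches."""
    buckets = {"exact": [], "partial": [], "none": []}
    for row in rows:
        bucket, canonical = classify(row, lookup)
        buckets[bucket].append((row, canonical))
    return buckets


if __name__ == "__main__":
    for bucket, items in split_rows(["IBM", "Electric General", "Acme Corp"]).items():
        print(bucket, items)
```

Only the rows that land in the `partial` and `none` buckets would then need manual review. The threshold is a judgment call: a lower value sends more rows to the partial output and fewer to the non-matches.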