mongoose00318 Posted June 29, 2020 Share Posted June 29, 2020 I am trying to wrap a string with an anchor tag if it finds a match with a block of text. Here is an example text block: CONOCO 1'10x8 VC DF TP SGN||PRINCIPAL ILLUMINATION||ENG: CO3028TP_0VPR||DWG: CO200428||TO BE: DYED DIESEL (SPEC)|| The string I would want to wrap with a link would be "CO200428". The next problem is the drawings (what I'm searching the text for) has over 115,000 possibilities. The text blocks to search are over 1300. I have the drawing names stored in a simple mysql table...but doing a foreach takes forever...I imagine it will take even longer when looking in the text blocks... Is there a way to easily to do the anchor wrap? I don't know regex very well... Quote Link to comment Share on other sites More sharing options...
gw1500se Posted June 29, 2020 Share Posted June 29, 2020 if (preg_match('CO200428', $a)) { echo "<a href=\"#anchor\">$a</a>" } Or something like that. Quote Link to comment Share on other sites More sharing options...
Barand Posted June 29, 2020 Share Posted June 29, 2020 "sans regex" method <?php $str = "CONOCO 1'10x8 VC DF TP SGN||PRINCIPAL ILLUMINATION||ENG: CO3028TP_0VPR||DWG: CO200428||TO BE: DYED DIESEL (SPEC)||"; $result = ''; $link = 'DWG: <a href="http://www.domain.com?drawing=#">#</a>'; $p1 = $p2 = 0; while (($p1 = strpos($str, 'DWG:', $p2)) !== false) { $result .= substr($str, $p2, $p1 - $p2); $p2 = strpos($str, '||', $p1+4); $drwg = trim(substr($str, $p1+4, $p2 - ($p1 + 4))); $result .= str_replace('#', $drwg, $link); } $result .= substr($str, $p2); echo $str . '<br>' . $result; ?> Quote Link to comment Share on other sites More sharing options...
mongoose00318 Posted June 29, 2020 Author Share Posted June 29, 2020 @Barand Yes! I'd love to do that...but...sadly when it comes to the data entry side there are no standards that are followed. I'm always having to do things backwards and goofy due to how data is stored. I've got a solution I'm working on which I'd like to hear your feedback on...just need some time to finish it. Probably will have it finished some time tomorrow. I actually got by using the strpos and str_replace functions...I look forward to your feedback. When I update I'll provide some more details as to the goofy problems I've had to work around to accomplish this (from bad data to network issues) Quote Link to comment Share on other sites More sharing options...
mongoose00318 Posted June 30, 2020 Author Share Posted June 30, 2020 Okay so here my code...it takes like 20 minutes to finish executing though. It would be great if I could make it faster. <?php //start timer $rustart = getrusage(); //allow more memory and execution time for script ini_set('memory_limit', '4096M'); ini_set('max_execution_time', '60000'); //include dbconfig include('scripts/dbconfig.php'); //run functions in sequence delete_drawings(); get_drawing_list($pdo); cleanup_drawing_list($pdo); get_drawings($pdo); function get_drawing_list($pdo) { $path = '\\\directoryTo\PDF_Files'; $dirs = array(); // directory handle $dir = dir($path); $query = "TRUNCATE TABLE drawings_pdf_files"; $statement = $pdo->prepare($query); $statement->execute(); while (false !== ($entry = $dir->read())) { if ($entry != '.' && $entry != '..') { if (is_dir($path . '/' .$entry)) { $query = "INSERT INTO drawings_pdf_files(fileName) VALUES(?)"; $statement = $pdo->prepare($query); $statement->execute([ $entry, ]); } } } } function cleanup_drawing_list($pdo) { $query = "SELECT * FROM drawings_pdf_files"; $statement = $pdo->prepare($query); $statement->execute(); $drawings = $statement->fetchAll(); $query = "SELECT description FROM production_data"; $statement = $pdo->prepare($query); $statement->execute(); $descriptions = $statement->fetchAll(); foreach ($drawings as $drw) { //count matches $count = 0; //compare for a match foreach ($descriptions as $desc) { if (strpos($desc['description'], $drw['fileName'])) { $count++; } } //remove the record if it has no matches if ( $count == 0 ) { $query = "DELETE FROM drawings_pdf_files WHERE id = :id"; $statement = $pdo->prepare($query); $statement->execute([ 'id' => $drw['id'], ]); } } } function get_drawings($pdo) { $query = "SELECT * FROM drawings_pdf_files"; $statement = $pdo->prepare($query); $statement->execute(); $drawings = $statement->fetchAll(); foreach($drawings as $drw) { $dir_pdf = "\\\directoryTo\PDF_Files\\"; $dir_art = $drw['fileName']; $dest_pdf = "drawings/"; //create new directory for the files mkdir($dest_pdf.$dir_art); //copy each file in that folder $list_files = array_diff(scandir($dir_pdf.$dir_art), array('..', '.')); foreach($list_files as $f) { copy( $dir_pdf . $drw['fileName'] . '\\' . $f, $dest_pdf . $drw['fileName'] . '/' . $f); } } } function delete_drawings() { $dir = "drawings/"; $di = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS); $ri = new RecursiveIteratorIterator($di, RecursiveIteratorIterator::CHILD_FIRST); foreach ( $ri as $file ) { $file->isDir() ? rmdir($file) : unlink($file); } return true; } //output usage statistics //$ru = getrusage(); //echo "This process used " . rutime($ru, $rustart, "utime") . " ms for its computations\nIt spent " . rutime($ru, $rustart, "stime") . " ms in system calls\n"; ?> The way that the drawing gets put into the description varies: CONOCO 1'10x8 VC DF TP SGN||PRINCIPAL ILLUMINATION||ENG: CO3028TP_0VPR||DWG: CO200428||TO BE: DYED DIESEL (SPEC)|| So I can't look for "DWG:" Sometimes they don't put anything in the description telling you its the drawing. People just figure it out from experience. So what I've done is pull in all of the files from the drawing folder, put it into a table, then compare each drawing with each description, remove any drawings that dont have matches, and then copy the drawing files that do have matches. @BarandWhat do you think? Quote Link to comment Share on other sites More sharing options...
gw1500se Posted June 30, 2020 Share Posted June 30, 2020 (edited) One thing I see off the top of my head is to not delete one row at a time. Build a query string that deletes everything you want then execute that query once. This is really a MySQL question and I am not experienced enough to give you the exact syntax but it involves using 'WHERE id IN (...)' Edited June 30, 2020 by gw1500se Quote Link to comment Share on other sites More sharing options...
Barand Posted June 30, 2020 Share Posted June 30, 2020 1 hour ago, mongoose00318 said: What do you think? I think I'd prefer to see some of the data you are dealing with first. What does the data in drawings_pdf_files and production_data tables look like (few sample records)? What is the structure of production data table? Quote Link to comment Share on other sites More sharing options...
mongoose00318 Posted June 30, 2020 Author Share Posted June 30, 2020 (edited) Here is sample data from 'drawings_pef_files" Sample data from 'production_data' Table structure -- -- Table structure for table `production_data` -- DROP TABLE IF EXISTS `production_data`; CREATE TABLE IF NOT EXISTS `production_data` ( `id` int(11) NOT NULL AUTO_INCREMENT, `job_number` int(8) NOT NULL, `enterprise` tinytext NOT NULL, `part_number` text NOT NULL, `description` text NOT NULL, `psm` tinytext NOT NULL, `qty` int(11) NOT NULL, `line_item` varchar(11) NOT NULL, `as400_ship_date` date DEFAULT NULL, `hold_reason` text NOT NULL DEFAULT '0', `hold_date` date DEFAULT NULL, `insert_time` datetime NOT NULL COMMENT 'time order was inserted or last updated', PRIMARY KEY (`id`), KEY `job_line` (`job_number`,`line_item`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; COMMIT; In the description field often times there is no DWG: prefixing the drawing number. Edited June 30, 2020 by mongoose00318 Quote Link to comment Share on other sites More sharing options...
gw1500se Posted June 30, 2020 Share Posted June 30, 2020 Just so we understand. If the filename from drawings_pdf_files is not in any description field of production_data, you want to delete that row from drawings_pdf_files? Quote Link to comment Share on other sites More sharing options...
mongoose00318 Posted June 30, 2020 Author Share Posted June 30, 2020 Yes Quote Link to comment Share on other sites More sharing options...
Barand Posted June 30, 2020 Share Posted June 30, 2020 Is this faster? $db->exec("DROP TABLE IF EXISTS drawings"); $db->exec("CREATE TABLE drawings (drawing varchar(20) not null PRIMARY KEY)"); $files = glob('files/*.*'); foreach ($files as $f) { $names[] = "('" . pathinfo($f, PATHINFO_FILENAME) . "')"; } // populate the drawings table $db->exec("INSERT IGNORE INTO drawings (drawing) VALUES " . join(',', $names)); // now delete those drawings NOT found in production table $db->exec("DELETE d FROM drawings d LEFT JOIN production_data p ON locate(d.drawing, p.description) WHERE p.description IS NULL; "); Quote Link to comment Share on other sites More sharing options...
mongoose00318 Posted June 30, 2020 Author Share Posted June 30, 2020 I get this error: Fatal error: Uncaught Error: Call to a member function exec() on string in All I did was change the glob('files/*.*'); to the correct path. Quote Link to comment Share on other sites More sharing options...
Barand Posted June 30, 2020 Share Posted June 30, 2020 (edited) $db should be a valid PDO connection. Change to $pdo if that's yours. Edited June 30, 2020 by Barand Quote Link to comment Share on other sites More sharing options...
mongoose00318 Posted June 30, 2020 Author Share Posted June 30, 2020 @Barand Hmm weird..it's doing something. But when I dump the $names array it only has 76 values. It should have over 115,000? Array ( [0] => ('(1)') [1] => ('1 ART CHECK LIST.CH201192') [2] => ('AS180551') [3] => ('Art (jacksonville-dc) (Z) - Shortcut (2)') [4] => ('Art (jacksonville-dc) (Z) - Shortcut (3)') [5] => ('Art (jacksonville-dc) (Z) - Shortcut') [6] => ('As173218') [7] => ('BK1144SF_LDQs Model (1)') [8] => ('CIT00028') [9] => ('CP061036_6248_NI') [10] => ('CP064096_6948SP1_SBLED') [11] => ('CT174631.E') [12] => ('CT2065FM_2LD') [13] => ('Co0771939') [14] => ('Customer Status System') [15] => ('Di202434') [16] => ('EN3038SF_Qs') [17] => ('Ex135513') [18] => ('Ex181015') [19] => ('Ex183450') [20] => ('FMD6SLR-80000_Rev_B') [21] => ('FileHandler') [22] => ('GR176558') [23] => ('HiFashion') [24] => ('MA9018CL_BKT') [25] => ('MFG (jacksonville-dc) (W) - Shortcut (3)') [26] => ('MFG (jacksonville-dc) (W) - Shortcut') [27] => ('MFG (jacksonville-dc) (Y) - Shortcut') [28] => ('MV-14 4-Product Pricer') [29] => ('M_TB1110CS_0IDMB') [30] => ('OR1618CP_FXGEHWSP') [31] => ('OR163231.E') [32] => ('PS151090.E') [33] => ('Plantation Market-MaryvilleTN') [34] => ('Ps185117') [35] => ('RVI_E Order Form - Dispenser ..') [36] => ('SFederal He18021413451') [37] => ('SG0068CP_ID3w211EMCSP_SBLED') [38] => ('SG0068RF_2L2VSP') [39] => ('SG0107RF_3w411EX_typA') [40] => ('SG133474_e') [41] => ('SG160531_e') [42] => ('SG167928_e') [43] => ('SG200030_e') [44] => ('SG2055TP_3w111SP') [45] => ('SG2056TP_3w211SP_SBLED Model (1)') [46] => ('SG2077FM_3w211SP_TYP_A') [47] => ('SG2310RF_1LDSF') [48] => ('SG3015TP_TX_SBLED') [49] => ('SH002119') [50] => ('SH103201 - Shortcut') [51] => ('SH141792') [52] => ('SHELL 609 ALLISON BONNET DR BESSEMER AL 85125') [53] => ('SPECIALT') [54] => ('Sd201942_DIE') [55] => ('Shortcut to Art & Engineering Request.mdb') [56] => ('Shortcut to Enterprise') [57] => ('Specialty Contracting Agreement - Rosedale Hwy') [58] => ('Synergy Survey Calculator - Eleventh Draft2') [59] => ('Thumbs') [60] => ('XX174610XX') [61] => ('Xx161880.E') [62] => ('Xx163854') [63] => ('Xx185178.DP') [64] => ('Xx4024NI_0IDGM') [65] => ('ci164947') [66] => ('ci167010') [67] => ('ck145891_E') [68] => ('export_1') [69] => ('mfg (Jacksonville-dc) (V) - Shortcut (2)') [70] => ('mfg (Jacksonville-dc) (V) - Shortcut (3)') [71] => ('mfg (Jacksonville-dc) (V) - Shortcut (4)') [72] => ('mfg (Jacksonville-dc) (V) - Shortcut (5)') [73] => ('mfg (Jacksonville-dc) (V) - Shortcut') [74] => ('mfg (Jacksonville-dc) (W) - Shortcut (2)') [75] => ('su2410tp') [76] => ('test') ) Quote Link to comment Share on other sites More sharing options...
Barand Posted June 30, 2020 Share Posted June 30, 2020 Are you sure you have correct path - those names don't have much in common with the sample filenames that you posted earlier. Given the volume, it will better to chunk the $names array and add 1000 records per query. // populate the drawings table $chunks = array_chunk($names, 1000); foreach ($chunks as $ch) { $db->exec("INSERT IGNORE INTO drawings (drawing) VALUES " . join(',', $ch)); } Quote Link to comment Share on other sites More sharing options...
mongoose00318 Posted July 10, 2020 Author Share Posted July 10, 2020 @Barand Sorry I haven't been back to this task yet but I will be back to it hopefully next week sometime and I will provide an update then. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.