Jump to content

Looking for a string within some text and wrapping it with a link


mongoose00318

Recommended Posts

I am trying to wrap a string with an anchor tag if it finds a match with a block of text. Here is an example text block:

CONOCO 1'10x8 VC DF TP SGN||PRINCIPAL ILLUMINATION||ENG: CO3028TP_0VPR||DWG: CO200428||TO BE: DYED DIESEL (SPEC)||

The string I would want to wrap with a link would be "CO200428". 

The next problem is the drawings (what I'm searching the text for) has over 115,000 possibilities. The text blocks to search are over 1300.

I have the drawing names stored in a simple mysql table...but doing a foreach takes forever...I imagine it will take even longer when looking in the text blocks...

Is there a way to easily to do the anchor wrap? I don't know regex very well...

Link to comment
Share on other sites

"sans regex" method

<?php
$str = "CONOCO 1'10x8 VC DF TP SGN||PRINCIPAL ILLUMINATION||ENG: CO3028TP_0VPR||DWG: CO200428||TO BE: DYED DIESEL (SPEC)||";

$result = '';
$link = 'DWG: <a href="http://www.domain.com?drawing=#">#</a>';
$p1 = $p2 = 0;
while (($p1 = strpos($str, 'DWG:', $p2)) !== false) {
    $result .= substr($str, $p2, $p1 - $p2);
    $p2 = strpos($str, '||', $p1+4);
    $drwg = trim(substr($str, $p1+4, $p2 - ($p1 + 4)));
    $result .= str_replace('#', $drwg, $link);
}
$result .= substr($str, $p2);

echo $str . '<br>' . $result;
?>

 

Link to comment
Share on other sites

@Barand Yes! I'd love to do that...but...sadly when it comes to the data entry side there are no standards that are followed. I'm always having to do things backwards and goofy due to how data is stored. I've got a solution I'm working on which I'd like to hear your feedback on...just need some time to finish it. Probably will have it finished some time tomorrow. I actually got by using the strpos and str_replace functions...I look forward to your feedback. When I update I'll provide some more details as to the goofy problems I've had to work around to accomplish this (from bad data to network issues)

Link to comment
Share on other sites

Okay so here my code...it takes like 20 minutes to finish executing though. It would be great if I could make it faster.

<?php
//start timer
$rustart = getrusage();

//allow more memory and execution time for script
ini_set('memory_limit', '4096M');
ini_set('max_execution_time', '60000');

//include dbconfig
include('scripts/dbconfig.php');

//run functions in sequence
delete_drawings();
get_drawing_list($pdo);
cleanup_drawing_list($pdo);
get_drawings($pdo);

function get_drawing_list($pdo) {
	$path = '\\\directoryTo\PDF_Files';

	$dirs = array();

	// directory handle
	$dir = dir($path);
	
	$query = "TRUNCATE TABLE drawings_pdf_files";
	$statement = $pdo->prepare($query);
	$statement->execute();

	while (false !== ($entry = $dir->read())) {
		if ($entry != '.' && $entry != '..') {
		   if (is_dir($path . '/' .$entry)) {
				$query = "INSERT INTO drawings_pdf_files(fileName) VALUES(?)";
				$statement = $pdo->prepare($query);
				$statement->execute([
					$entry,
				]);
		   }
		}
	}
}
function cleanup_drawing_list($pdo) {
	$query = "SELECT * FROM drawings_pdf_files";
	$statement = $pdo->prepare($query);
	$statement->execute();
	$drawings = $statement->fetchAll();

	$query = "SELECT description FROM production_data";
	$statement = $pdo->prepare($query);
	$statement->execute();
	$descriptions = $statement->fetchAll();
	
	foreach ($drawings as $drw) {
		
		//count matches
		$count = 0;
		
		//compare for a match
		foreach ($descriptions as $desc) {
			if (strpos($desc['description'], $drw['fileName'])) {
				$count++;
			}
		}
		
		//remove the record if it has no matches
		if ( $count == 0 ) {
			$query = "DELETE FROM drawings_pdf_files WHERE id = :id";
			$statement = $pdo->prepare($query);
			$statement->execute([
				'id' => $drw['id'],
			]);
		}
	}
}
function get_drawings($pdo) {
	
	$query = "SELECT * FROM drawings_pdf_files";
	$statement = $pdo->prepare($query);
	$statement->execute();
	$drawings = $statement->fetchAll();
	
	foreach($drawings as $drw) {
		
		$dir_pdf = "\\\directoryTo\PDF_Files\\";
		$dir_art = $drw['fileName'];
		$dest_pdf = "drawings/";

		//create new directory for the files
		mkdir($dest_pdf.$dir_art); 

		//copy each file in that folder
		$list_files = array_diff(scandir($dir_pdf.$dir_art), array('..', '.'));
		foreach($list_files as $f) {
			copy( $dir_pdf . $drw['fileName'] . '\\' . $f, $dest_pdf . $drw['fileName'] . '/' . $f);
		}
		
	}
	
}
function delete_drawings() {
	$dir = "drawings/";
	$di = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS);
	$ri = new RecursiveIteratorIterator($di, RecursiveIteratorIterator::CHILD_FIRST);
	foreach ( $ri as $file ) {
		$file->isDir() ?  rmdir($file) : unlink($file);
	}
	return true;
}

//output usage statistics
//$ru = getrusage();
//echo "This process used " . rutime($ru, $rustart, "utime") . " ms for its computations\nIt spent " . rutime($ru, $rustart, "stime") . " ms in system calls\n";
?>

 

The way that the drawing gets put into the description varies:

CONOCO 1'10x8 VC DF TP SGN||PRINCIPAL ILLUMINATION||ENG: CO3028TP_0VPR||DWG: CO200428||TO BE: DYED DIESEL (SPEC)||

So I can't look for "DWG:" Sometimes they don't put anything in the description telling you its the drawing. People just figure it out from experience. So what I've done is pull in all of the files from the drawing folder, put it into a table, then compare each drawing with each description, remove any drawings that dont have matches, and then copy the drawing files that do have matches.

@BarandWhat do you think?

Link to comment
Share on other sites

One thing I see off the top of my head is to not delete one row at a time. Build a query string that deletes everything you want then execute that query once. This is really a MySQL question and I am not experienced enough to give you the exact syntax but it involves using 'WHERE id IN (...)'

Edited by gw1500se
Link to comment
Share on other sites

Here is sample data from 'drawings_pef_files"

image.png.12bde581d9f8775fe871ab38666b40cc.png

Sample data from 'production_data'

image.thumb.png.61fbd930371c968aecd4506600e1b790.png

Table structure 

--
-- Table structure for table `production_data`
--

DROP TABLE IF EXISTS `production_data`;
CREATE TABLE IF NOT EXISTS `production_data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `job_number` int(8) NOT NULL,
  `enterprise` tinytext NOT NULL,
  `part_number` text NOT NULL,
  `description` text NOT NULL,
  `psm` tinytext NOT NULL,
  `qty` int(11) NOT NULL,
  `line_item` varchar(11) NOT NULL,
  `as400_ship_date` date DEFAULT NULL,
  `hold_reason` text NOT NULL DEFAULT '0',
  `hold_date` date DEFAULT NULL,
  `insert_time` datetime NOT NULL COMMENT 'time order was inserted or last updated',
  PRIMARY KEY (`id`),
  KEY `job_line` (`job_number`,`line_item`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
COMMIT;

 

In the description field often times there is no DWG: prefixing the drawing number.

 

Edited by mongoose00318
Link to comment
Share on other sites

Is this faster?

$db->exec("DROP TABLE IF EXISTS drawings");
$db->exec("CREATE TABLE drawings (drawing varchar(20) not null PRIMARY KEY)");

$files = glob('files/*.*');
foreach ($files as $f) {
    $names[] = "('" . pathinfo($f, PATHINFO_FILENAME) . "')";
}
// populate the drawings table
$db->exec("INSERT IGNORE INTO drawings (drawing) VALUES " . join(',', $names));

// now delete those drawings NOT found in production table
$db->exec("DELETE d
            FROM drawings d
                LEFT JOIN
                production_data p
                    ON locate(d.drawing, p.description)
            WHERE p.description IS NULL;
            ");

 

Link to comment
Share on other sites

@Barand Hmm weird..it's doing something. But when I dump the $names array it only has 76 values. It should have over 115,000?

Array
(
    [0] => ('(1)')
    [1] => ('1 ART CHECK LIST.CH201192')
    [2] => ('AS180551')
    [3] => ('Art (jacksonville-dc) (Z) - Shortcut (2)')
    [4] => ('Art (jacksonville-dc) (Z) - Shortcut (3)')
    [5] => ('Art (jacksonville-dc) (Z) - Shortcut')
    [6] => ('As173218')
    [7] => ('BK1144SF_LDQs Model (1)')
    [8] => ('CIT00028')
    [9] => ('CP061036_6248_NI')
    [10] => ('CP064096_6948SP1_SBLED')
    [11] => ('CT174631.E')
    [12] => ('CT2065FM_2LD')
    [13] => ('Co0771939')
    [14] => ('Customer Status System')
    [15] => ('Di202434')
    [16] => ('EN3038SF_Qs')
    [17] => ('Ex135513')
    [18] => ('Ex181015')
    [19] => ('Ex183450')
    [20] => ('FMD6SLR-80000_Rev_B')
    [21] => ('FileHandler')
    [22] => ('GR176558')
    [23] => ('HiFashion')
    [24] => ('MA9018CL_BKT')
    [25] => ('MFG (jacksonville-dc) (W) - Shortcut (3)')
    [26] => ('MFG (jacksonville-dc) (W) - Shortcut')
    [27] => ('MFG (jacksonville-dc) (Y) - Shortcut')
    [28] => ('MV-14 4-Product Pricer')
    [29] => ('M_TB1110CS_0IDMB')
    [30] => ('OR1618CP_FXGEHWSP')
    [31] => ('OR163231.E')
    [32] => ('PS151090.E')
    [33] => ('Plantation Market-MaryvilleTN')
    [34] => ('Ps185117')
    [35] => ('RVI_E Order Form - Dispenser ..')
    [36] => ('SFederal He18021413451')
    [37] => ('SG0068CP_ID3w211EMCSP_SBLED')
    [38] => ('SG0068RF_2L2VSP')
    [39] => ('SG0107RF_3w411EX_typA')
    [40] => ('SG133474_e')
    [41] => ('SG160531_e')
    [42] => ('SG167928_e')
    [43] => ('SG200030_e')
    [44] => ('SG2055TP_3w111SP')
    [45] => ('SG2056TP_3w211SP_SBLED Model (1)')
    [46] => ('SG2077FM_3w211SP_TYP_A')
    [47] => ('SG2310RF_1LDSF')
    [48] => ('SG3015TP_TX_SBLED')
    [49] => ('SH002119')
    [50] => ('SH103201 - Shortcut')
    [51] => ('SH141792')
    [52] => ('SHELL 609 ALLISON BONNET DR BESSEMER AL 85125')
    [53] => ('SPECIALT')
    [54] => ('Sd201942_DIE')
    [55] => ('Shortcut to Art & Engineering Request.mdb')
    [56] => ('Shortcut to Enterprise')
    [57] => ('Specialty Contracting Agreement - Rosedale Hwy')
    [58] => ('Synergy Survey Calculator - Eleventh Draft2')
    [59] => ('Thumbs')
    [60] => ('XX174610XX')
    [61] => ('Xx161880.E')
    [62] => ('Xx163854')
    [63] => ('Xx185178.DP')
    [64] => ('Xx4024NI_0IDGM')
    [65] => ('ci164947')
    [66] => ('ci167010')
    [67] => ('ck145891_E')
    [68] => ('export_1')
    [69] => ('mfg (Jacksonville-dc) (V) - Shortcut (2)')
    [70] => ('mfg (Jacksonville-dc) (V) - Shortcut (3)')
    [71] => ('mfg (Jacksonville-dc) (V) - Shortcut (4)')
    [72] => ('mfg (Jacksonville-dc) (V) - Shortcut (5)')
    [73] => ('mfg (Jacksonville-dc) (V) - Shortcut')
    [74] => ('mfg (Jacksonville-dc) (W) - Shortcut (2)')
    [75] => ('su2410tp')
    [76] => ('test')
)

 

Link to comment
Share on other sites

Are you sure you have correct path - those names don't have much in common with the sample filenames that you posted earlier.

Given the volume, it will better to chunk the $names array and add 1000 records per query.

// populate the drawings table
$chunks = array_chunk($names, 1000);
foreach ($chunks as $ch) {
    $db->exec("INSERT IGNORE INTO drawings (drawing) VALUES " . join(',', $ch));
}

 

Link to comment
Share on other sites

  • 2 weeks later...
This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.