carlosvega1 Posted October 27, 2008
If you were to do an echo file_get_contents("www.blah.com/example.htm"); what would be the best way to read the HTML code and extract certain lines from the text, or even read certain tags in it?
Caesar Posted October 27, 2008
You can use regular expressions, if you know what HTML tags you're looking for specifically.
carlosvega1 Posted October 27, 2008 Author
OK, I'm pretty new at this stuff; I used to code HTML a long time ago. How so? I know which tags I want: in this case it's <p class="sample">.
MasterACE14 Posted October 27, 2008
Regular Expressions
carlosvega1 Posted October 27, 2008 Author
OK, from what I get from this, I can read it, but what is the code to reproduce the text I read?
DarkWater Posted October 27, 2008
I'm writing the second tutorial in the series actually, and it tells you how you can actually use the stuff in that tutorial in PHP. For now though, go to the PHP manual and look up preg_match().
thebadbad Posted October 27, 2008
Simple example to start you off with:

    <?php
    $file = file_get_contents($url);
    preg_match('~<p class="sample">(.*?)</p>~is', $file, $matches);
    //$matches will be an array with full pattern match as first element and parenthesized pattern match as second element
    echo '<pre>', print_r($matches, true), '</pre>';
    //If there's more than one p tag with class="sample", you can use preg_match_all() instead, to grab the contents of all of them
    ?>
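A minimal sketch of the preg_match_all() variant mentioned in the comment above. The $html string is hypothetical sample markup standing in for the fetched page, so the snippet runs without a network request; the pattern is the one from the example.

```php
<?php
// Hypothetical sample markup standing in for the page fetched with file_get_contents().
$html = '<p class="sample">First</p><div>skip me</div><p class="sample">Second <b>bold</b></p>';

// Same pattern as in the example above; preg_match_all() collects every match
// instead of stopping at the first one, and returns the number of matches.
$count = preg_match_all('~<p class="sample">(.*?)</p>~is', $html, $matches);

// $matches[0] holds the full matches (surrounding <p> tags included),
// $matches[1] holds only the text captured by the parentheses.
$contents = $matches[1];

print_r($contents);
```

Reading from $matches[1] rather than $matches[0] is usually what you want when only the text between the tags matters.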
carlosvega1 Posted October 27, 2008 Author
Great, thanks. http://wookeh.net/csc4900/OAT/test.php It works perfectly. Now from there I have to extract certain things that print out and search another file for the descriptions of the courses.
To give more of an idea of what I want to do: I am working alongside a teacher on a project that takes the general education courses at our university and transfers them into a kind of database, to help teachers advise students on which courses they have to take. It might also include some kind of tree that links similar courses, and it should make it so that if an entry changes in the school catalog, it is easily changeable.
Now my question is: what would be the best way to start this? PHP with MySQL? Maybe create something that reads the current catalog and organizes these things? Any help is appreciated.
carlosvega1 Posted November 3, 2008 Author
So I have been working on a project with my teacher to extract certain parts of several different documents and put them together in one file. The project is designed to get the course catalog of general education courses that students need to take: take a course from one file and look in another file for its description. For example, I've been able to get (with help) all the course descriptions of the classes by using:

    $file = file_get_contents("certain website");
    preg_match_all('~<p class=gened4>(.*?)</p>~is', $file, $matches);
    echo '<pre>', print_r($matches, true), '</pre>';

and it prints them out in arrays like this:

    Array
    (
        [0] => Array
            (
                [0] => ART 2020 Introduction to Digital Arts
                [1] => ART 2050 Art Appreciation
                [2] => ART 2080 Survey of Art I

Now, by looking at the other file that has these specific definitions, I've been able to extract their definitions into another array:

    Array
    (
        [0] => Array
            (
                [0] => A study and application of design principles in creative two‑dimensional projects in line, value, color and texture. Credit, 3 semester hours.

My question is, if anyone knows: how could I take these two separate things, search within the arrays, and combine them in a file of its own?
thebadbad Posted November 3, 2008
If the course names match the descriptions, you could use array_combine() with the single-dimensional arrays as parameters. That would return a new array with course names as keys and descriptions as values. Example:

    <?php
    //these arrays should be similar to the ones you posted
    $courses = array(
        array(
            'ART 2020 Introduction to Digital Arts',
            'ART 2050 Art Appreciation',
            'ART 2080 Survey of Art I'
        )
    );
    $descriptions = array(
        array(
            'ART 2020 is about..',
            'ART 2050 is about..',
            'ART 2080 is about..'
        )
    );
    //check the lengths first: array_combine() returns false on a mismatch in older PHP versions and throws a ValueError in PHP 8
    if (count($courses[0]) == 0 || count($courses[0]) != count($descriptions[0])) {
        die('Provided arrays in array_combine() are not of equal length or are empty.');
    }
    $combined = array_combine($courses[0], $descriptions[0]);
    echo '<pre>', print_r($combined, true), '</pre>';
    ?>

Output:

    Array
    (
        [ART 2020 Introduction to Digital Arts] => ART 2020 is about..
        [ART 2050 Art Appreciation] => ART 2050 is about..
        [ART 2080 Survey of Art I] => ART 2080 is about..
    )

But with that array, you can only pick out an element if you know the exact course name. Another method is to merge the arrays into a two-dimensional array, where the keys are numerical and easily accessible:

    <?php
    //these arrays should be similar to the ones you posted
    $courses = array(
        array(
            'ART 2020 Introduction to Digital Arts',
            'ART 2050 Art Appreciation',
            'ART 2080 Survey of Art I'
        )
    );
    $descriptions = array(
        array(
            'ART 2020 is about..',
            'ART 2050 is about..',
            'ART 2080 is about..'
        )
    );
    if (count($courses[0]) != count($descriptions[0])) {
        die('Provided arrays are not of equal length.');
    }
    $combined = array();
    $count = count($courses[0]);
    for ($i = 0; $i < $count; $i++) {
        $combined[] = array(
            'name' => $courses[0][$i],
            'desc' => $descriptions[0][$i]
        );
    }
    echo '<pre>', print_r($combined, true), '</pre>';
    ?>

Output:

    Array
    (
        [0] => Array
            (
                [name] => ART 2020 Introduction to Digital Arts
                [desc] => ART 2020 is about..
            )
        [1] => Array
            (
                [name] => ART 2050 Art Appreciation
                [desc] => ART 2050 is about..
            )
        [2] => Array
            (
                [name] => ART 2080 Survey of Art I
                [desc] => ART 2080 is about..
            )
    )

Hope that helps.
carlosvega1 Posted November 3, 2008 Author
Yes, that's a big help. So for example, in this file: http://wookeh.net/csc4900/OAT/art.php the descriptions are in the second part of the array. Is there a way to incorporate these into the desc, or do I have to add the desc manually?
thebadbad Posted November 3, 2008
Sure, that can be done using one of my above methods (I would use the latter). But there's a problem - you have 64 courses and only 52 descriptions. Aren't they supposed to match up?
carlosvega1 Posted November 3, 2008 Author
Yes, and there lies the tricky part: of these 52 descriptions I am only taking 2-3. The names match the ones from test.php, but I guess what I was going for was taking the names just from the test.php file and extracting the 3 from the art section, then two for English, etc. http://wookeh.net/csc4900/OAT/index.php Like I say, it sounds good in theory, but I'm not very good at coding.
carlosvega1 Posted November 10, 2008 Author
Anyone with any ideas?
carlosvega1 Posted November 10, 2008 Author
OK, to better clarify what I am trying to do, here is a more detailed explanation.
What I am basically trying to do is take all the general ed courses at my university. The first file, test.php, has the code to extract all the courses from the university course catalog HTML file and arrange them in arrays. The other files are major-specific catalogs with the class names and the class descriptions. The biggest challenge is matching the courses listed in test.php to the class names and descriptions in their specific catalogs. For example, in the test.php file it says:

    [0] => ART 2020 Introduction to Digital Arts

Now I want to look in art.php and find

    [8] => ART 2020. Digital Arts Appreciation

and match it with its description:

    [8] => ART 2020 is an opportunity, for non-Art majors, for introductory study and activity in various contemporary means of visual communication and design thinking practiced through digital means. The DAA 2020 curriculum is focused both on digital literacy and on design thinking. As such students will find both computers and working creatively with computers and related technologies co-equal foci of this course. DAA 2020 is open to all students and has no prerequisites. Credit, 3 semester hours.

Is this possible?
thebadbad Posted November 10, 2008
Sure it's possible. Haven't got the time now, but I'll help you tomorrow.
thebadbad Posted November 11, 2008
I guess I was too quick saying it was possible (well, maybe it is). You need more consistent data to do what you want, since it's difficult to link e.g. the ENG 2010 title in test.php with its description in english.php. The index numbers obviously don't match up in the arrays, and "ENG 2010" isn't mentioned in the description.
You could possibly loop through the courses array and, if a course starts with ENG, look at the english.php names array, search for the course code (e.g. ENG 2010), fetch the index number, and then extract the description from the other english.php array via the fetched index number minus 1 (since the descriptions are offset by 1).
I'm not writing the code for you though. You were supposed to learn how to do this, right?
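A rough sketch of the lookup described above, not a finished solution. All data, the $catalogs layout, the 'offset' key and the findDescription() helper are hypothetical; they only mimic the shape of the arrays in this thread. It assumes every course starts with a code such as "ART 2020" and that each department's description array is shifted against its name array by a known, fixed amount.

```php
<?php
// Hypothetical general-education list, shaped like the test.php output.
$genEdCourses = array(
    'ART 2020 Introduction to Digital Arts',
    'ENG 2010 Example Title',
);

// Hypothetical per-department catalogs: names, descriptions, and how far
// the description indexes are shifted relative to the name indexes.
$catalogs = array(
    'ART' => array(
        'names'  => array('ART 1010. Example', 'ART 2020. Digital Arts Appreciation'),
        'descs'  => array('Description of ART 1010.', 'Description of ART 2020.'),
        'offset' => 0,
    ),
    'ENG' => array(
        'names'  => array('Heading line', 'ENG 2010. Example Title'),
        'descs'  => array('Description of ENG 2010.'),
        'offset' => 1, // descriptions sit one index below their names
    ),
);

function findDescription($course, array $catalogs)
{
    // Pull the department prefix and course number off the front, e.g. "ART 2020".
    if (!preg_match('~^([A-Z]+) (\d+)~', $course, $m)) {
        return null;
    }
    $code   = $m[0];
    $prefix = $m[1];
    if (!isset($catalogs[$prefix])) {
        return null;
    }
    $catalog = $catalogs[$prefix];
    foreach ($catalog['names'] as $index => $name) {
        // The catalog name must start with the course code.
        if (strpos($name, $code) === 0) {
            $descIndex = $index - $catalog['offset'];
            return isset($catalog['descs'][$descIndex]) ? $catalog['descs'][$descIndex] : null;
        }
    }
    return null; // no matching course in that department
}

$combined = array();
foreach ($genEdCourses as $course) {
    $combined[] = array('name' => $course, 'desc' => findDescription($course, $catalogs));
}
print_r($combined);
```

Returning null for anything that cannot be matched keeps the gaps visible, which matters here because the real catalogs are known to be inconsistent.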
carlosvega1 Posted November 18, 2008 Author
OK, just a quick question before I take another stab at this. If I were to fwrite this to a text document, what would be the best way to read lines from this specific array and fwrite them to the text file, using the previous code?
thebadbad Posted November 18, 2008
If you've got a two-dimensional array, you can use a foreach loop nested inside a foreach loop. Previous example modified:

    <?php
    //these arrays should be similar to the ones you posted
    $courses = array(
        array(
            'ART 2020 Introduction to Digital Arts',
            'ART 2050 Art Appreciation',
            'ART 2080 Survey of Art I'
        )
    );
    $data = '';
    foreach ($courses as $chunk) {
        foreach ($chunk as $course) {
            $data .= "$course\r\n"; //windows line break at end of each line
        }
    }
    $handle = fopen('file.txt', 'w'); //if file exists, truncate it, else attempt to create it
    fwrite($handle, $data);
    fclose($handle);
    ?>

If you're using PHP 5 you can use file_put_contents('file.txt', $data); instead of the last three lines.
carlosvega1 Posted November 18, 2008 Author
Cool... I found another way too, with the code below, but the fwrite is not working; it just outputs Array...

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Untitled Document</title>
    </head>
    <?php
    $file = file_get_contents("http://www.uncp.edu/catalog/html/art.htm");
    preg_match_all('~<p class=Coursename>(.*?)</p>~is', $file, $matches);
    //$matches will be an array with full pattern match as first element and parenthesized pattern match as second element
    echo '<pre>', print_r($matches, true), '</pre>';
    //If there's more than one p tag with class="sample", you can use preg_match_all() instead, to grab the contents of all of them
    $file2 = file_get_contents("http://www.uncp.edu/catalog/html/art.htm");
    preg_match_all('~<p class=Coursedescription>(.*?)</p>~is', $file2, $matches1);
    //$matches will be an array with full pattern match as first element and parenthesized pattern match as second element
    echo '<pre>', print_r($matches1, true), '</pre>';
    //If there's more than one p tag with class="sample", you can use preg_match_all() instead, to grab the contents of all of them
    //start an empty array.
    $finalArray = array();
    //loop through the courses.
    foreach($matches[1] as $key=>$value){
        //assign matching 'Course' and 'Description' values.
        $finalArray[$key]['Coursename'] = $value;
        $finalArray[$key]['Description'] = $matches1[0][$key];
        $fp = fopen('data.txt', 'w');
        fwrite($fp, $finalArray);
        fclose($fp);
    }
    //display the final array.
    print_r($finalArray);
    $fp = fopen('data.txt', 'w');
    fwrite($fp, $finalArray);
    fclose($fp);
    ?>
    <body>
    </body>
    </html>
thebadbad Posted November 18, 2008
You can't write an array to the file. You will have to use the code I just posted to loop through $finalArray and build up the $data string. You should also remove

    $fp = fopen('data.txt', 'w');
    fwrite($fp, $finalArray);
    fclose($fp);

from the foreach loop in your code.
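A minimal sketch of that advice applied to the $finalArray structure, assuming each entry has 'Coursename' and 'Description' keys as in the code posted above. The sample entries and the temp-directory path are hypothetical; the thread itself writes to data.txt next to the script.

```php
<?php
// Hypothetical stand-in for the $finalArray built in the post above.
$finalArray = array(
    array('Coursename' => 'ART 2020. Example Course', 'Description' => 'Example description one.'),
    array('Coursename' => 'ART 2050. Another Course', 'Description' => 'Example description two.'),
);

// fwrite() expects a string, so flatten each entry into lines of text first.
$data = '';
foreach ($finalArray as $entry) {
    $data .= $entry['Coursename'] . "\r\n";
    $data .= $entry['Description'] . "\r\n\r\n"; // blank line between courses
}

// Open, write and close once, after the loop has finished.
$path = sys_get_temp_dir() . '/data.txt'; // assumed location for this sketch
$handle = fopen($path, 'w');
if ($handle === false) {
    die('Could not open file for writing.');
}
fwrite($handle, $data);
fclose($handle);
```

Opening the file with mode 'w' inside the loop would truncate it on every pass, which is the other reason the fopen/fwrite/fclose calls belong after the loop.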
carlosvega1 Posted November 18, 2008 Author
Awesome, seems to work fine: http://wookeh.net/csc4900/OAT/data.txt Only, except there is HTML coding in the text file too. I appreciate all the help you've given me, by the way!
thebadbad Posted November 18, 2008
Cool. You can remove (X)HTML tags by running the strings through strip_tags() when you build the $finalArray.
carlosvega1 Posted November 18, 2008 Author
Cool. Maybe I'm doing this wrong, but I have tried strip_tags in all different parts:

    $finalArray = array();
    //loop through the courses.
    foreach($matches[1] as $key=>$value){
        //assign matching 'Course' and 'Description' values.
        $finalArray[$key]['Coursename'] = $value;
        $finalArray[$key]['Description'] = $matches1[0][$key];
        strip_tags($key);
        strip_tags($finalArray);
    }

Am I way off?
thebadbad Posted November 18, 2008
Run it on $value and $matches1[0][$key]:

    $finalArray = array();
    //loop through the courses.
    foreach($matches[1] as $key=>$value){
        //assign matching 'Course' and 'Description' values.
        $finalArray[$key]['Coursename'] = strip_tags($value);
        $finalArray[$key]['Description'] = strip_tags($matches1[0][$key]);
    }
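A small illustration of why the earlier attempt had no effect: strip_tags() takes a string and returns a cleaned copy, leaving its argument untouched. The $raw string is a hypothetical match, shaped like a full-pattern entry from $matches1[0] with the surrounding tags still attached.

```php
<?php
// Hypothetical full match, markup included.
$raw = '<p class=Coursedescription>A study of <i>design</i> principles.</p>';

strip_tags($raw);          // return value discarded, so $raw is unchanged
$clean = strip_tags($raw); // keep the returned string instead

echo $clean, "\n";
```

Because $matches1[0] holds the full pattern match, the <p> tags themselves are part of each string; that is where most of the HTML in data.txt came from.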