Jump to content

kinitex

Members
  • Posts

    13
  • Joined

  • Last visited

    Never

Everything posted by kinitex

  1. This would be so much easier if I could just wait for the sort to take place before dom parser grabs the rows but I can't figure that out either.
  2. Ok I'm willing to pay someone to finish this off for me. Theres a few other small features I'd like added that I have no idea how to implement. Please pm with if your interested.
  3. I've got this... <?php require('simple_html_dom.php'); function scraping_table() { // create HTML DOM $html = file_get_html('http://www.forexfbi.com/best-forex-brokers-comparison/'); foreach($html->find('tr') as $row) { // get name $item['name'] = $row->find('td',0)->plaintext; // get rating $item['rating'] = $row->find('td',1)->innertext; $ret[] = $item; } // clean up memory $html->clear(); unset($html); return $ret; } $ret = scraping_table(); asort($ret,SORT_NUMERIC); echo '<table><thead><tr><th>Broker</th><th>Rating</th></tr></thead><tbody>'; foreach($ret as $v) { echo '<tr>'; echo '<td>'.$v['name'].'</td>'; echo '<td>'.$v['rating'].'</td>'; echo '</tr>'; } echo '</tbody></table>'; ?> Which returns like this http://www.forexfbi.com/bestbrokers2.php which obviously isn't right. It is sorting, im just not sure in what way lol it seems to be kinda random.
  4. I'm not a good enough coder yet to implement that. I was lucky enough just to scrape this script together from tidbits of code around the web and a few tweaks. There has to be a simple solution here that I'm not seeing.
  5. http://www.forexfbi.com/best-forex-brokers-comparison/ I am trying to grab the top 5 rows under the header, which are sorted by column 1 (rating) from highest to lowest. Problem is my script grabs the first 5 rows before the sort takes place, so it's obviously not getting the 5 top rated rows. I am using tablesorter.com to sort. <table id="t15" class="tablesorter {sortlist: [[1,1]]}"> http://www.forexfbi.com/bestbrokers.php Here is my script I'm using to get the code: <?php include("bestbrokers_array.php"); require('simple_html_dom.php'); $table = array(); //$html->load_file('http://www.forexfbi.com/best-forex-brokers-comparison/'); $html = file_get_html('http://www.forexfbi.com/best-forex-brokers-comparison/'); $num=1; foreach($html->find('tr') as $row) { if($num <= 6) { $rating = $row->find('td',1)->innertext; $name = $row->find('td',0)->plaintext; $table[$name][$rating] = true; $num++; } } html_show_array($table); ?> and bestbrokers_array.php <?php function do_offset($level){ $offset = ""; // offset for subarry for ($i=1; $i<$level;$i++){ $offset = $offset . "<td></td>"; } return $offset; } function show_array($array, $level, $sub){ if (is_array($array) == 1){ // check if input is an array foreach($array as $key_val => $value) { $offset = ""; if (is_array($value) == 1){ // array is multidimensional echo "<tr>"; $offset = do_offset($level); echo $offset . "<td>" . $key_val . "</td>"; show_array($value, $level+1, 1); } else{ // (sub)array is not multidim if ($sub != 1){ // first entry for subarray echo "<tr nosub>"; $offset = do_offset($level); } $sub = 0; echo $offset . "<td main ".$sub." width=\"120\">" . $key_val . "</td>"; echo "</tr>\n"; } } //foreach $array } else{ // argument $array is not an array return; } } function html_show_array($array){ echo "<table cellspacing=\"0\" border=\"2\">\n"; show_array($array, 1, 0); echo "</table>\n"; } ?>
  6. I need help adding a feature to the following php script that goes through folders and reads from txt files. Right now it is just grabbing the sub folder name, title and body and exporting it to a csv in columns A B and C respectively. What I need it to do is grab a summary from each txt file as well and added to the 4th column in the csv. I think the best way to do this would be to grab from the beginning of the body, to pre-defined closing }. So If I set it at 25 it will end the summary on the 25th } found in the txt file from the beginning of the body. All the txt is in spintax format like "The {Fox|Bird|Cat} {Stole|Took} The {Food|Water}" <?php set_time_limit(0); // set unlimited execution time $base_folder = $_POST['base_folder']; $article_to_capture = (int)$_POST['article_to_capture']; $words = explode(',', $_POST['words']); // print_r($words); die(''); if(!is_dir($base_folder)) die('Invalid base folder. Please go <a href="step1.php"><strong>back</strong></a> and enter correct folder.'); ?><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <title>Artcle Scraper Step 2</title> <style type="text/css"> <!-- body { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; color: #333333; } --> </style> </head> <body> <h2>Step 2 : Processing the content of the folder. </h2> <table width="100%" border="0" cellpadding="2" cellspacing="1" bgcolor="#CCCCCC"> <tr bgcolor="#FFFFFF"> <th width="10%"> </td> <th width="30%">BASE FOLDER NAME</td> <th width="50%"> <?php echo $base_folder;?></td> <th width="10%"> </td> </tr> <?php $subfolder_arr = scandir($base_folder); //print_r($arr1); $total_subfolders = sizeof($subfolder_arr); $subfolder_count = 0; $file_count = 0; $report = ""; $fp = fopen('articles.csv', 'w+'); for($i=0; $i< $total_subfolders; $i++){ $file_name = $subfolder_arr[$i]; if($file_name=='.'||$file_name=='..') continue; $sub_folder_name = $base_folder ."\\". $file_name; $file_type = is_dir($sub_folder_name) ? 'dir' : 'file'; if($file_type=='dir'){ $sub_folder_count++; $rpeort .= "Processing folder $sub_folder_count $sub_folder_name \r\n"; $msg = "Processing folder $sub_folder_count $sub_folder_name \r\n"; ?> <tr bgcolor="#FFFFFF"><td> </td><td colspan="2"> <?php echo $msg;?> </td><td> </td></tr> <tr bgcolor="#FFFFFF"><td> </td><td colspan="2"> <table width="90%" cellpadding="0" cellspacing="0" border="1" bordercolorlight="#0000FF"> <?php // process sub folder $column1 = $file_name; $column2 = '{'; $column3 = '{'; $first = true; $files_arr = scandir($sub_folder_name); $article_processed =0; // article_processed in current sub folder foreach($files_arr as $key=>$val){ if(is_file($sub_folder_name.'\\'.$val) ) { if( substr($val,-4)=='.txt' && (filesize($sub_folder_name.'\\'.$val) <= 35000) && (filesize($sub_folder_name.'\\'.$val) >= 4000)) //file is > 1kb { $size = filesize($sub_folder_name.'\\'.$val); $article_processed++; if($article_to_capture==0 || $article_processed <= $article_to_capture ){ if($first==true) $first=false; else { $column2 .= '|'; $column3 .= '|'; } // read file get title and body $file_content = file($sub_folder_name.'\\'.$val); $file_title = rtrim($file_content[0]); $file_content[0] = ''; $file_arr_size = sizeof($file_content); $words_arr_size = sizeof($words); $t=1; while($t < $file_arr_size){ $file_content[$t] = rtrim($file_content[$t]); //echo $file_content[$t]; //die('inside'); if( $words_arr_size>0 ){ //die('inside'); $temp = str_replace($words, "", $file_content[$t]); $file_content[$t] = $temp; } $t++; //if($t>=3) die('aa'); } $file_body = implode('',$file_content); $column2 .= $file_title; $column3 .= $file_body; ?> <tr><td> <?php //print_r($files_arr); echo $val ."\r\n"; echo round(($size / 1024), 2).' KB'; ?> </td></tr> <?php } //end if .txt } // article processed } // end if is_file } // end foreach ?> </table> </td><td> </td></tr> <?php $column2 .= '}'; $column3 .= '}'; // write to csv / excel file $erro = fputcsv ($fp, array($column1,$column2,$column3) ); } //end if filetype else{ } } // end for fclose($fp); ?> <tr bgcolor="#FFFFFF"> <td> </td> <td colspan=""> File Generated. Download it <a href="articles.csv" target="_blank">HERE</a></td> <td> </td> </tr> </table> </body> </html>
  7. Sweet I got it with if( substr($val,-4)=='.txt' && (filesize($sub_folder_name.'\\'.$val) <= 17000)) Thank you for showing me the light
  8. Thanks for the quick response, but I added it and also tried to echo it but's its just displaying 0 KB for ever file and not filtering anything. I changed it to <= 17000 which would be 17KB right?
  9. Can someone help me add a filter to my script that reads txt files from subfolders, I want it to only read txt files that are <= a certain KB size 1.<?php 2.set_time_limit(0); // set unlimited execution time 3. 4. 5.$base_folder = $_POST['base_folder']; 6.$article_to_capture = (int)$_POST['article_to_capture']; 7. 8.$words = explode(',', $_POST['words']); 9. 10. 11.// print_r($words); die(''); 12. 13. 14.if(!is_dir($base_folder)) 15. die('Invalid base folder. Please go <a href="step1.php"><strong>back</strong></a> and enter correct folder.'); 16. 17.?><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> 18.<html> 19.<head> 20.<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 21.<title>Artcle Scraper Step 2</title> 22.<style type="text/css"> 23.<!-- 24.body { 25. font-family: Verdana, Arial, Helvetica, sans-serif; 26. font-size: 12px; 27. color: #333333; 28.} 29.--> 30.</style> 31.</head> 32. 33.<body> 34.<h2>Step 2 : Processing the content of the folder. </h2> 35.<table width="100%" border="0" cellpadding="2" cellspacing="1" bgcolor="#CCCCCC"> 36. <tr bgcolor="#FFFFFF"> 37. <th width="10%"> </td> 38. <th width="30%">BASE FOLDER NAME</td> 39. <th width="50%"> <?php echo $base_folder;?></td> 40. <th width="10%"> </td> 41. </tr> 42.<?php 43. 44.$subfolder_arr = scandir($base_folder); 45. 46.//print_r($arr1); 47. 48.$total_subfolders = sizeof($subfolder_arr); 49.$subfolder_count = 0; 50.$file_count = 0; 51.$report = ""; 52. 53.$fp = fopen('articles.csv', 'w+'); 54. 55.for($i=0; $i< $total_subfolders; $i++){ 56. $file_name = $subfolder_arr[$i]; 57. 58. if($file_name=='.'||$file_name=='..') 59. continue; 60. 61. $sub_folder_name = $base_folder ."\\". $file_name; 62. $file_type = is_dir($sub_folder_name) ? 'dir' : 'file'; 63. if($file_type=='dir'){ 64. $sub_folder_count++; 65. $rpeort .= "Processing folder $sub_folder_count $sub_folder_name \r\n"; 66. $msg = "Processing folder $sub_folder_count $sub_folder_name \r\n"; 67.?> 68. <tr bgcolor="#FFFFFF"><td> </td><td colspan="2"> 69. <?php echo $msg;?> 70. </td><td> </td></tr> 71. <tr bgcolor="#FFFFFF"><td> </td><td colspan="2"> 72. <table width="90%" cellpadding="0" cellspacing="0" border="1" bordercolorlight="#0000FF"> 73.<?php 74. // process sub folder 75. $column1 = $file_name; 76. $column2 = '{'; 77. $column3 = '{'; 78. $first = true; 79. $files_arr = scandir($sub_folder_name); 80. $article_processed =0; // article_processed in current sub folder 81. foreach($files_arr as $key=>$val){ 82. 83. if(is_file($sub_folder_name.'\\'.$val) ) 84. { if( substr($val,-4)=='.txt' ) 85. { 86. $article_processed++; 87. 88. if($article_to_capture==0 || $article_processed <= $article_to_capture ){ 89. 90. if($first==true) $first=false; 91. else 92. { $column2 .= '|'; $column3 .= '|'; } 93. 94. // read file get title and body 95. $file_content = file($sub_folder_name.'\\'.$val); 96. $file_title = rtrim($file_content[0]); 97. 98. $file_content[0] = ''; 99. 100. $file_arr_size = sizeof($file_content); 101. $words_arr_size = sizeof($words); 102. $t=1; 103. while($t < $file_arr_size){ 104. $file_content[$t] = rtrim($file_content[$t]); 105. //echo $file_content[$t]; 106. //die('inside'); 107. if( $words_arr_size>0 ){ 108. //die('inside'); 109. $temp = str_replace($words, "", $file_content[$t]); 110. $file_content[$t] = $temp; 111. } 112. $t++; 113. //if($t>=3) die('aa'); 114. } 115. $file_body = implode('',$file_content); 116. 117. $column2 .= $file_title; 118. $column3 .= $file_body; 119. 120. ?> 121. <tr><td> 122. <?php //print_r($files_arr); 123. echo $val ."\r\n"; 124. ?> 125. </td></tr> 126. <?php 127. } //end if .txt 128. } // article processed 129. } // end if is_file 130. } // end foreach 131.?> 132.</table> 133.</td><td> </td></tr> 134.<?php $column2 .= '}'; 135. $column3 .= '}'; 136. 137. // write to csv / excel file 138. $erro = fputcsv ($fp, array($column1,$column2,$column3) ); 139. 140. } //end if filetype 141. else{ 142. 143. } 144.} // end for 145. 146.fclose($fp); 147. 148.?> <tr bgcolor="#FFFFFF"> 149. <td> </td> 150. <td colspan=""> File Generated. 151. Download it <a href="articles.csv" target="_blank">HERE</a></td> 152. <td> </td> 153. </tr> 154.</table> 155.</body> 156.</html>
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.