hannylicious

Members

Posts: 14
Joined: May 6, 2011
Last visited: Never

Profile Information

Gender: Not Telling

hannylicious's Achievements

Newbie (1/5)

Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Long story short: I finished the script and it works perfectly! I wouldn't have gotten this done without your help. Thanks again, Fugix!
- May 13, 2011
- 16 replies
preg_replace() - Cuts off last character?

hannylicious replied to hannylicious's topic in Regex Help

Crayon, thanks so much for the in-depth response. trim() worked out really nicely for what I wanted to do. I really appreciate the explanation, too: I have been struggling to come to grips with regex and how it works, and this makes it much clearer! I'm really glad I came across this forum; you guys are the best! Hopefully, after some time, I'll be able to help others in the same way!
- May 11, 2011
- 8 replies
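Crayon's reply itself isn't preserved in this feed. As a minimal sketch of the trim() approach mentioned above (the helper name `cleanTitle` and the exact character list are assumptions): trim() accepts a second argument listing the characters to strip from both ends, so spaces, tabs, line breaks, and hyphens can all go in one call while hyphens between words are left alone. strtolower() covers the lower-case bonus from the original question.

```php
<?php
// Hypothetical helper: strip leading/trailing spaces, tabs, line breaks and
// hyphens, then lower-case the result. Inner hyphens are kept because trim()
// only removes characters from the two ends of the string.
function cleanTitle(string $str): string
{
    // In trim()'s character list, ".." denotes a range; a lone "-" is literal.
    return strtolower(trim($str, " \t\n\r-"));
}

echo cleanTitle(' --- ----- ------f-oo-oo----'), "\n"; // f-oo-oo
echo cleanTitle("--Kitchen-Cabinets--\n"), "\n";       // kitchen-cabinets
```

Because the line-break characters are in the list, titles that end in "-\n" are cleaned the same way as titles that end in plain hyphens.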
preg_replace() - Cuts off last character?

hannylicious replied to hannylicious's topic in Regex Help

You're completely right, Jay. Great fix! Thanks a ton for those two fixes; they work perfectly and are so much simpler. That makes life so much easier!!
- May 11, 2011
- 8 replies
preg_replace() - Cuts off last character?

hannylicious replied to hannylicious's topic in Regex Help

Great idea, Jay! Unfortunately, won't that strip all the hyphens? I'd prefer to keep the ones between the words.
- May 11, 2011
- 8 replies
preg_replace() - Cuts off last character?

hannylicious posted a topic in Regex Help

Hey gang,

This probably has a simple solution, but I'm having issues with the regex for this:

$str = ' --- ----- ------f-oo-oo----';
$str = preg_replace('/^(-|\s)+-(.*[a-zA-Z0-9])[^\-]-+$/', '$2', $str);
// This should be 'f-oo-oo' now, but it produces 'f-oo-o'
echo $str;

In the example above I'm trying to get rid of all leading spaces/hyphens and any trailing spaces/hyphens. My regex gets rid of the leading characters, but it chops off the last character of the string I'm trying to keep (i.e. f-oo-oo becomes f-oo-o).

The application I'm using this for is a bit more complex: I'm replacing spaces with hyphens in article titles, and some of the hyphenated titles have leading as well as trailing hyphens. I've noticed that in some of the strings I'm replacing there are still trailing hyphens after this regex runs. I think this is because they're on a 'new line', and I'm not sure how to check for that in regex either.

As an added bonus, if anyone knows how to make the output lower-case only, that would rock!!
- May 11, 2011
- 8 replies
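Why the last character disappears: in '/^(-|\s)+-(.*[a-zA-Z0-9])[^\-]-+$/', capture group 2 must end in a letter or digit, and the [^\-] right after it must then consume one more non-hyphen character before the trailing hyphens. On ' --- ----- ------f-oo-oo----' that forces group 2 to stop at 'f-oo-o' while [^\-] swallows the final 'o'. The sketch below is one possible regex-only alternative (not necessarily the fix Jay posted, which isn't shown here; `stripEdgeHyphens` is a hypothetical name): it removes a run of whitespace or hyphens anchored at either end and leaves everything in between alone. Since \s also matches \n and \r, trailing line breaks go with the hyphens.

```php
<?php
// Remove a run of whitespace/hyphens at the start (^) or at the end ($).
// Inside a character class, a "-" placed last is a literal hyphen.
// preg_replace() replaces every match by default, so both ends are handled.
function stripEdgeHyphens(string $str): string
{
    return preg_replace('/^[\s-]+|[\s-]+$/', '', $str);
}

// The original pattern, kept for comparison: [^\-] eats the last "o".
$broken = preg_replace('/^(-|\s)+-(.*[a-zA-Z0-9])[^\-]-+$/', '$2', ' --- ----- ------f-oo-oo----');

echo $broken, "\n";                                          // f-oo-o
echo stripEdgeHyphens(' --- ----- ------f-oo-oo----'), "\n"; // f-oo-oo
```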
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Figured it out: it was a simple logic error. The 2nd regexp looks for data between the </h2> and the next <h2>, but if the file only has one set of <h2></h2> tags, there is no second <h2> tag for the data to end at! Back to the drawing board.
- May 9, 2011
- 16 replies
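One way around the missing closing <h2>, sketched here as an assumption rather than the code the thread settled on: capture each title and its body in a single pattern, and let the body end either at the next <h2> or at the end of the string. The lookahead (?=<h2>|\z) checks for that boundary without consuming it, so the next match can still start at that <h2>. The same cause explains why the last article in a multi-article file showed a title but no text: no <h2> follows it. The function name `splitArticles` is hypothetical.

```php
<?php
// Hypothetical helper: returns a list of ['title' => ..., 'body' => ...].
// (.*?) is lazy, so each body stops at the first point where the lookahead
// sees either another <h2> or the very end of the input (\z).
// The "s" modifier lets "." match newlines, since articles span many lines.
function splitArticles(string $html): array
{
    $pattern = '#<h2>(.+?)</h2>(.*?)(?=<h2>|\z)#s';
    $articles = [];
    if (preg_match_all($pattern, $html, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            $articles[] = ['title' => trim($match[1]), 'body' => trim($match[2])];
        }
    }
    return $articles;
}

// A file with only one article still yields one title/body pair.
print_r(splitArticles("<h2>Solo title</h2>\n<p>Only article.</p>\n"));
```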
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Well, I'm back already (have I even left yet? ha!)

Anyhow, I have a mix of articles in this directory: some of the files contain one article (i.e. one match of <h2></h2> tags), and some contain many matches. The code I had above should work for the single-article pages as well (I think). For some reason, I can get it to list those files in the directory, but it will not open them and match the titles/article data as it does with the files that contain multiple articles. The code is as follows:

<?php
// Open the articles directory
$dir = dir("/xampp/htdocs/phpscripts/articles/");
// Set directory name
$dirname = "C:/xampp/htdocs/phpscripts/articles/";
// List files
while (($file = $dir->read()) !== false) {
    if ($file != "." && $file != "..") {
        echo "filename: " . $file . "<br />";
        $html = file_get_contents($dirname . $file);
        $regexptitle = '#\<h2\>(.+?)\<\/h2\>#s';
        $regexpdata = '#\<\/h2\>(.+?)\<h2\>#s';
        $count = 0;
        if (preg_match_all($regexptitle, $html, $matches)) {
            // $matches2 is filled by reference; no "&" needed at the call site
            if (preg_match_all($regexpdata, $html, $matches2)) {
                while ($count < count($matches[0])) {
                    echo $matches[0][$count];
                    /*echo $matches2[0][$count];*/
                    $count++;
                }
            }
        } else {
            echo "There were no article titles found";
        }
    }
}
$dir->close();
?>

Ultimately I am trying to get it to list the name of each file, then display its contents from the results of preg_match_all(). The only files this does not work for are the ones that contain just one article. Any idea why?? I have a feeling it's probably something simple and syntax-related, but I can't see it...
The output as is looks something like this:

filename: Article5.html
filename: Article6.html
filename: Article7.html
filename: Article8.html
filename: Article9.html
filename: Articles-lot-5-6.html
/*Below are the titles of the 'Articles-lot-5-6.html' file*/
Remodel your kitchen with brand new cabinets
Shopping for kitchen cabinets is a serious task indeed
Tips For Buying Best Kitchen Cabinets
Tips to organize your kitchen cabinet in the best possible manner
Renovate Your Kitchen with Kitchen Cabinets
Which type wood to choose for your kitchen cabinets?
- May 9, 2011
- 16 replies
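Separately from the regex itself, the nested ifs in the code above mean a file's titles are printed only when the data pattern also matches. A minimal sketch that decouples the two, assuming every article file ends in .html and using glob() in place of dir()->read() (glob() never returns "." or ".."); `titlesPerFile` is a hypothetical name:

```php
<?php
// Hypothetical helper: map each .html file name in $dir to its <h2> titles.
// Titles are collected on their own, so a file is never skipped just because
// a second pattern (such as the article-body one) finds nothing in it.
function titlesPerFile(string $dir): array
{
    $result = [];
    // glob() returns only matching paths (or false on error, hence "?: []").
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $path) {
        $html = file_get_contents($path);
        preg_match_all('#<h2>(.+?)</h2>#s', $html, $m);
        // $m[1] holds the text captured between the tags (empty if none).
        $result[basename($path)] = $m[1];
    }
    return $result;
}
```

Called as titlesPerFile('C:/xampp/htdocs/phpscripts/articles') (the directory from the post above), it would return an array keyed by file name in which a single-article file maps to a one-element list.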
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Okay! I figured it out a bit. The following code displays each article title and the article itself, in order, as it works its way through the file (except that the last article does not display the article text, just the title). Next up, I'll start tinkering with writing each article to its own file, named after the article title! Thanks again for all of your help, Fugix! I'm going to keep this thread open a bit longer because I'm sure I'll have questions about writing the new files.

<?php
echo "<b>Article List:</b>";
$html = file_get_contents("Articles-lot-5-6.html");
$regexptitle = '#\<h2\>(.+?)\<\/h2\>#s';
$regexpdata = '#\<\/h2\>(.+?)\<h2\>#s';
$count = 0;
if (preg_match_all($regexptitle, $html, $matches)) {
    if (preg_match_all($regexpdata, $html, $matches2)) {
        while ($count < count($matches[0])) {
            echo $matches[0][$count];
            // The last title has no following <h2>, so it has no data match
            echo $matches2[0][$count] ?? '';
            $count++;
        }
    }
} else {
    echo "There were no article titles found";
}
?>
- May 9, 2011
- 16 replies
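For the next step mentioned above (saving each article under its title), one possible sketch, assuming titles like the kitchen-cabinet ones listed in this thread: turn each title into a lower-case, hyphen-separated file name, then write the article with file_put_contents(). Both function names and the .html extension are hypothetical choices.

```php
<?php
// Hypothetical helper: "Which type wood to choose?" -> "which-type-wood-to-choose.html".
// Any run of characters other than a-z/0-9 becomes one hyphen; hyphens left
// at either end are then trimmed off.
function titleToFilename(string $title): string
{
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));
    $slug = trim($slug, '-');
    return ($slug === '' ? 'untitled' : $slug) . '.html';
}

// Hypothetical helper: write one article into $dir and return the path used.
function saveArticle(string $dir, string $title, string $body): string
{
    $path = rtrim($dir, '/') . '/' . titleToFilename($title);
    file_put_contents($path, "<h2>$title</h2>\n$body\n");
    return $path;
}
```

Two titles that differ only in case or punctuation would map to the same file name, so a real run may want to check file_exists() before writing.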
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

After some toying around on my own, I now see that I can count the matches:

echo count($matches[0]) . " matches found";

I'll keep playing around and see if I can do what I described earlier. I wouldn't be anywhere near this solution if not for you, Fugix! Thanks again!
- May 9, 2011
- 16 replies
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Your suggestions are once again spot on! Would it be possible to count the total number of matches and then loop until the count reaches 0? Something like: count matches = 20, then on every loop do 'count = count - 1', and use that variable in $matches[0][$count]? Would that even work? Or would it be better to manually write out $matches[0][0], $matches[0][1], $matches[0][2], $matches[0][3], etc.?
- May 9, 2011
- 16 replies
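On the loop question above: a counter does work, though counting down from the total would print the titles in reverse order. A sketch comparing a counting-up for loop with foreach, which needs no counter at all; the sample HTML and function names are made up for illustration. It reads $matches[1] (the text captured inside the tags) rather than $matches[0] (the whole match, including the <h2> tags).

```php
<?php
// Sample input, made up for illustration.
$html = "<h2>First</h2><p>one</p><h2>Second</h2><p>two</p><h2>Third</h2>";
preg_match_all('#<h2>(.+?)</h2>#s', $html, $matches);

// Option 1: a counter that runs from 0 up to the number of matches - 1.
function titlesWithCounter(array $matches): array
{
    $titles = [];
    for ($count = 0; $count < count($matches[1]); $count++) {
        $titles[] = $matches[1][$count];
    }
    return $titles;
}

// Option 2: foreach visits every element in order, no counter needed.
function titlesWithForeach(array $matches): array
{
    $titles = [];
    foreach ($matches[1] as $title) {
        $titles[] = $title;
    }
    return $titles;
}

echo count($matches[1]), " matches found\n"; // 3 matches found
echo implode("\n", titlesWithForeach($matches)), "\n";
```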
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Alright! That worked! Awesome! It shows the first of approximately 20 articles. How would I go about having it show all the titles? I greatly appreciate your help!
- May 9, 2011
- 16 replies
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Well, as expected, I've run into some issues. I'm trying to take it one step at a time: to begin, I'd just like to have it display the titles of the articles on a page. My code looks like this:

<?php
echo "<b>Article 1:</b>";
$html = file_get_contents("Articles-lot-5-6.html");
$regexp = '#\<h2\>(.+?)\<\/h2\>#s';
if (preg_match_all($regexp, $html, $matches)) {
    echo $matches[0];
} else {
    "There were no article titles found";
}
?>

It gives me this output:

Article 1:Array

I'm sure I'm on the right track, but again, I'm really new to all this. Any idea how to move forward from here? I've tried playing with $regexp and changing it; I've had to do a lot of reading on regex, as I don't know it very well at all. If I use:

$regexp = '/<h2>(.*?)<\/h2>/';

nothing displays except "Article 1:".

I apologize if my code is very simple and my errors very 'common sense' to some of you; I'm just trying to learn this stuff and get a handle on it one bit at a time.
- May 9, 2011
- 16 replies
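Two things explain the output described above. First, preg_match_all() fills $matches with arrays: $matches[0] is the list of full matches and $matches[1] the list of first-group captures, and echoing an array prints just the word "Array". Second, without the s modifier "." does not match newlines, so '/<h2>(.*?)<\/h2>/' finds nothing if the titles in the file contain line breaks (an assumption about this particular file); the else branch then runs, but a bare string statement without echo prints nothing. A small sketch with made-up sample HTML:

```php
<?php
// Made-up sample where the title spans line breaks inside the <h2> tags.
$html = "<h2>\nRemodel your kitchen\n</h2><p>...</p>";

// Without "s", "." stops at "\n", so this pattern finds no titles at all.
$withoutS = preg_match_all('/<h2>(.*?)<\/h2>/', $html, $m1);

// With "s", "." also matches "\n" and the title is found.
$withS = preg_match_all('/<h2>(.*?)<\/h2>/s', $html, $m2);

if ($withS) {
    // $m2[0][0] is the whole match; $m2[1][0] is only the captured title text.
    echo trim($m2[1][0]), "\n";                  // Remodel your kitchen
} else {
    echo "There were no article titles found\n"; // note the echo
}
```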
Split up large html file based on html tags?

hannylicious replied to hannylicious's topic in PHP Coding Help

Alright, excellent. I'll get to reading and see what I can hack up. Thanks for the quick reply and good information! I really appreciate it!
- May 6, 2011
- 16 replies
Split up large html file based on html tags?

hannylicious posted a topic in PHP Coding Help

Hey guys,

I'm a total newbie here, and just about as new to PHP.

My issue: I have a very large .html file that contains multiple articles (I actually have a few of these, but we'll start with one for practicality). There are 10 articles in the file, and they are very simple: a title wrapped in <h2> tags, followed by a few paragraphs wrapped in <p> tags.

What I want to know: is there a way to open that file and have each article (the title and its following paragraphs) saved as its own .html or .txt document? Ultimately I'd take my one large file and create the 10 smaller files from the articles inside it.

I'm having trouble explaining this in text, so I'll try to illustrate: I have "Articles.html", which contains (article1, article2, article3 ... article10). I want to split "Articles.html" and create "Article1.html", "Article2.html", "Article3.html", etc.

Is that possible? Or am I looking at something far more complex than I can imagine at this point, perhaps something I'd be better off doing by hand? Ultimately I intend to put all these articles into a database, but that's the second part of what I want to do (and I think it will be the easier task).

Let me know if you need any additional information in case my description above is unclear... I'm simply having trouble figuring out how to separate the text into individual articles.
- May 6, 2011
- 16 replies
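For the original question, a sketch using PHP's DOMDocument instead of regular expressions (a different technique from the regex approach the thread went on to use): parse the file, walk the children of <body>, and start a new article at each <h2>. It assumes the <h2> and <p> elements sit directly inside <body>, as described in the question; articles wrapped in <div>s or other containers would need a different walk. `splitByH2` and `writeNumberedArticles` are hypothetical names, and the ArticleN.html naming follows the example in the question.

```php
<?php
// Hypothetical helper: split an HTML string into one HTML fragment per <h2>.
// Anything before the first <h2> is ignored.
function splitByH2(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // don't print parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    $articles = [];
    $current = null;
    foreach ($body->childNodes as $node) {
        if ($node instanceof DOMElement && $node->nodeName === 'h2') {
            if ($current !== null) {
                $articles[] = $current;        // finish the previous article
            }
            $current = $doc->saveHTML($node);  // start a new one with its title
        } elseif ($current !== null) {
            $current .= $doc->saveHTML($node); // paragraphs, whitespace, etc.
        }
    }
    if ($current !== null) {
        $articles[] = $current;                // the last article has no next <h2>
    }
    return $articles;
}

// Hypothetical helper: write Article1.html, Article2.html, ... into $dir.
function writeNumberedArticles(string $dir, array $articles): int
{
    foreach ($articles as $i => $fragment) {
        file_put_contents(rtrim($dir, '/') . '/Article' . ($i + 1) . '.html', $fragment);
    }
    return count($articles);
}
```

One caveat: loadHTML() treats input as ISO-8859-1 unless the document declares a charset, so UTF-8 files without a <meta charset> may need one added before parsing.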
