Can't get my regex to match correctly


j.reyes093

I hope this isn't too obvious a problem, but I can't get this regex to match once I add a prefix to the pattern.

 

I'm attempting to pull data from http://www.usccb.org/nab/080110.shtml in order to put it into a database for use on my main website (yes, I have permission). I am using an HTML parser to grab plain text from the page and am using preg functions to match what I need. I run into trouble when I attempt to pull the reference from the page (e.g. Ecc 1:2; 2:21-23 on the page provided). My preg function looks like this:

 

preg_match_all("~".$readings[0][$a]." *\w{2,4} ([0-9]{1,2}: *([0-9]{1,2}((?=-)(-[0-9]{1,2})|()),? ?)+;? *)+~is",$page,$reference);

 

My entire code looks like this:

<HTML>
<HEAD>
<TITLE>Get Scripture</TITLE>
</HEAD>
<BODY>
Please enter a starting date (MM-DD-YY):
<BR>
<FORM method='post' action='get_scriptures.php?action=go'>
<INPUT type='text' maxlength='8' size='3' name='date'>
<BR>
<BR>
Please enter an ending day within the same month(DD):
<BR>
<INPUT type='text' maxlength='2' size='1' name='date_e'>
<BR>
<INPUT type='submit' value='Go'>
</FORM>
<BR>
<?
include_once('simple_html_dom.php');

$action = $_GET['action'];
$date = $_POST['date'];
$date_e = $_POST['date_e'];
echo "<BR>";
if($action == "go" && preg_match("~^..-..-..$~",$date)) {
$s_day = substr($date,3,2);
for($c_day = $s_day ;$c_day<=$date_e;$c_day++) {
	if($c_day != "01" && $c_day < 10)
		$c_day = "0".$c_day;
	$c_date = str_replace($s_day,$c_day,$date);
	$c_date = preg_replace("~-~","",$c_date);
	$page = file_get_html("http://www.usccb.org/nab/".$c_date.".shtml")->plaintext;
	$month_name = date( 'F', mktime(0, 0, 0, substr($date,0,2) ));
	if(is_string($c_day))
		$c_day = substr($c_day,1,1);
	preg_match_all("/".$month_name." ".$c_day.", 20".substr($date,6,2)."/i",$page,$datematch); //Scans for date
	echo "<BR><B>Date:</B>".$datematch[0][0];		
	preg_match_all("/Lectionary:((?:(?!".$datematch[0][0].").)*)/is",$page,$lectionary_match); //Scans for Lectionary (gives all data on page)
	preg_match_all("/((?<=Lectionary: [0-9]{3}).+)/si",$lectionary_match[0][0],$reading_types);
	$reading_types = preg_replace("/Responsorial Psalm/","",$reading_types[0][0]);
	preg_match_all("/^.+$/m",$reading_types,$readings);
	echo "<BR><B>Readings:</B>";
		for($a=0;$readings[0][$a];$a++) {
		$readings[0][$a] = trim($readings[0][$a]);
		if(!preg_match("/\w/",$readings[0][$a])) {
			unset($readings[0][$a]);
			$readings[0] = array_values($readings[0]);
			}
	}		
	for($a=0;$a < count($readings[0])-1;$a++) {
		$readings[0][$a] = trim($readings[0][$a]);
		echo "<BR>".$readings[0][$a]."<BR>";
		preg_match_all("~".$readings[0][$a]." *\w{2,4} ([0-9]{1,2}: *([0-9]{1,2}((?=-)(-[0-9]{1,2})|()),? ?)+;? *)+~is",$page,$reference); //PROBLEM
		for($b=0;$reference[0][$b];$b++) {
			$reference[0][$b] = preg_replace("/".$readings[0][$a]." */","",$reference[0][$b]);
			echo $reference[0][$b]."<BR>";
			}



	}





	echo "<BR><HR>";
}

?>
<BR>
<HR>
<a href='get_scriptures.php'>Click here to go back</a>
<?
}
?>




</BODY>
</HTML>



 

 

The problem is that this preg call fails to find the reference on the page. When I do a var_dump of $page I get (for the first one at least):

string(6691) " USCCB | NAB - August 1, 2010 USCCB Home Topics News Readings Movies Bible Catechism Bishops Dioceses Departments Publications New American Bible New American Bible Today's Reading NAB Podcast Video Daily Reflections Frequently Asked Questions Stations of the Cross PDA Formatted Readings New American Bible Introduction •  Preface (Old Testament) •  Preface (1970 New Testament) •  Preface (1986 New Testament) Permissions Policy •  New American Bible •  Lectionary for Mass Saint of the Day June 2010 Sun Mon Tue Wed Thur Fri Sat     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30       July 2010 Sun Mon Tue Wed Thur Fri Sat         1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 August 2010 Sun Mon Tue Wed Thur Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31         September 2010 Sun Mon Tue Wed Thur Fri Sat       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30     October 2010 Sun Mon Tue Wed Thur Fri Sat           1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30             Lectionary: 114 Reading 1 Responsorial Psalm Reading 2 Gospel August 1, 2010 Eighteenth Sunday in Ordinary Time Reading 1 Ecc 1:2; 2:21-23 Vanity of vanities, says Qoheleth, vanity of vanities!  All things are vanity! Here is one who has labored with wisdom and knowledge and skill, and yet to another who has not labored over it, he must leave property. This also is vanity and a great misfortune. For what profit comes to man from all the toil and anxiety of heart with which he has labored under the sun? All his days sorrow and grief are his occupation; even at night his mind is not at rest. This also is vanity. Ps 90:3-4, 5-6, 12-13, 14, 17 Responsorial Psalm R. (1) If today you hear his voice, harden not your hearts. 
You turn man back to dust, saying, “Return, O children of men.†For a thousand years in your sight are as yesterday, now that it is past, or as a watch of the night. R. If today you hear his voice, harden not your hearts. You make an end of them in their sleep; the next morning they are like the changing grass, Which at dawn springs up anew, but by evening wilts and fades. R. If today you hear his voice, harden not your hearts. Teach us to number our days aright, that we may gain wisdom of heart. Return, O LORD! How long? Have pity on your servants! R. If today you hear his voice, harden not your hearts. Fill us at daybreak with your kindness, that we may shout for joy and gladness all our days. And may the gracious care of the LORD our God be ours; prosper the work of our hands for us! Prosper the work of our hands! R. If today you hear his voice, harden not your hearts. Col 3:1-5, 9-11 Reading 2 Brothers and sisters: If you were raised with Christ, seek what is above, where Christ is seated at the right hand of God. Think of what is above, not of what is on earth. For you have died, and your life is hidden with Christ in God. When Christ your life appears, then you too will appear with him in glory. Put to death, then, the parts of you that are earthly: immorality, impurity, passion, evil desire, and the greed that is idolatry. Stop lying to one another, since you have taken off the old self with its practices and have put on the new self, which is being renewed, for knowledge, in the image of its creator. Here there is not Greek and Jew, circumcision and uncircumcision, barbarian, Scythian, slave, free; but Christ is all and in all. 
Lk 12:13-21 Gospel Someone in the crowd said to Jesus, “Teacher, tell my brother to share the inheritance with me.†He replied to him, “Friend, who appointed me as your judge and arbitrator?†Then he said to the crowd, “Take care to guard against all greed, for though one may be rich, one’s life does not consist of possessions.†Then he told them a parable. “There was a rich man whose land produced a bountiful harvest. He asked himself, ‘What shall I do, for I do not have space to store my harvest?’ And he said, ‘This is what I shall do: I shall tear down my barns and build larger ones. There I shall store all my grain and other goods and I shall say to myself, “Now as for you, you have so many good things stored up for many years, rest, eat, drink, be merry!â€â€™ But God said to him, ‘You fool, this night your life will be demanded of you; and the things you have prepared, to whom will they belong?’ Thus will it be for all who store up treasure for themselves but are not rich in what matters to God.†  Next Day Lectionary for Mass for Use in the Dioceses of the United States, second typical edition, Copyright © 2001, 1998, 1997, 1986, 1970 Confraternity of Christian Doctrine; Psalm refrain © 1968, 1981, 1997, International Committee on English in the Liturgy, Inc. All rights reserved. Neither this work nor any part of it may be reproduced, distributed, performed or displayed in any medium, including electronic or digital, without permission in writing from the copyright owner.        	 Email us at nabquestion@usccb.orgNew American Bible | 3211 4th Street, N.E., Washington DC 20017-1194 | (202) 541-3000 © USCCB. All rights reserved. " 

 

And within the above var_dump is

Reading 1 Ecc 1:2; 2:21-23

which should be getting picked up. I know the problem is with $readings[0][$a] within the regex, because if it is removed the regex picks up all four of the reference strings within $page (as it should, but I need each individually). Even if I manually type in Reading 1 instead of $readings[0][$a], it still doesn't work. I hope this isn't too obvious an error; I've been trying to figure this out forever.

 

Any ideas? Help would be extremely appreciated. Thanks in advance!

 

 

Link to comment
Share on other sites

I'm barely understanding what you are talking about, but assuming I did understand you correctly you wish for...

 

preg_match_all("~".$readings[0][$a]." *\w{2,4} ([0-9]{1,2}: *([0-9]{1,2}((?=-)(-[0-9]{1,2})|()),? ?)+;? *)+~is",$page,$reference);

...to match (along with possible other instances)...

 

Reading 1 Ecc 1:2; 2:21-23

 

Either way, that's one confusing regex. Unless I'm not grasping the subtleties of the allowed variations, it should be a lot simpler. Can you give more examples of values you are trying to match, so that I can understand what you are trying to do?

 

 

Link to comment
Share on other sites

Sorry about the confusion. Yes, you have that correct. I am trying to pull from $page (the parsed html page) the reference that is listed under each separate reading from this http://www.usccb.org/nab/080110.shtml website. So examples of this would be:

Ecc 1:2; 2:21-23

Col 3:1-5, 9-11

Lk 12:13-21

Jer 28:1-17

Jer 30:1-2, 12-15, 18-22

 

The regex without the .$readings[0][$a] actually identifies each reference fine. The problem is that I want to be able to link each reference to its particular reading. That's where the $readings[0][$a] comes in. The $readings[0][$a] string changes on each pass through the loop and will contain things such as Reading 1, Reading 2, Gospel 1, etc. So essentially I want the regex to match:

 

Reading 1 Ecc 1:2; 2:21-23

or

Reading 2 Col 3:1-5, 9-11

etc.

 

but for some reason when the $readings[0][$a] part is added the regex breaks and no longer matches anything. Even when I remove the variable and type in a value such as Reading 1, the regex doesn't match. It's as if the $page string doesn't actually consist of what the var_dump says it does.

 

 

Hopefully that was less confusing. Sorry about that, let me know if I need to clarify anything.

Link to comment
Share on other sites

It's as if the $page string doesn't actually consist of what the var_dump says it does.

After some more testing this seems to be the main problem.

 

For some reason

var_dump(strpos($page,"Reading 1 Ecc 1:2; 2:21-23"));

returns false, meaning that this string wasn't found within $page. Yet a var_dump of $page shows that it appears partway through the string:

...Reading 1 Ecc 1:2; 2:21-23...

 

So what the heck? Is there something I'm missing? I'm using a var_dump so it should output exactly what the string is, correct?
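
One thing worth checking here: when var_dump output is viewed in a browser, runs of whitespace (newlines, tabs, repeated spaces) are rendered as a single space, and a non-breaking space looks the same as an ordinary one. So the dump on screen is not necessarily the string byte for byte. Below is a minimal sketch of how to test that, assuming (this is not confirmed anywhere in the thread) that whatever sits between "Reading 1" and "Ecc" is not one plain space. `$sample` is a made-up stand-in for `$page`.

```php
<?php
// Hypothetical stand-in for $page: a newline plus a UTF-8 non-breaking
// space (bytes C2 A0) sit where the browser appears to show one space.
$sample = "Reading 1\n\u{00A0}Ecc 1:2; 2:21-23 Vanity of vanities";

// The literal search fails, as in the post above.
var_dump(strpos($sample, "Reading 1 Ecc 1:2; 2:21-23")); // bool(false)

// json_encode() shows invisible characters as escapes (\n, \u00a0).
echo json_encode(substr($sample, 0, 16)), "\n";

// Collapse every run of whitespace, including non-breaking spaces,
// into one plain space. The /u flag treats the input as UTF-8.
$clean = preg_replace('/[\s\x{00A0}]+/u', ' ', $sample);

var_dump(strpos($clean, "Reading 1 Ecc 1:2; 2:21-23")); // int(0)
```

If the escaped output of the real `$page` shows only plain spaces at that spot, this guess is wrong and the cause is somewhere else. Viewing the page source instead of the rendered page is another quick way to see the raw whitespace.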

 

 

Link to comment
Share on other sites

Well my gut reaction is you'd be better off using a DOM parser to simply take the value out of the original HTML. But if you need a regex, something like this should work...

 

$pattern = '#'.$reading.'\s+(\w+\s+\d+?:(?:\d+(?:-\d+)?(?:, )?)+)#i';
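
A slightly fuller sketch of the same idea, not tested against the live page. Assumptions: the sample `$page` below is stitched together from fragments of the var_dump earlier in the thread, with whitespace already collapsed to single spaces, and `REF` and `findReference()` are names made up for this example. Note that in that dump only Reading 1 has its label before the reference. For Reading 2 and the Gospel the reference comes first ("Col 3:1-5, 9-11 Reading 2"), so the sketch tries both orders.

```php
<?php
// Reference shape: book abbreviation, chapter:verse, optional verse range,
// then any number of ", verse(-verse)" or "; chapter:verse(-verse)" parts.
const REF = '\b[A-Za-z]{2,5}\s+\d{1,3}:\s*\d{1,3}(?:-\d{1,3})?'
          . '(?:\s*[,;]\s*(?:\d{1,3}:\s*)?\d{1,3}(?:-\d{1,3})?)*';

// Returns the reference attached to $label, or null if none is found.
function findReference(string $label, string $text): ?string
{
    // preg_quote() escapes any regex metacharacters in the label.
    $quoted = preg_quote($label, '~');
    $patterns = [
        '~' . $quoted . '\s+(' . REF . ')~i', // label first
        '~(' . REF . ')\s+' . $quoted . '~i', // reference first
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $text, $m)) {
            return $m[1];
        }
    }
    return null;
}

// Hypothetical sample built from fragments of the dump in this thread.
$page = "Lectionary: 114 Reading 1 Responsorial Psalm Reading 2 Gospel "
      . "August 1, 2010 Eighteenth Sunday in Ordinary Time "
      . "Reading 1 Ecc 1:2; 2:21-23 Vanity of vanities ... "
      . "Col 3:1-5, 9-11 Reading 2 Brothers and sisters ... "
      . "Lk 12:13-21 Gospel Someone in the crowd";

foreach (['Reading 1', 'Reading 2', 'Gospel'] as $label) {
    echo $label, ': ', findReference($label, $page) ?? '(none)', "\n";
}
```

Using `\s+` between the parts instead of a literal space means a newline or tab in `$page` will not break the match, though a non-breaking space would still need to be normalized first.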
