
webscrape certain line


wwfc_barmy_army


Hello.

 

I'm using the functions from this library for a web scrape:

http://sourceforge.net/projects/simplehtmldom/

 

I've figured out everything else, but I have one last thing I'm having trouble with: $html->plaintext lets me output all the text on the web page, but I only want to get a certain line.

 

The source page has a line like the following:

Password: apassword

 

This is shown in the $html->plaintext output, but as far as I can tell everything ends up on one line.

 

So my question: how can I get just the line that starts with "Password: "?

 

Thanks for any help/advice/code.

 


Don't know if this will be of any help, but I used this earlier to scrape <input type="hidden" name="nknkckc" value="jjcfjccc" />

 

$matches = array();
preg_match_all('/<input type\="hidden" name\="([^"]+)".*?value\="([^"]*)"[^>]*>/si', $html,$matches);
$values = $matches[2];

$matches[0] holds the full text of each match (the whole <input ...> tag)

$matches[1] holds the contents of each name=""

$matches[2] holds the contents of each value=""
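
To make the layout of $matches concrete, here is a small sketch run against made-up markup. The field names and values (token, session, and so on) are invented for illustration and do not come from any real page.

```php
<?php
// Hypothetical sample markup, standing in for the fetched page source.
$html = '<form>'
      . '<input type="hidden" name="token" value="abc123" />'
      . '<input type="hidden" name="session" value="xyz789" />'
      . '</form>';

$matches = array();
preg_match_all('/<input type="hidden" name="([^"]+)".*?value="([^"]*)"[^>]*>/si', $html, $matches);

// $matches[1] lists every captured name, $matches[2] every captured value,
// in the same order, so they can be paired into one name => value array.
$fields = array_combine($matches[1], $matches[2]);
print_r($fields);
```

Pairing the two capture groups this way is just one option; looping over $matches[1] and indexing into $matches[2] would work equally well.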

 

I can't offer much more than this at the moment. I have been cold coding this stuff for days now and it's a real headache.

But is "Password: apassword" not in a name="" or value="" attribute in the HTML?

You probably just want to use plain old preg_match, since I was using preg_match_all to pull the variables out of about 30 lines.


So you need to preg_match the word Password, then find a regex for a space followed by numbers and letters. It's hard to tell if there are underscores or other characters, and hard to use trim() to get rid of the space :(

 

Maybe just try to cURL the page yourself and see what the format is?

 

Can't you just view the HTML source of the page and preg_match that?
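
A rough sketch of what matching against the raw source might look like. The markup here is a guess, since the actual page format isn't known; the pattern simply allows for tags sitting between the label and the value.

```php
<?php
// Hypothetical raw source; the real page's markup may well differ.
$source = '<p><b>Password:</b> apassword<br />Expires: never</p>';

// Skip any tags and whitespace after the label, then capture
// everything up to the next tag or whitespace character.
if (preg_match('~Password:\s*(?:<[^>]+>\s*)*([^<\s]+)~i', $source, $m)) {
    $password = $m[1];
} else {
    $password = null; // label not found in this source
}
echo $password;
```

This assumes the password itself contains no spaces or "<" characters.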

 

Sorry I couldn't be more help, but I've got to go get some zzzzzzzz.


<?php
$string = "This will retrieve the password \n Password: passworddd\n is that ok?";
// Capture everything after "Password: " up to the end of that line.
if (preg_match("~Password: (.*+)~", $string, $matches)) {
    $password = $matches[1];
    echo $password;
}
?>

 

A few notes. This only works if the password is the last thing on its line. The . metacharacter (it is not a quantifier) matches any character except a newline, so .* runs to the end of the line. If other text follows the password on the same line, that text gets captured too, and you need a different regex (maybe match up to the first space).
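
For the case where everything is on one line, a minimal sketch of the "match to the first space" idea, using \S+ in place of .* in the pattern. The sample text is invented, and this assumes the password contains no spaces.

```php
<?php
// Hypothetical plaintext output with everything collapsed onto one line.
$text = "Welcome back Username: someone Password: apassword Last login: today";

// \S+ captures non-whitespace characters only, so the match stops at the
// first space after the password instead of running to the end of the line.
$password = preg_match('~Password:\s*(\S+)~', $text, $m) ? $m[1] : null;
echo $password;
```

If no "Password:" label is present, $password ends up as null rather than triggering an undefined index notice.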

 

Either way, hope that helps.


You already made a thread about this. Please do not make multiple threads asking the same thing.

 

And btw, in case you failed to notice, you seem to be getting the same answers here as in your other thread, which means the real problem is that you are either not implementing these solutions properly or not explaining the situation properly.

 

If you decide to post more info about your situation, actual context, etc., do it in the thread you started first.

 

Thread closed.
