
Using RegEx to pull names off of a page


adrek


Hi,

 

I have a page that has a name on it, along with all the email addresses associated with that name. I am writing a JavaScript script that pulls the name and email addresses from the page to build a contact list. However, these pages are not formatted in any particular way. The email addresses are easy to pull, but the names are not.

 

A typical page looks something like this

 

Hi, my name is [name]

 

 

 

The only problem is the name line can look like

 

Hi, im [name]

 

or

 

Name: [name]

 

The name is not necessarily on the first line of the page either.

 

Is there a way I can use regular expressions to pull the name from these pages?

 


If there is no rhyme or reason to how or where the names are displayed, I don't see any solution for you.

 

What you are asking for is something that humans can do easily, but is very difficult to program and almost impossible to get 100% correct.

 

Possible solutions would include creating an expansive list of possible prefixes to the name (e.g. "Hi, im") or creating an even larger list of common names to search against.
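
For what it's worth, the "list of common names" idea could be sketched roughly like this. The name list, the function name findNameByList, and the sample page text are all made up for illustration, and treating a following capitalised word as a surname is just an assumption:

```javascript
// Hypothetical, deliberately tiny list; a real one would hold thousands of names.
const KNOWN_FIRST_NAMES = new Set(["alice", "bob", "carol", "dave"]);

// Scan the text line by line and return the first word found in the list,
// plus the next word on the same line if it looks like a capitalised surname.
// Returns null when no listed name appears.
function findNameByList(text) {
  for (const line of text.split(/\r?\n/)) {
    // Strip punctuation so "Alice," still matches; emails collapse into
    // one long token that will not be in the list.
    const words = line.split(/\s+/).map((w) => w.replace(/[^A-Za-z'-]/g, ""));
    for (let i = 0; i < words.length; i++) {
      if (!KNOWN_FIRST_NAMES.has(words[i].toLowerCase())) continue;
      const next = words[i + 1] || "";
      return /^[A-Z][a-z]+$/.test(next) ? `${words[i]} ${next}` : words[i];
    }
  }
  return null;
}

const samplePage = "Welcome to my page.\nHi, im Alice Smith\nMail me at alice@example.com";
console.log(findNameByList(samplePage)); // prints "Alice Smith"
```

This will miss any name that isn't in the list and can misfire on ordinary words that happen to be names, so it probably works best as a fallback when no known prefix is found.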


Oops, yeah, I glossed over that, although JavaScript has regex as well. You can set up a RegExp object and then call string.match(). There's a bit more to it, but it still comes down to this: a simple regex should work fine.

 

Yeah, I was just nitpicking :P You don't even need to construct a JS RegExp object explicitly if it's a fixed pattern (a regex literal will do). You only need the constructor if you want to use a variable as part of the pattern.
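
To illustrate the difference, here is a minimal sketch. The helper names escapeRegExp and nameAfterPrefix and the sample strings are made up for the example; only String.prototype.match, String.prototype.replace, and the RegExp constructor are standard:

```javascript
// A fixed pattern can be written as a regex literal; no constructor call needed.
const fixedMatch = "Name: Alice".match(/Name: (.*)/);
console.log(fixedMatch[1]);

// Escape regex metacharacters so a prefix held in a variable is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// When part of the pattern comes from a variable, build it with new RegExp.
// The "i" flag makes the prefix match case-insensitively.
function nameAfterPrefix(text, prefix) {
  const pattern = new RegExp(escapeRegExp(prefix) + "\\s*(.+)", "i");
  const m = text.match(pattern);
  return m ? m[1].trim() : null;
}

console.log(nameAfterPrefix("Hi, im Bob", "hi, im"));
```

Escaping matters as soon as a prefix contains characters like "(" or "."; without it, a prefix such as "Name (full):" would be read as a group rather than literal text.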


But that does bring up the point of...why are you scraping this page with javascript? There are 100 better ways to get that data.

 

Good question. I don't own the webpage that I will be scraping this data from. It is a Greasemonkey script that will pull the data for me so I can make a contact list from it.

 

Yes, a regex would work great. Do you know about the "or" (alternation) operator, | ? For example: (Hi, im |Name: )(.*)
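
In JavaScript, that alternation could be used along these lines. This is only a sketch: the function name extractName, the optional apostrophe in "i'?m", the leading \b, and the sample text are my own assumptions, and the three prefixes are just the ones mentioned in this thread:

```javascript
// Alternation lists the known lead-ins; the capture group takes the rest of the line.
// \b keeps "Name:" from matching inside a longer word such as "Username:".
// The "i" flag ignores case, and i'?m accepts both "im" and "I'm" (an assumption).
const NAME_PATTERN = /\b(?:Hi, my name is|Hi, i'?m|Name:)[ \t]*(.+)/i;

// Return the text following the first recognised prefix, or null if none is found.
// "." does not match a newline, so the capture stops at the end of the line.
function extractName(pageText) {
  const m = pageText.match(NAME_PATTERN);
  return m ? m[1].trim() : null;
}

console.log(extractName("Welcome!\nHi, im Alice\nalice@example.com"));
```

Since the capture runs to the end of the line, anything after the name on the same line comes along with it, so the result may still need trimming by hand for some pages.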

 

I did not know about that. I will definitely give that a try and let you guys know.

 

What you are asking for is something that humans can do easily, but is very difficult to program and almost impossible to get 100% correct.

Possible solutions would include creating an expansive list of possible prefixes to the name (e.g. "Hi, im") or creating an even larger list of common names to search against.

 

This is a possibility. It doesn't need to be 100% accurate, just accurate enough to make it worthwhile.

 

Thanks for all of the replies!

