
Creating broad match keywords


eldan88


Hey guys. I am trying to set up a negative street keyword list for an online ordering application I have.

 

The way I want it to work is to enter a list of negative keywords, and have the system prevent users from placing an order if the address they entered has a broad match to a keyword I entered in the back-end. (Kind of the way Google AdWords has it set up.)

 

I don't want the keywords to be matched exactly since the user could type in different keyword variations.

 

For example, one of my negative keywords will be

 

500 49 st

 

So if the user types in "500 49 street" then the system will prevent them from ordering online.

 

Any help would be highly appreciated!


Well, your specific example is pretty easy and straightforward. You can catch both with any generic substring function like stripos. Where it gets tricky is when you want, for example, "500 49 blvd" to equate to "500 49 boulevard". For that, you will need to create a lookup table of aliases. And that's not even getting into whether or not "500 49 st" should match against "500 4 9 st" or "49 500 st".

 

In short, there's no perfect solution for what you want to do, but you can start with a lookup table for non-substring aliases (like the blvd example) and either loop through the aliases with stripos or implode them into an alternation expression in a preg_match pattern.
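
A minimal sketch of the alias-table idea, written in Python for brevity (in PHP, stripos or preg_match would play the same role). The alias table, the function names `normalize` and `is_blocked`, and the sample entries are all assumptions made up for illustration, not something from the original application.

```python
import re

# Hypothetical alias table: each variant maps to one canonical token.
ALIASES = {
    "street": "st",
    "str": "st",
    "boulevard": "blvd",
    "avenue": "ave",
}

def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and replace known aliases token by token."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ALIASES.get(tok, tok) for tok in tokens)

def is_blocked(address: str, negative_keywords: list[str]) -> bool:
    """True if a normalized keyword occurs in the normalized address on token boundaries."""
    padded = f" {normalize(address)} "
    return any(f" {normalize(kw)} " in padded for kw in negative_keywords)
```

Padding with spaces keeps "1500 49 st" from matching the keyword "500 49 st". Reordered or split numbers such as "49 500 st" or "500 4 9 st" are deliberately not matched here; whether they should be is the open question from the post above.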


An alternative.. you could force the user to pick your stored format and then accept/reject based on it. Basically what you would do is treat their input as a search and offer up suggestions that match your records and force them to pick one. Then if it's on your list of ones you don't want to accept, deny it. This may or may not be feasible for you if your setup does not involve having a full list of values.


Preprogrammed formats have a habit of being very complex. Here in the Netherlands you can live at "2e van Zwinden straat 24 A III L". That's the second "van Zwinden" street, number 24, first floor, third apartment on the left side of the hallway. Alternatively you can live at the front or the back of the building. So you cannot reject a house number because it's written as "24A3V", because that really is their house number.

 

If you are going to filter, make sure you don't just discard suspect data, because chances are that you have lots of false positives.

 

You should probably also look into the FULLTEXT and fuzzy-string features that the popular databases have. They allow you to work out how closely the entered data matches a stopword, which may be significant when deciding whether it is a bad string.
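
To give a feel for what "how closely does it match" means, here is a rough sketch that uses Python's standard difflib as a stand-in for a database fuzzy-string feature. The function names and the 0.8 threshold are assumptions picked for illustration; a real threshold would need tuning against actual orders.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a ratio between 0 and 1; 1.0 means identical after lowercasing."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_for_review(address: str, negative_keywords: list[str],
                    threshold: float = 0.8) -> list[str]:
    """Return the keywords that are close enough to warrant a human check."""
    return [kw for kw in negative_keywords if similarity(address, kw) >= threshold]
```

In line with the warning about false positives, the function only flags an address for review; it does not reject the order by itself.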


@.josh Thanks for that info. I will def use preg_match for that. I guess I can also leave out words like blvd or st. I can just add the street numbers only, along with the building address, and if it comes up a match then I can prevent them from ordering.


 


 I will def use preg_match for that.

 

Regexps are extremely useful, but if you are only looking for a substring then strpos() and PHP's other string functions are usually much faster.

 

 


and if it comes up a match then i can prevent them from ordering.

 

You'll ask them to correct it. :-) Remember, the goal is to accept as many orders as possible; the filtering is just to make things easier for the people who process the orders.


An alternative.. you could force the user to pick your stored format and then accept/reject based on it. Basically what you would do is treat their input as a search and offer up suggestions that match your records and force them to pick one. Then if it's on your list of ones you don't want to accept, deny it. This may or may not be feasible for you if your setup does not involve having a full list of values.

 

This is actually a really good solution. I have been actually thinking about implementing this one.

 

My only question is how would I go about grouping the building numbers for a specific street?

 

For example, on 1st Street the building range can be from 1-100. That means 1-100 is going to be assigned to 1st Street only.

 

If someone types in "2" it would auto populate "2 1st street".

 

 

Or another example: let's say on 2nd Street the building numbers run from 200-300. I would need to program a way to have the street number show right after someone types a building number that falls within the range.

 

So if a user types in 220 it would auto populate to "220 2nd Street".

 

Hope this makes sense :)
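
The range idea described above can be sketched roughly like this. The table contents come from the two examples in the post; the names `STREET_RANGES` and `suggest_address` are made up, and the sketch assumes that no two streets share a building-number range, which is how the post describes it.

```python
from typing import Optional

# Hypothetical range table: (first building number, last building number, street).
STREET_RANGES = [
    (1, 100, "1st Street"),
    (200, 300, "2nd Street"),
]

def suggest_address(building_number: int) -> Optional[str]:
    """Return "<number> <street>" if the number falls in a known range, else None."""
    for low, high, street in STREET_RANGES:
        if low <= building_number <= high:
            return f"{building_number} {street}"
    return None
```

With this table, `suggest_address(2)` gives "2 1st Street" and `suggest_address(220)` gives "220 2nd Street". A number outside every range returns None, so the form can fall back to asking for the street.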


 


 That means 1-100 is going to be assign to 1st street only.

 

I think what josh meant is you should make your users pick a format that they want to use to enter their address, like "<housenr> <streetnr>" or "<housenr> <boulevardname>" and give them separate fields to enter each bit of information into. Then you can assume that whatever is entered into the housenr field is a house number and you don't have to try to work out how it was written.

Here in Holland we do the same for our sometimes bizarre addresses: we have a separate field for the province, the city, the street name, the house number, the additions to the house number and the zipcode. It's much easier for the user to split the information when entering it into a form than for the script to split a combined string into its parts.


I was thinking more along the lines of treating it like a normal search. The user starts entering something in a text field. Auto-complete/suggest would be a plus, since you can start looking for potential matches early on. Eventually the user either selects from auto-suggest or finishes typing and presses the button. Then a list of results matching what they entered is displayed. The results are from your database of known addresses. Then they have to pick one of them. So it's kind of like a search in the sense that you enter arbitrary stuff, but ultimately you have to click a defined result/link to move on.

 

vinny42's suggestion about offering separate fields for address parts is also good, though the important/actual concept here is this:

 

you are currently trying to do this:

 

get input from user >

make a best guess what that input is >

make a decision based on that guess

 

the suggestion here is that you change your flow to this:

 

get input from user >

make a best guess what that input is >

display matching results based on your guesses >

make user select one of them >

make decision based on what they selected
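
A rough sketch of that second flow, with an in-memory dictionary standing in for the database of known addresses. The records, the blocked flags and the function names are all invented for illustration.

```python
# Hypothetical records: id -> (display text, whether online orders are refused).
KNOWN_ADDRESSES = {
    1: ("500 49th Street", True),
    2: ("500 48th Street", False),
    3: ("50 49th Street", False),
}

def find_candidates(query: str) -> list[tuple[int, str]]:
    """Best-guess step: every known address that contains all tokens the user typed."""
    tokens = query.lower().split()
    return [
        (addr_id, text)
        for addr_id, (text, _) in KNOWN_ADDRESSES.items()
        if all(tok in text.lower() for tok in tokens)
    ]

def can_order(selected_id: int) -> bool:
    """Decision step: depends only on the record the user picked, not on what they typed."""
    _, blocked = KNOWN_ADDRESSES[selected_id]
    return not blocked
```

The point of the design is that `can_order` never sees the raw input. Whatever the user typed only narrows down the list they choose from.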

 

This is a fairly common thing to do with addresses. Many sites probably use an API that queries some 3rd party like Google Maps or a postal service as the engine. Or if you already have a full list of all addresses you do (or do not) cater to, then keep it local. As I said initially, this may or may not even be an option for you, based on this part of the equation: how you actually keep track of addresses.


 


This is a fairly common thing to do with addresses

 

True, but you have to make it pretty darn flexible. If the customer is used to entering his address a certain way and your autosuggest can't figure out what he means, he'll never get the right suggestion.


True, but you have to make it pretty darn flexible. If the customer is used to entering his address a certain way and your autosuggest can't figure out what he means, he'll never get the right suggestion.

Well most people make a habit of providing their address in a format recognized by their post/mail person. If their "certain way" is not recognized by them, they will not get their mail and packages delivered, and they will learn real quick to make their "certain way" coincide with the expected way!


 


Well most people make a habit of providing their address in a format recognized by their post/mail person.

 

Many do, but a significant number don't. They either don't care and assume that some human will see the address and fix it, or they simply don't understand how the website wants the address to be supplied. And I understand that perfectly; every web designer has his own ideas about what's user-friendly and some just aren't.

 

I've had fights (well, almost) with colleagues about the Dutch zipcode, which officially has no space in it: "1234AA", but many people do type a space: "1234 AA". Fixing it is simple; just strip everything that's not a number or a letter. Yet they wanted to give an error if the format was not "1234AA", meaning that half of the customers would get a very silly error about a space that is of no consequence anywhere, ever.
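
The "strip everything that's not a number or a letter" fix could look something like this. The function names are made up, and the four-digits-plus-two-letters check is only inferred from the "1234AA" example; it is not a full statement of the official rules.

```python
import re

def normalize_zipcode(raw: str) -> str:
    """Drop everything that is not an ASCII letter or digit and uppercase the rest."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()

def looks_like_dutch_zipcode(raw: str) -> bool:
    """Check the normalized value against the shape of the example, e.g. 1234AA."""
    return re.fullmatch(r"[0-9]{4}[A-Z]{2}", normalize_zipcode(raw)) is not None
```

Both "1234AA" and "1234 aa" normalize to the same value, so the customer never sees an error about the space.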

 

To make an increasingly long story short: if the autosuggest can't find it, it should tell the customer that the *format* is not recognised, not that the address cannot be found (because it is also very possible that your address database doesn't yet know about some of the newest addresses).


I agree that the script should be smart enough to fix or recognize some inconsistencies in format. For example the mailman isn't going to bitch if you put "123 Park Ave" on your letter instead of "123 Park Avenue". And same with phone numbers.. rather than bitch about whether or not someone used dashes or dots or parens or spaces, I either make separate fields for the parts, or I simply strip out non-numbers and count.
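
For the phone number case, "strip out non-numbers and count" might be sketched like this. The function name and the default of 10 expected digits are assumptions; the right count depends on which countries you take orders from.

```python
import re
from typing import Optional

def normalize_phone(raw: str, expected_digits: int = 10) -> Optional[str]:
    """Keep only the digits; return them if the count is right, else None."""
    digits = re.sub(r"[^0-9]", "", raw)
    return digits if len(digits) == expected_digits else None
```

Dashes, dots, parens and spaces all disappear before the count, so "(555) 123-4567" and "555.123.4567" end up as the same string.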

 

But the point is, you can't account for all of the inconsistencies, nor should you have to. Taking liberties with being flexible about formatting is a double-edged sword. Trying to guess or assume what the address is will carry over to the person trying to deliver the package.

 

And my point is about holding them to the standard of the mail/postal service. It's easy enough to convey that to the user. "If your mailman can't recognize what you provide, why the hell do you think we can - or should?" sort of thing. It's not about just being able to fill out a form.

 

It's about being able to do something with that information given. If you just wanted the user to be able to fill out the form no matter what, just give them a text field and accept whatever the hell they enter in. Then when you've lost enough time and money over trying to figure out where shit is supposed to go, or being unable to deliver it at all, maybe then you'll reconsider forcing the user to provide a recognized format.

 

The solution is to be more flexible in figuring out what they are trying to convey, yes. Provide a search box with auto-complete. Split it up and search for each bit individually. Every single character if you really want to. Run it through a 3rd party service. Tell the user to go to a 3rd party service like Google Maps and find it, and provide a link from there and parse it. Do what you have to, to get a format that the delivery guy can understand and look up. But the final step should always be them selecting a recognized format.

 

p.s. - I don't know what industry you work in or what experience you've had, but in my experience, the only people I know who just randomly write down addresses and hope for the best or expect it to be understood, are children. I've never really met or known an adult to just write down their address as they feel like it and expect someone to sort it. They most certainly care that it is correct, because most people do not like the prospect of having to deal with not receiving their stuff. It's a fucking nightmare. Especially since some delivery companies or drivers will go the "best guess" route, deliver your shit to some other address and call it a day and not give 2 shits about fixing the situation after the fact.


Hey, thanks for the advice guys. I did more research and will try implementing Google's API to get this up and running. If I run into problems I am going to create the address lists myself. If I have any questions doing so, I will let you know. But thanks for your help!
