ManiacDan

Staff Alumni
  • Posts

    2,604
  • Joined

  • Last visited

  • Days Won

    10

ManiacDan last won the day on August 12 2014

ManiacDan had the most liked content!

About ManiacDan

  • Birthday 01/07/1984

Profile Information

  • Gender
    Male
  • Location
    Philadelphia PA

ManiacDan's Achievements

Regular Member

Regular Member (3/5)

115

Reputation

  1. Building a database on your end with constantly-running spiders is still the proper solution. If you're using this for competitive intel so you can set your own pricing, you're likely doing LOTS of searches against all these sites, and you'll likely be pulling down most of their product catalogs anyway, so my initial advice still applies. If a database isn't an option for reasons you can't share, then you still need to write multiple scripts, one for each site you wish to search, and have those scripts take arguments for what to search for. These scripts will have to manually query the page, manually retrieve the search results, and manually look for the pricing. There are no magic functions here, and based on my experience doing exactly this, the sites will either block you outright or start screwing with your spiders. You will need constant vigilance over your data and extremely flexible scraping code to maintain this project. You will also likely need a lawyer.
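A sketch of what one of those per-site scripts tends to look like. The URL, markup, and class names below are invented for illustration; every real site needs its own tuned version, and the fetch step would be adapted to however that site's search works.

```php
<?php
// Hypothetical per-site scraper: fetch a search page, then pull titles and
// prices out of the HTML. The markup and class names are made up.

function fetchSearchPage(string $term): string
{
    // In a real script: cURL (or Snoopy) against the site's search URL.
    $url = 'https://example-store.test/search?q=' . urlencode($term);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? '' : $html;
}

function parsePrices(string $html): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // suppress warnings from real-world sloppy markup
    $xpath = new DOMXPath($dom);

    $results = [];
    foreach ($xpath->query('//div[@class="product"]') as $node) {
        $results[] = [
            'title' => trim($xpath->evaluate('string(.//h2)', $node)),
            'price' => trim($xpath->evaluate('string(.//span[@class="price"])', $node)),
        ];
    }
    return $results;
}
```

The parsing is split from the fetching so it can be re-tuned quickly when the site inevitably changes its markup.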
  2. If you're producing a fatal parse error, the error_reporting() function won't help, because your PHP never runs at all. You'll need to enable error reporting in php.ini for that. Other fatal errors will be displayed once error_reporting() turns errors on.
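A minimal illustration of the two cases; the php.ini directives are shown as comments because, for a parse error, only settings that exist before the script runs can take effect.

```php
<?php
// Parse errors: the file never executes, so these must be set in php.ini
// (or .htaccess / the web server config), not in the failing script itself:
//
//   display_errors = On
//   error_reporting = E_ALL
//
// Runtime errors: turning reporting on at the top of the script is enough.
error_reporting(E_ALL);
ini_set('display_errors', '1');
```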
  3. I know they're not your sites; that's why you need to make a copy of them yourself first. If you don't want to do that, and you're OK with each request taking a few minutes to return (rather than a few seconds like a normal website would), then you still need to write one script for each of these pages which will perform the search (use the PHP class Snoopy for this), parse the results (using either regular expressions or the DOMDocument extension), and output them to a CSV (using fputcsv).
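The CSV output step is the simplest part. A sketch, where $results stands in for whatever your per-site parser returned:

```php
<?php
// Write parsed search results out with fputcsv(). The rows here are
// invented sample data.
$results = [
    ['title' => 'Widget', 'price' => '9.99'],
    ['title' => 'Gadget', 'price' => '24.50'],
];

$fh = fopen('results.csv', 'w');
fputcsv($fh, ['title', 'price']); // header row
foreach ($results as $row) {
    fputcsv($fh, [$row['title'], $row['price']]);
}
fclose($fh);
```

fputcsv handles quoting and escaping for you, which is exactly the part people get wrong when they build CSV lines by hand.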
  4. What do you mean by "fill it automatically in an option value"? Are you using that array to BUILD a drop-down list, and you wish to pre-select a specific option for the user? If so, you want to put selected="selected" inside the <option> tag, right next to the value. If you mean something else, please elaborate.
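Assuming the first reading is the right one, a sketch of building the drop-down from an array and pre-selecting the current value ($options and $current are invented sample data):

```php
<?php
// Build a <select> from an array, marking the current value as selected.
$options = ['sm' => 'Small', 'md' => 'Medium', 'lg' => 'Large'];
$current = 'md';

$html = "<select name=\"size\">\n";
foreach ($options as $value => $label) {
    $sel = ($value === $current) ? ' selected="selected"' : '';
    $html .= "  <option value=\"$value\"$sel>$label</option>\n";
}
$html .= "</select>\n";

echo $html;
```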
  5. To clarify a bit: print accepts arguments without parentheses, but does not accept multiple arguments. The function definition for print is:

     int print ( string $arg )

     The function definition for echo is:

     void echo ( string $arg1 [, string $... ] )

     Print and echo are both language constructs, but they behave differently. Echo can take parentheses (but only if you're using one argument), and print's parentheses are likewise optional (it works with or without them). Echo can accept comma-delimited arguments, whereas print cannot. Both accept concatenated strings, but for (negligible) speed reasons you should use commas when using echo. See my CLI PHP example:

     php > $a = "Hello, ";
     php > $b = "World!\n";
     php >
     php > echo($a);
     Hello, php > echo($a,$b);
     PHP Parse error:  syntax error, unexpected ',' in php shell code on line 1
     php > echo $a, $b;
     Hello, World!
     php >
     php > print $a;
     Hello, php > print $a, $b;
     PHP Parse error:  syntax error, unexpected ',' in php shell code on line 1
     php > print( $a );
     Hello, php > print( $a, $b );
     PHP Parse error:  syntax error, unexpected ',' in php shell code on line 1
     php >
     php > echo $a . $b;
     Hello, World!
     php > echo( $a . $b);
     Hello, World!
     php > print $a . $b;
     Hello, World!
     php > print( $a . $b );
     Hello, World!
     php >
  6. "The white screen of death" indicates you have error reporting turned off. Step 1 is developing with errors on. Set display_errors and error_reporting in php.ini, or use the PHP function error_reporting(E_ALL);
  7. I made a site like this at one of my first jobs. You should not be doing these queries in real-time. You need to write a spider for each of the sites you want to search, and have it constantly running and indexing each of these sites. Each spider will be custom-tuned to the site in question to properly pull item titles, product codes, descriptions, prices, and images (or whatever set of data you need). Cache all this information locally in a database along with a raw HTML copy of the page you got it from and links out to the original URL where the information was found. Then you need to work on fine-tuning your search database and algorithms locally to produce acceptable search speeds and results. Once people find what they're looking for on your site, they can click over to the indexed site directly (much like an actual search engine works). There is no drop-and-go solution to this, it's a multi-step process of spidering, locally caching, indexing, search optimization, and results presentation.
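The local-caching step described above could look roughly like this. This sketch assumes a SQLite database and a parser you've already written for the target site; the table, columns, URL, and sample data are all invented.

```php
<?php
// Cache one spidered page locally: parsed fields plus the raw HTML and the
// original URL, as described above. Uses an in-memory SQLite database here;
// a real spider would point at a persistent file or server.
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE IF NOT EXISTS products (
    url        TEXT PRIMARY KEY,
    title      TEXT,
    price      TEXT,
    raw_html   TEXT,
    fetched_at INTEGER
)');

function cachePage(PDO $db, string $url, string $html, array $parsed): void
{
    $stmt = $db->prepare('INSERT OR REPLACE INTO products
        (url, title, price, raw_html, fetched_at) VALUES (?, ?, ?, ?, ?)');
    $stmt->execute([$url, $parsed['title'], $parsed['price'], $html, time()]);
}

// One spider iteration: fetch (stubbed here), parse, cache.
$url  = 'https://example-store.test/item/42';              // placeholder
$html = '<h1>Widget</h1><span class="price">9.99</span>';  // stands in for a real fetch
cachePage($db, $url, $html, ['title' => 'Widget', 'price' => '9.99']);
```

INSERT OR REPLACE keyed on the URL means re-spidering a page simply refreshes its cached copy.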
  8. With any "preventing kids from getting porn" program, you're pitting half a dozen (usually disgruntled) programmers against the sex drive and technological acumen of the 13-year-old boys of the world. The programmers will always lose.

     The block my parents put on our family computer included a 15-second timeout after every wrong password attempt. During that timeout the program refused to respond to anything; you couldn't escape or interrupt the countdown no matter what you did. However, that meant that if you hammered enough keys, Windows would decide the program was unresponsive, and Task Manager would gleefully kill it every time. Reboot before they got home and it came right back. If your son has access to the machine in question, it's likely he'll figure out a way around the software on that machine, even if it only lasts 5 minutes. An external machine that he has no access to (aside from SSH, which will be locked down tight... right?) is the only really secure solution.

     A keylogger virus that you pay to install is still a keylogger virus. It will store your bank account passwords and emails somewhere, likely somewhere your kid will be able to find (unless the logs are stored encrypted, of course). It will also push your kid toward something drastic, like a USB boot drive with Ubuntu on it, which takes 5 minutes to boot and gives unfettered access to all the wonders of the internet while your expensive virus sits dormant on the hard drive.

     Not to say that managing a Linux-based firewall and proxy is super duper easy and we can all do it in 10 seconds, but your son will try for MONTHS to get around whatever you do. Whatever you do should probably take YOU more than 15 minutes.
  9. Agreed, you need to wrap these in an API. You'll be posting a document (usually XML or JSON) from one server to another, and the recipient will validate the document, extract the information, process it, and respond with another XML or JSON document, potentially also kicking off its own remote calls. It's complex, but then again you're attempting to make 3 web servers work in concert. Jacques' standard rant about security also applies if these scripts will actually be exposed to the internet at large.
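A minimal sketch of both sides of such an API, assuming JSON. The endpoint URL and the "action" field are invented; callRemote() is what the sending server runs, decodeRequest() is the validation step the recipient runs before doing anything else.

```php
<?php
// Sending side: POST a JSON body to another server and decode its JSON reply.
function callRemote(string $endpoint, array $payload): ?array
{
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($payload),
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response === false ? null : json_decode($response, true);
}

// Receiving side: validate before extracting anything.
function decodeRequest(string $body): ?array
{
    $data = json_decode($body, true);
    if (!is_array($data) || !isset($data['action'])) {
        return null; // reject anything malformed -- validation comes first
    }
    return $data;
}
```

On top of this you'd still want authentication between the servers, which is where the security rant comes in.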
  10. That format could bite you in the future if you have multiple levels of nested files. The best way to do it is to use the $_SERVER['DOCUMENT_ROOT'] variable to start from your document root, and include relative to THAT folder. What if you put the line you posted inside yourSite/admin/users/bulkDelete.php? You'd include ../db_connect.php, and it would fail because yourSite/admin/db_connect.php doesn't exist. The include path is always the "working directory" of the script you accessed, which means it's the directory of the file being called by the URL. By relying solely on relative paths without making sure you're at the proper depth, you're not future-proof.
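In practice that looks like the following. The includes/db_connect.php path is a made-up example; the ?? fallback just lets the sketch run from the command line, where DOCUMENT_ROOT is unset.

```php
<?php
// Build the include path from the document root, not from the current
// file's depth, so the same line works in yourSite/index.php and in
// yourSite/admin/users/bulkDelete.php alike.
$root = $_SERVER['DOCUMENT_ROOT'] ?? __DIR__;
$path = $root . '/includes/db_connect.php';
// require_once $path;  // identical at any nesting depth

echo $path;
```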
  11. I edit my own hosts file to prevent myself from muscle-memory visiting sites that take too much of my time (like reddit and, until recently, phpfreaks). It's effective unless the kid knows how to edit hosts files.
  12. Do you have a Linksys router? DD-WRT can be flashed onto your router, and there are mods out there for basic logging and whatnot. You can also proxy all the computers in the house (or even the internet connection itself) through a server you control and log everything that way. It's a bit much, obviously, but using network hardware is the best way to keep tabs.
  13. This is a fine use of functions. The best part about functionalizing code like this early in a project is that you can edit how "access denied" works in one place instead of 300 when you change the behavior next year. Right now it's only 2 lines, but it could also do security logging, IP checking, karma calculations, and more. Now that you've put it in one place, you can make those decisions later, confident that they'll be centralized.
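A hypothetical version of where such a function could grow; the logging, status code, and name are all illustrative, not the poster's actual code.

```php
<?php
// One function owns the "access denied" behavior, so its 300 future call
// sites never duplicate it.
function accessDenied(string $reason = ''): void
{
    // Future home of security logging, IP checks, karma calculations...
    error_log('Access denied' . ($reason !== '' ? ": $reason" : ''));
    http_response_code(403);
    echo 'Access denied.';
    // exit; -- a real app would stop execution here
}
```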
  14. The Apache documentation shows the variables reversed:

      <VirtualHost *:80>
          ServerName www.domain.tld
          ServerAlias domain.tld *.domain.tld
          DocumentRoot /www/domain
      </VirtualHost>