
In order to solve my problem I have been told to use multithreading. In my research I found the parallel extension, because threading is not recommended in web server environments. What is CLI?

php.net has the info, but I may just be a little stupid. When looking for more information, I found something called Composer, which I may need to install, but I am using shared hosting, so it may not be possible. I have found that many hosts are unwilling to do anything beyond the basics.

1. Can parallel be used without installing these added dependencies or packages?
2. Is there a simple-to-understand explanation of this somewhere? I have Googled but am just getting more and more confused.

 


44 minutes ago, guymclarenza said:

Running processes in parallel so as to speed up the script

That may or may not be multithreading, depending on what you're trying to explain.

But anyway, I was more interested in why someone said that you needed "multithreading". What is being slow and why is "multithreading" supposed to help?

My crawler runs for up to 30 minutes to return 40-50 results on my dev machine, and the moment I try to run it on the web server it times out in a few minutes. I am looking for a way to reduce the time to less than 3 minutes, so instead of doing everything in one queue, I want to break up the queue and run concurrent queues.

Instead of crawling one page at a time, I want to crawl multiple pages simultaneously.
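From my reading, the curl_multi functions in plain PHP might let me do that without installing anything extra. This is a rough, untested sketch I pieced together; the URLs are just placeholders:

<?php
// Rough sketch: fetch several pages at once with curl_multi.
// The URLs are placeholders and error handling is left out.
$urls = [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3',
];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all transfers until every one has finished.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);   // wait for activity instead of spinning
    }
} while ($active && $status === CURLM_OK);

foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);
    echo $url, ' => ', strlen($html), " bytes\n";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);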

A little knowledge is dangerous. I could see that the script I created from the tutorial was not much good. I have been trying to solve some problems, broke the script a few times, and got it to do what I wanted, but it seems it's not very efficient. My goal now is to learn how to make it faster. I have even been looking at Python to see whether that might be a better way forward.

All this confusion was caused by a "build a search engine like Google" tutorial on Udemy; I found its flaws and am looking for solutions to them.

The deeper I dig, the more confused I get, which is why I am looking for advice on finding good tutorials so that I can skip doing the shitty ones.

The bloke who runs said Udemy course said I should look at multithreading; I suspect it's a case of the blind leading the blind.



 


Yeah, no, multithreading isn't the answer. Concurrency is. Meaning you have this script that can do the crawling, and you run the script multiple times in parallel.
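As a sketch of what I mean, and only a sketch: split the URL queue into chunks and launch one copy of the crawler per chunk. crawl_worker.php, the queue files, and the log files below are all made up; adapt them to whatever your script actually looks like.

<?php
// Hypothetical sketch: run N copies of a crawler script at the same
// time, each on its own slice of the URL queue. crawl_worker.php and
// the file names are invented for illustration.
$urls    = file('queue.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$workers = 4;
$chunks  = array_chunk($urls, (int) max(1, ceil(count($urls) / $workers)));

$procs = [];
foreach ($chunks as $i => $chunk) {
    file_put_contents("queue_$i.txt", implode(PHP_EOL, $chunk));
    $cmd = 'php crawl_worker.php ' . escapeshellarg("queue_$i.txt");
    $procs[$i] = proc_open($cmd, [
        1 => ['file', "out_$i.txt", 'w'],   // each worker gets its own log
        2 => ['file', "err_$i.txt", 'w'],
    ], $pipes);
}

// proc_close() blocks until a process exits, so this waits for all of them.
foreach ($procs as $p) {
    if (is_resource($p)) {
        proc_close($p);
    }
}

Whether shared hosting lets you do this is another question; plenty of hosts disable proc_open and exec outright.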

But first, 30 minutes to get 40-50 results is absurd. Have you looked into why it's taking that long? It's ridiculous.

The problems as I see them are as follows.

It crawls a page, gets links, then has to discard duplicates; I think the hold-up is there. I am removing duplicates after fixing the URL; maybe it would be better to strip out all duplicates before "fixing" the URL. To get 50 results, it is crawling and doing the whole process on 50 pages.

Does this make sense? (There is a rough sketch of the duplicate check after the two lists.)

follow links
add links to array
remove duplicates
fix links
echo links
repeat

At present the logic is:

follow links
fix links
remove duplicates
echo links
repeat
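For the duplicate removal itself, the shape I have in mind is an associative array of links already seen, so each check is one lookup instead of a scan of the whole list. fix_link() is just a stand-in for my "fix links" step, and the input links are placeholders:

<?php
// Rough sketch: keep every link already handled as a key in an
// associative array, so each duplicate check is a single isset().
// fix_link() stands in for the real "fix links" step.
function fix_link(string $url): string
{
    return rtrim($url, '/');   // placeholder normalisation
}

$foundLinks = [                          // placeholder scraped links
    'https://example.com/a',
    'https://example.com/a/',
    'https://example.com/b',
];

$seen  = [];
$queue = [];
foreach ($foundLinks as $link) {
    $fixed = fix_link($link);
    if (!isset($seen[$fixed])) {         // O(1) check instead of scanning
        $seen[$fixed] = true;
        $queue[] = $fixed;
        echo $fixed, PHP_EOL;
    }
}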


Scraping a page should take maybe one second. Dealing with the database, a fraction of a second. All in all, 40-50 pages should take, like, a minute. I can't believe that dealing with duplicates takes up the other 29.

What's your code?
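If nothing else, time each stage before rewriting anything. A crude sketch with microtime(); extract_links() and fix_link() below are stand-ins for whatever your script really does, and the arrow functions need PHP 7.4+:

<?php
// Crude profiling sketch: wrap each stage and print how long it took,
// to see where the 30 minutes actually go.
function timed(string $label, callable $fn)
{
    $t0 = microtime(true);
    $result = $fn();
    printf("%-10s %.3fs\n", $label, microtime(true) - $t0);
    return $result;
}

// Placeholder stand-ins for the real crawler's steps.
function extract_links(string $html): array
{
    preg_match_all('/href="([^"]+)"/', $html, $m);
    return $m[1];
}
function fix_link(string $url): string
{
    return rtrim($url, '/');
}

$url    = 'https://example.com/';   // placeholder
$html   = timed('fetch',   fn() => file_get_contents($url));
$links  = timed('extract', fn() => extract_links($html));
$links  = timed('fix',     fn() => array_map('fix_link', $links));
$unique = timed('dedupe',  fn() => array_unique($links));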
