Hello, I'm trying to index the links on a website with a PHP script of mine, but I'm completely lost as to how to do this. I've never really used regex before, so I don't know where to start.

 

The whole source of the website is already in a string, and I know that all of the links inside the source look like this:

<a href="lyrics.php?id=4809">Artist Name</a>

 

The number after ?id= varies from link to link, as does the "Artist Name".

<a href="lyrics.php?id=*">*</a>

Above: * = Random/Unknown

 

The page has hundreds of links like this, and I'm trying to get them into an array to process into a database.

 

Even if we figure out how to extract all of these from the string, how can I keep each id and artist name together so I can add them to a database?

https://forums.phpfreaks.com/topic/156234-im-new-to-this-need-help/

Untested...

 

$source = __something__;

preg_match_all('/<a href="lyrics\.php\?id=([\d]+)">([\w\s"\'-]+)<\/a>/i', $source, $link_matches);

foreach ($link_matches as $key => $match)
{
    $links[] = array(
        'id' => $match[1],
        'title' => $match[2],
    );
}

 

By the way, that will find links with obscure names like "Artist's"_-_Name, though I doubt there are any!

In the first capture, you don't really need to surround the \d in a character class, as \d is already a shorthand character class.

That second capture could be simplified by using [^<]+ (as names don't have < in them, this will capture pretty much anything in between > and <).

 

Also untested...

preg_match_all('/<a href="lyrics\.php\?id=(\d+)">([^<]+)<\/a>/i', $source, $link_matches);
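
To see what this call fills in, here is a minimal sketch run against two links in the format quoted in the question. The sample string and the second artist are assumptions for illustration only. With the default flag (PREG_PATTERN_ORDER), $link_matches is grouped by capture rather than by link:

```php
<?php
// Sample input (assumed): two links in the format from the question.
$source = '<a href="lyrics.php?id=4809">Artist Name</a>'
        . '<a href="lyrics.php?id=17">Other Artist</a>';

// preg_match_all() returns the number of full matches it found.
$count = preg_match_all(
    '/<a href="lyrics\.php\?id=(\d+)">([^<]+)<\/a>/i',
    $source,
    $link_matches
);

// Default ordering (PREG_PATTERN_ORDER):
//   $link_matches[0] = every full match
//   $link_matches[1] = every first capture (the ids)
//   $link_matches[2] = every second capture (the names)
print_r($link_matches[1]);
print_r($link_matches[2]);
```

So the id of the first link is $link_matches[1][0] and its name is $link_matches[2][0]; the two stay paired by sharing the same index.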

I know this may seem a bit noob, but again I've never really dealt with arrays, more specifically an array like this.

 

How would I access the "$links" part of the array? I'm guessing you would have to do something like:

foreach ($links as $xxx => $xxxx) {
    xxxxx;
    xxxxx;

    checkIfExists($id, $title);
}

 

checkIfExists() is going to check whether the id and title already exist in the database and, if they don't, add them. I've already finished checkIfExists(); what I'm stuck on is getting the right id and title to it.

 

Any help? :(

Inside the foreach loop, simply echo $xxxx?

 

Example:

$arr = array(1,2,3);
foreach($arr as $val){
    echo $val . "<br />\n"; // this will echo out the values (1, 2 and 3) in turn
}

 

If you want to display the keys as well as the values, you can do this:

foreach($arr as $key => $val){
    echo $key . ' ' . $val . "<br />\n"; // this will echo the keys and their values (0 1, 1 2, 2 3)
}
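
The same loop works when each element is itself an array, as with $links from the earlier post. This is a small sketch with hand-written sample data standing in for real matches (an assumption, not output from the actual page). Echoing such an element directly would just print the word Array, so read its fields by key instead:

```php
<?php
// Hand-written sample data (assumed) in the same shape as $links.
$links = array(
    array('id' => '4809', 'title' => 'Creed'),
    array('id' => '2511', 'title' => 'Tupac'),
);

$lines = array();
foreach ($links as $link) {
    // $link is an array here, so index into it by key.
    $lines[] = $link['id'] . ' - ' . $link['title'];
}

echo implode("<br />\n", $lines);
```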

I'm using the following to experiment before I add it to my script, and I'm getting this output:

0 Array

1 Array

2 Array

 

Code:

<?php
$source = '<a href="lyrics.php?id=4809">Creed</a><a href="lyrics.php?id=2511">Tupac</a>';

//preg_match_all('/<a href="lyrics\.php\?id=([\d]+)">([\w\s"\'-]+)<\/a>/i', $source, $link_matches);
preg_match_all('/<a href="lyrics\.php\?id=(\d+)">([^<]+)<\/a>/i', $source, $link_matches);

foreach ($link_matches as $key => $match) {
    $links[] = array(
        'id' => $match[1],
        'title' => $match[2],
    );
}

foreach($links as $key => $val){
echo $key . ' ' . $val . "<br />\n";
}
?>

 

I'm completely lost.. I'm just trying to echo out the array in this format  "ID - Title"

Element [0] of the array that preg_match_all fills holds the matches for the complete pattern. Each capture (the parts of the pattern that are in parentheses) is stored in [1], [2], etc. So if you want to pair the captures together, this is one way you can do it:

 

$source = '<a href="lyrics.php?id=4809">Creed</a><a href="lyrics.php?id=2511">Tupac</a>';
preg_match_all('/<a href="lyrics\.php\?id=([\d]+)">([^<]+)<\/a>/i', $source, $link_matches);
// Use the ids (capture 1) as keys and the names (capture 2) as values.
$link_matches = array_combine($link_matches[1], $link_matches[2]);
$links = array();
foreach ($link_matches as $key => $val) {
    $links[] = array('id' => $key, 'title' => $val);
}

 

Basically, I take the array $link_matches and combine two of its elements (1 and 2) with array_combine(). This does two things: it drops element 0 completely (which contains the complete pattern matches), and in essence it makes the array one-dimensional by using the values of element 1 as keys and the values of element 2 as those keys' values (hope I was clear in explaining that). You can see for yourself what this new version of the $link_matches array looks like by doing:

echo "<pre>".print_r($link_matches, true);

 

The snippet above then loops over this array and stores each key and value as id and title respectively, so if you wanted to output the first id, you would simply do:

echo $links[0]['id'];

 

Does this make things clearer?
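
An alternative sketch that skips array_combine(): passing the PREG_SET_ORDER flag makes preg_match_all() return one entry per link, so each id and title stay together from the start, and links that happen to share an id are not collapsed into a single key. The checkIfExists() body below is only a stand-in stub for the function mentioned earlier in the thread, whose real implementation is not shown; here it just prints "ID - Title".

```php
<?php
// Stand-in stub (assumption): the real checkIfExists() talks to the database.
function checkIfExists($id, $title)
{
    echo $id . ' - ' . $title . "<br />\n";
}

$source = '<a href="lyrics.php?id=4809">Creed</a><a href="lyrics.php?id=2511">Tupac</a>';

// PREG_SET_ORDER: each $match is array(full match, id, title).
preg_match_all(
    '/<a href="lyrics\.php\?id=(\d+)">([^<]+)<\/a>/i',
    $source,
    $matches,
    PREG_SET_ORDER
);

$links = array();
foreach ($matches as $match) {
    $links[] = array('id' => $match[1], 'title' => $match[2]);
}

// Hand each id/title pair to the (stubbed) database check.
foreach ($links as $link) {
    checkIfExists($link['id'], $link['title']);
}
```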
