
Regex pattern with extra extension for urls


Go to solution Solved by gizmola,


Hello. I have the following code that works for its intended purpose for all YouTube links, including share (si) extensions. Can someone please tell me how to alter it to add Shorts?
This is in PHP and has to remain that way. I can't use JavaScript for this because it is part of a very long PHP-coded page.

Shorts example
https://www.youtube.com/shorts/J0iIQ629N2c

My code that works for all but shorts

$regex_pattern = "/(youtube.com|youtu.be)\/(watch)?(\?v=)?(\S+)?/";


I have tried the following and it failed. I should note that I have been trying to understand regex patterns, and my mind doesn't seem to want to learn them at a normal rate, but I keep trying.

$regex_pattern = "/(youtube.com|youtu.be)\/(shorts|watch)?(\?v=)?(\S+)?/";

$regex_pattern = "/(youtube.com|youtu.be)\/(shorts)\/(watch)?(\?v=)?(\S+)?/";

$regex_pattern = "/(youtube.com|youtu.be)\/(watch)?(\?v=)?(\S+)?(\shorts)?/";

 

Why would you use Javascript for this?

It's okay for the regex to be multiple patterns. You don't necessarily have to use a single capture group to get the one value you care about.

youtube.com/shorts/(\w+)|youtube.com/watch\?v=(\w+)|youtu.be/whatever else

Only one of $1 or $2 (or what you put in the "whatever else") will ever have a value.

And do remember that an unescaped "." matches any character, so "youtubexcom/shorts/blah" will match the above too.
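As a rough sketch of the multiple-pattern idea, here is one way it could look in PHP. The `~` delimiter, the `[\w-]+` ID class (YouTube IDs can contain hyphens, which `\w` alone does not match), the `youtu\.be` branch, and the helper name `first_captured` are all assumptions made for illustration, not something from the post above. The dots are escaped so that they only match a literal ".".

```php
<?php
// '~' is used as the delimiter so the slashes in the URLs need no escaping.
// Three alternatives separated by |, each with its own capture group.
$pattern = '~youtube\.com/shorts/([\w-]+)|youtube\.com/watch\?v=([\w-]+)|youtu\.be/([\w-]+)~';

// Hypothetical helper: returns whichever capture group actually matched.
function first_captured(string $pattern, string $url): ?string
{
    if (preg_match($pattern, $url, $m) !== 1) {
        return null; // no alternative matched
    }
    // Skip the full match at index 0, then return the first non-empty group.
    foreach (array_slice($m, 1) as $group) {
        if ($group !== '') {
            return $group;
        }
    }
    return null;
}
```

Only one group is ever filled for a given URL, so the caller does not need to know which alternative matched.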


@requinix Thank you for that. I tried the following, based on what you said, but with no luck. Admittedly, I had to come up with a best-guess solution because regex is the bane of my brain haha

 

$regex_pattern = "/(youtube.com/shorts|youtube.com|youtu.be)\/(watch)?(\?v=)?(\S+)?/";

$regex_pattern = "/youtube.com/shorts/(\w+)|youtube.com/watch\?v=(\w+)|youtu.be/(\S+)?/";

$regex_pattern = "/youtube.com/shorts/(\w+)|youtube.com/watch\?v=(\w+)|youtu.be/?si=(\w+)/";

$regex_pattern = "/youtube.com/shorts/(\w+)|youtube.com/watch\?v=(\w+)|youtu.be/?(\S+)/";

I should also note that the following is my original one, which works for everything but Shorts. I tried using it with variations of what you said, with no luck...

$regex_pattern = "/(youtube.com|youtu.be)\/(watch)?(\?v=)?(\S+)?/";

Edit: No, I don't want to use javascript. I just added that in the original post because others I have asked keep telling me to use javascript instead haha


@requinix Here's the whole thing. It may not make sense (maybe?) because there's a lot of other stuff on the page (very long) that goes with everything involved for the user. But basically, the user enters a YouTube link in a form.
That form is then sent to link-insert.php, which sends the link along with the title of the YouTube video to the database (example: the link for the music video "Korn - Life is peachy" will send that title to the database).
What I have originally works for
shared links (youtu.be?si=)
and regular links
(youtube.com/watch?v=) and also songs from playlists.
However, if someone sends a link that is a Short (youtube.com/shorts/VIDEO-ID-HERE), a blank entry is sent to the database as the title because the code can't translate that link format.

Note: $link is the field name in the form
$band is the column in the database

The following also successfully allows the thumbnail to show on the page that calls the info from the database

A Shorts link is the only thing that sends a blank entry for $band into the database ($band is the title of the video)

You'll notice the last part is the part I'm having trouble with

EDIT: Also, if it's not a youtube link at all, then "Not a youtube request" enters in $band in the database so that it's not blank


$ytvideo1 = $link;

$linkurl = "$ytvideo1";
parse_str( parse_url( $linkurl, PHP_URL_QUERY ), $vid );
preg_match('%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i', $linkurl, $match);
$youtube_id = $match[1];

$preurl = "https://www.youtube.com/watch?v=$match[1]";

$ch = curl_init();


curl_setopt($ch, CURLOPT_URL, $preurl);
        
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$output = curl_exec($ch);

$document = htmlspecialchars($output);
curl_close($ch);     
        
$line = explode("\n", $document);
$judul = "";
foreach($line as $strline){

      preg_match('/\<title\>(.*?)\<\/title\>/s', $strline, $hasil);
      if (!isset($hasil[0]) || $hasil[0] == "") continue;
            $title =  str_replace(array("<title>", "</title>"), "", $hasil[0]);

}



$validateurl = $link;
$regex_pattern = "/(youtube.com|youtu.be)\/(watch)?(\?v=)?(\S+)?/";
$match;


if(!preg_match($regex_pattern, $validateurl, $match)){
    $band = "Not A Youtube Request";
}else{
    $band = $title;
}

 

  • Solution

Seems pretty cut and dried: you just need to add an OR alternative that optionally matches "shorts/". I don't know whether the rest of the code will also return the data you are looking to scrape.

 

preg_match('%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:shorts/)?|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i', $linkurl, $match);
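To check that pattern outside the full page, it could be wrapped in a small function and run against a few sample links. This is only a sketch: the function name `extract_youtube_id` and the sample URLs other than the Shorts link from the first post are made up for illustration.

```php
<?php
// Hypothetical helper name; the pattern is the one from the post above.
function extract_youtube_id(string $url): ?string
{
    $pattern = '%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:shorts/)?|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i';

    // preg_match returns 1 on a match; group 1 holds the 11-character ID.
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

// Shorts link from the original question.
$shorts_id = extract_youtube_id('https://www.youtube.com/shorts/J0iIQ629N2c');

// A regular watch link and a shared youtu.be link (sample URLs).
$watch_id = extract_youtube_id('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
$share_id = extract_youtube_id('https://youtu.be/dQw4w9WgXcQ?si=abc123');
```

Returning null when nothing matches also avoids reading `$match[1]` from an empty array, which the original code would do for a link that is not a YouTube link.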

 

@gizmola You nailed it! And to @requinix I just realized from Gizmola's reply that I originally gave the wrong part of my code. I apologize for that. I thought the line that I provided was where my issue was. Apparently not.

So for my education, if you don't mind...
The reason Gizmola's worked is because it's the first part of a 3-section translation? First being the domain, second being the (watch, si, etc.) part, and then the last being the ID?
That's what it looks like to me now. I couldn't see that before. And it looks as though they are separated by the " | " character?

That really helped me a lot to better understand this. Thank you. 
Am I correct in my understanding of how it works?

The | is just an OR.  (This thing)|(that thing).

There are 2 great regex testing sites you should try.  They can really help you experiment and understand how regex works.

First there is https://regex101.com/ 

2nd is: https://regexr.com/

They both have resources and a testing interface that is really useful.

I have loaded the regex I provided with some tests into regexr here:  https://regexr.com/7tc1q

One thing to keep in mind  is that the testing tools don't allow you to change the delimiter from the default of /.  

You can continue to use the slash delimiter without issue, so long as you escape any slashes:  \/

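Note that in the testers you do not need to escape slashes inside a character class, i.e. [ ."/ ]. PHP, however, looks for the closing delimiter before it parses the pattern, so with a / delimiter in PHP the slash must be escaped there too.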
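As a small illustration of the delimiter point, the two patterns below are meant to be equivalent: one uses "/" as the delimiter with escaped slashes, the other uses "%" so the slashes can stay as they are. Both are cut-down, Shorts-only patterns written for this example, not the full pattern from the solution. In the slash-delimited version the slash inside the character class is escaped as well, because PHP's preg functions locate the closing delimiter before the pattern itself is parsed.

```php
<?php
$url = 'https://www.youtube.com/shorts/J0iIQ629N2c';

// With "/" as the delimiter, each literal slash is written as \/.
$slash_delimited = '/youtube\.com\/shorts\/([^"&?\/ ]{11})/i';

// With "%" as the delimiter, slashes can be written as-is.
$percent_delimited = '%youtube\.com/shorts/([^"&?/ ]{11})%i';

preg_match($slash_delimited, $url, $a);
preg_match($percent_delimited, $url, $b);

// Both patterns should capture the same 11-character ID.
$same_id = ($a[1] === $b[1]);
```

Switching delimiters is purely a readability choice here; it does not change what the pattern matches.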

 
