Parsing a file


7 replies to this topic

#1 AnthonyB

AnthonyB
  • New Members
  • Newbie
  • 5 posts

Posted 13 April 2006 - 11:26 AM

Hi there,

I would like to be able to parse a .m3u playlist file, in order to extract the title, artist and song time. Fortunately I was able to find some source that has pretty much done the trick, however there is one issue with it that I need help with.

This link takes you to my PHP code, because when I place my code in here, I get a forbidden message when I submit the post...

http://pastebin.com/657430

An example of an output using, for example, an Avril Lavigne album is;

However, if we come across an artist/band with a '-' in their name, the parsing code interprets the first hyphen it comes across as the separator for artist and track title.

An example of a working parse, followed by an incorrect one, can be seen here:
http://img334.imageshack.us/my.php?image=image8cm.jpg

Does anyone know how to tweak the code so that it can cater for hyphens in the artist/band name?

Thanks in advance! :D

#2 Honoré

Honoré
  • Members
  • Advanced Member
  • 66 posts
  • Location: Antwerp - Belgium

Posted 13 April 2006 - 12:10 PM

Try the following at line 43:
$artist = strtok(" - ");
This tries to use space-hyphen-space as the token.

#3 lead2gold

lead2gold
  • Members
  • Advanced Member
  • 164 posts
  • Location: Ottawa, On

Posted 13 April 2006 - 12:13 PM

That's a tough one. There are no delimiters in a .m3u file that you can reliably filter on.
Would it be easier to just parse the mp3 for an ID3v1 or ID3v2 tag?

ID3v1 tags are the easiest... they are the last 128 bytes of the file, so after a call to fopen() you only have to fseek() to 128 bytes before the end, read them and then fclose(). You wouldn't cause that much processing time (delay time).
The first 3 bytes of that block ("TAG") tell you if an ID3v1 tag is even present, so you don't have to read further if it isn't.

I wrote some code (in C++) a while ago for an mp3 tag program I use at home. It reads through all my tags and generates consistent filenames, making it easier for me to find stuff.
I'm at work now, but I could get the details on parsing the ID3v1 information and post it here if you like.
If you use the tag, you'll be able to get more information (rather than just the artist, song and length).
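
A minimal PHP sketch of that ID3v1 idea, assuming the usual ID3v1 layout (a 128-byte block at the very end of the file, starting with "TAG", followed by 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre). The function name read_id3v1() is made up for illustration and is not from the pastebin code.

```php
<?php
// Hypothetical helper: returns the ID3v1 fields of $path, or null when no tag is found.
function read_id3v1(string $path): ?array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    // Jump to 128 bytes before the end; this fails when the file is too short.
    if (fseek($handle, -128, SEEK_END) !== 0) {
        fclose($handle);
        return null;
    }
    $block = fread($handle, 128);
    fclose($handle);

    // An ID3v1 tag is exactly 128 bytes long and begins with the marker "TAG".
    if ($block === false || strlen($block) !== 128 || substr($block, 0, 3) !== 'TAG') {
        return null;
    }

    $fields = unpack('a3tag/a30title/a30artist/a30album/a4year/a30comment/Cgenre', $block);
    // Text fields are padded with NUL bytes or spaces; strip the padding.
    foreach (['title', 'artist', 'album', 'year', 'comment'] as $key) {
        $fields[$key] = rtrim($fields[$key], "\0 ");
    }
    unset($fields['tag']);
    return $fields;
}
```

Only the last 128 bytes are read, so the cost per file stays small, but the mp3 itself still has to be on the server.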

Now if you're parsing this stuff because people are uploading the m3u files to your remote site, then yeah... you won't want to be sending an entire mp3 just to parse the last 128 bytes....

Anyways, that's just my input...

If you want to continue parsing M3Us, you'll have to do it this way:
1) parse for the token: #EXTINF:
2) from there, read left to right to extract the time; stop parsing when is_numeric() stops returning true;
3) continue parsing until you find the '-' character. Artist names don't normally have a '-' in their name, so
the data found from the end of the time field to here (passed into the trim() function) is your artist.
4) parse over the '-'; the rest of the information until '\n' is your song title.

If there is no #EXTINF: then you're at the mercy of the filename on your hard drive.
Hope that helps!
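
The four steps above can be sketched in PHP roughly as follows. The function name parse_extinf() and the sample lines are made up for illustration, and the sketch deliberately splits on the first '-' as described, so it shows the same weakness with hyphenated artist names.

```php
<?php
// Hypothetical helper following the four steps; returns null for lines without #EXTINF:
function parse_extinf(string $line): ?array
{
    // Step 1: look for the #EXTINF: token at the start of the line.
    $prefix = '#EXTINF:';
    if (strncmp($line, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    $rest = substr($line, strlen($prefix));

    // Step 2: read left to right while the characters still belong to a number.
    $length  = strspn($rest, '-0123456789');
    $seconds = (int) substr($rest, 0, $length);
    $rest    = ltrim(substr($rest, $length), ',');

    // Steps 3 and 4: the first '-' separates the artist from the title.
    $pos = strpos($rest, '-');
    if ($pos === false) {
        return ['seconds' => $seconds, 'artist' => '', 'title' => trim($rest)];
    }
    return [
        'seconds' => $seconds,
        'artist'  => trim(substr($rest, 0, $pos)),
        'title'   => trim(substr($rest, $pos + 1)),
    ];
}
```

With a line such as "#EXTINF:215,American Hi-Fi - Surround" this returns the artist "American Hi" and the title "Fi - Surround", which is exactly the problem described in the first post.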

#4 AnthonyB

Posted 13 April 2006 - 12:30 PM

Thanks for the responses;

[quote name="Honoré" date="Apr 13 2006, 01:10 PM"]
try with the following at line 43:
$artist = strtok(" - ");
This is trying to use space hyphen space as token.
[/quote]

Unfortunately that doesn't seem to do the trick. For some reason it isn't taking " - " as a token; instead it basically stops after the first blank space. So, before, American Hi-Fi parsed to

Artist                     Title
---------------------------------
American Hi           Fi - Surround

But now it gives:

Artist                     Title
---------------------------------
American             Hi-Fi - Surround

Which is quite weird lol. I don't know why it does that. Is there an escape character I can use to replace the white spaces?


[quote name="lead2gold" date="Apr 13 2006, 01:13 PM"]
Would it be easier to just parse the mp3 for an ID3v1 or ID3v2 Tag?
[/quote]

Thanks for your response too.. but what I had in mind was for users to upload an m3u file so that I can use it to extract all the songs within each album. The playlists are a lot smaller, and one file contains all the info I need really. ID3 tags would be easier.. but like you said, it would mean going through each individual mp3 file.. which isn't what I want lol.

If only this was easier lol! :P

#5 AnthonyB

Posted 13 April 2006 - 12:56 PM

I found that every character in the token argument of the strtok function is treated as a separate delimiter. So, the

$artist = strtok(" - ");

actually means that the token can be a blank space OR a hyphen (the second blank space is redundant).
Does someone know how I could make the token the whole string " - ", including both the first and second white space?

Thanks again
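
A small PHP sketch of the difference, using a made-up sample entry: strtok() treats its token argument as a set of single characters, while explode() splits on the whole delimiter string. The variable names are only for illustration.

```php
<?php
$entry = 'American Hi-Fi - Surround';

// strtok() reads " - " as the character set {' ', '-'}, so it stops at the first space.
$first = strtok($entry, ' - ');

// explode() matches the complete string " - "; the limit of 2 leaves any
// later " - " inside the title untouched.
$parts  = explode(' - ', $entry, 2);
$artist = $parts[0];
$title  = $parts[1] ?? '';

echo $first, "\n";                 // American
echo $artist, ' / ', $title, "\n"; // American Hi-Fi / Surround
```

This still assumes that the first " - " with spaces on both sides is the real separator between artist and title.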

#6 lead2gold

Posted 13 April 2006 - 01:02 PM

[quote name="AnthonyB" date="Apr 13 2006, 08:56 AM"]
Does someone know how I could make the token the string " - ", including both the first and second white space?
[/quote]

I think preg_replace() will give you that effect.

Edit:
something like:
$artist = preg_replace('/^#EXTINF:-?\d+,(.*?) - .*$/', '$1', $string);
$song = preg_replace('/^#EXTINF:-?\d+,.*? - /', '', $string);

I'm terrible with my regular expressions.. but maybe a guru could look at that and adjust them to what you want.
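
If it helps, the same idea can be written as a single preg_match() call that captures all three fields at once. This is only a sketch: match_extinf() is a made-up name, and it assumes each line looks like "#EXTINF:seconds,Artist - Title".

```php
<?php
// Hypothetical helper: split one #EXTINF line with a single regular expression.
function match_extinf(string $line): ?array
{
    // The lazy (.*?) makes the artist end at the first " - " that has spaces on both sides.
    $pattern = '/^#EXTINF:(?<seconds>-?\d+),(?<artist>.*?) - (?<title>.*)$/';
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null;
    }
    return [
        'seconds' => (int) $m['seconds'],
        'artist'  => trim($m['artist']),
        'title'   => trim($m['title']),
    ];
}
```

Because the pattern requires spaces around the hyphen, "American Hi-Fi" stays in one piece, but an artist written with " - " inside the name would still be cut short.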

#7 AnthonyB

Posted 13 April 2006 - 01:12 PM

[quote name="lead2gold" date="Apr 13 2006, 02:02 PM"]
I think preg_replace() will give you that affect
[/quote]

Thanks for that. What I will try to do then is search for the " - " and replace it with something like the "@" character, and then use that as the token.

Looking at php.net for that function... woah, it's complicated lol!!

If anyone wants to lend a hand I'll be much obliged... otherwise it's gonna be a while before I understand how to use that function lol

Thanks again :D

EDIT: Just saw your edit lol.. I'll plug that in and see what I get.. otherwise all of that stuff looks like another language to me :P . I did have some experience with regular expressions in uni.. but my memory is failing me (and that was only last year!)

#8 AnthonyB

Posted 13 April 2006 - 01:50 PM

Wicked!! I just sorted it out :-)


<?php
// Remove the initial time value and its comma (first match only, so commas in titles survive)
$new_string = preg_replace("/(\w+),/", "", $buffer, 1);
// Replace the first " - " string with the @ character
$new_string = preg_replace("/ - /", "@", $new_string, 1);

// Everything before the @ is the artist
$artist = strtok($new_string, "@");
?>

This seems to have done the trick! Granted, it only works if the hyphen in the artist name doesn't have any white space around it.. but it's a start :D

Thanks for your help guys!
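
For the stated goal of pulling every song out of an uploaded playlist, the same split can be applied line by line. The following is a rough sketch, not the pastebin code: parse_playlist() is a made-up name, and it assumes an extended .m3u file in which each "#EXTINF:seconds,Artist - Title" line is followed by the file path.

```php
<?php
// Hypothetical helper: collect every #EXTINF entry from the text of an .m3u file.
function parse_playlist(string $contents): array
{
    $tracks = [];
    // \R matches any line ending (\n, \r\n or \r).
    foreach (preg_split('/\R/', $contents) as $line) {
        if (preg_match('/^#EXTINF:(-?\d+),(.*)$/', trim($line), $m) !== 1) {
            continue; // skips #EXTM3U, file paths and blank lines
        }
        // Split on the first " - " only; entries without one keep the whole text as title.
        $parts = explode(' - ', $m[2], 2);
        $tracks[] = [
            'seconds' => (int) $m[1],
            'artist'  => count($parts) === 2 ? trim($parts[0]) : '',
            'title'   => trim(end($parts)),
        ];
    }
    return $tracks;
}
```

A call such as parse_playlist(file_get_contents($uploaded_path)), with $uploaded_path standing in for wherever the upload ends up, would return one array per track. Names that really contain " - " remain ambiguous, because the playlist format gives no other separator to rely on.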



