Darghon Posted May 18, 2009

Hello all,

I'm trying to split an HTML element into its parts. For example:

<img src="Style/Images/Sprites/Civilian/base-male.png" height="75px" alt="Civilian Img" />

into:

Array("tagName" : "img", "src" : "Style/Imag...png", "height" : "75px", "alt" : "Civilian Img");

but I haven't got a clue how to do it. Any tips or tricks on how to pull this off?

Adam Posted May 18, 2009

I'd probably split the HTML by whitespace, then run a few regular expressions on each part to determine whether it's the opening tag, an attribute, the end, etc. You'll probably need to watch out for things like "checked" (one-word attributes).

The best place to start is to work out what's distinct about each part and how you can recognize it. For example, the start tag will have a '<' on the left, and attributes will have an equals sign: 'something=something', and so on.

More information on regular expressions
Using regular expressions with JavaScript

Good luck! Glad to help further if you run into trouble.

Ken2k7 Posted May 18, 2009

Well, splitting by spaces isn't the best idea. Your alt attribute will get messed up, because its value contains a space. I would just use a regex on the whole thing.

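To make the objection concrete, here is a small sketch using the img tag from the first post. The variable names `tag` and `parts` are only illustrative; it shows that a plain split on spaces cuts the quoted alt value in two.

```javascript
// Illustrative only: the example tag from the first post, split on single spaces.
var tag = '<img src="Style/Images/Sprites/Civilian/base-male.png" height="75px" alt="Civilian Img" />';
var parts = tag.split(" ");

// The alt attribute no longer arrives as one piece:
// parts[3] holds the start of the attribute and parts[4] the rest of its value.
console.log(parts[3]); // alt="Civilian
console.log(parts[4]); // Img"
```

Any attribute value containing a space would be broken the same way, which is why the later posts move to matching quoted values as a unit.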
Adam Posted May 18, 2009

Ah yeah, you're right! It wouldn't work if some of the HTML is a little messed up, either. Yup, regex all the way!

Darghon Posted May 18, 2009

Well, what I tried in the meantime was a loop that checks every character: each time it passes a space, it dumps its buffer into an array, and if it passes a quote, it ignores spaces until it finds the matching quote. That all works incredibly well.

The result is that elements are converted correctly and reassembled into DOM objects as they should be. However, I'd also like to strip all the newlines and tabs from the tags. I tried something like:

    if(tag.substr(i,1) == "\n" || tag.substr(i,1) == "\t" || tag.substr(i,1) == "")

but with no result. Here is my splitTag function:

    function splitTag(tag){
        var list = new Array();
        var buffer = "";
        var ignoreSpace = false;     // true while inside a quoted value
        var ignoreNextChar = false;  // true right after a backslash
        var openedQuote = false;     // holds the quote character that opened the current value
        for(var i = 0; i < tag.length; i++){
            // Note: substr(i,1) is never "" while i < tag.length, so the last test never matches.
            if(tag.substr(i,1) == "\n" || tag.substr(i,1) == "\t" || tag.substr(i,1) == ""){
                //I do nothin'
            }
            else{
                if(!ignoreSpace && tag.substr(i,1) == " "){
                    // A space outside quotes ends the current token.
                    if(buffer.length > 0){
                        list[list.length] = buffer;
                        buffer = "";
                    }
                }
                else{
                    if(ignoreSpace && tag.substr(i,1) == " "){
                        buffer += tag.substr(i,1);
                    }
                    else if(ignoreNextChar){
                        buffer += tag.substr(i,1);
                        ignoreNextChar = false;
                    }
                    else if(tag.substr(i,1) == "'" || tag.substr(i,1) == "\""){
                        if(openedQuote == ""){
                            openedQuote = tag.substr(i,1);
                            ignoreSpace = true;
                        }
                        else{
                            if(openedQuote == tag.substr(i,1)){
                                openedQuote = "";
                                ignoreSpace = false;
                            }
                            else{
                                buffer += tag.substr(i,1);
                            }
                        }
                    }
                    else if(tag.substr(i,1) == "\\"){
                        ignoreNextChar = true;
                        buffer += tag.substr(i,1);
                    }
                    else{
                        buffer += tag.substr(i,1);
                    }
                }
            }
        }
        // Flush whatever is left after the last character.
        if(buffer.length > 0){
            list[list.length] = buffer;
            buffer = "";
        }
        return list;
    }

Now, how can I pull this last thing off? Right now, if I put a newline in my source code, the resulting layout drops a line as well, and I don't want that.

Thanks in advance.

Ken2k7 Posted May 18, 2009

Wow, I won't even begin to parse all that. Try something along the lines of:

    var str = '<img src="Style/Images/Sprites/Civilian/base-male.png" height="75px" alt="Civilian Img" />';
    var matches = str.match(/([a-z]+?=\"[^\"]+\")/ig);

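The pattern above returns whole name="value" strings and does not capture the tag name. A minimal sketch of one way to get the key/value structure asked for in the first post follows. The function name `parseTag` and both patterns are assumptions made for illustration, not something from the thread. It handles single- and double-quoted values and tolerates newlines or tabs between attributes, but it skips one-word attributes such as checked and is not a general HTML parser.

```javascript
// Hypothetical helper (not from the thread): turns one opening tag into a plain object.
function parseTag(tag) {
  var result = {};

  // Tag name: the first run of letters/digits after '<'.
  var nameMatch = /^\s*<\s*([a-z][a-z0-9]*)/i.exec(tag);
  if (nameMatch) {
    result.tagName = nameMatch[1].toLowerCase();
  }

  // Attributes: name="value" or name='value'.
  // \s* around '=' also absorbs tabs and newlines inside the tag.
  var attrPattern = /([a-z][a-z0-9_:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/ig;
  var m;
  while ((m = attrPattern.exec(tag)) !== null) {
    // Group 2 is set for double-quoted values, group 3 for single-quoted ones.
    result[m[1]] = m[2] !== undefined ? m[2] : m[3];
  }
  return result;
}

var parsed = parseTag('<img src="Style/Images/Sprites/Civilian/base-male.png" height="75px" alt="Civilian Img" />');
console.log(parsed.tagName, parsed.height, parsed.alt);
```

Using exec in a loop rather than match with the g flag keeps the capture groups available, so the name and the unquoted value can be stored separately.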
Darghon Posted May 19, 2009

Thanks for the code, it seems to work nicely, except that it always skips the tag name (like img in the example).

All I still need right now is a way to make sure that all special characters are filtered out, like line breaks in the source code. For example, if my source is:

    <div id='div1'>
    <div id='div2'>stuff here</div>
    </div>

I get a blank line before "stuff here". If my source is:

    <div id='div1'><div id='div2'>stuff here</div></div>

then I don't get a blank line. Any solutions for this problem?

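One possible approach, sketched under the assumption that the blank line comes from the whitespace between one tag's closing '>' and the next tag's '<' being kept as text: remove those whitespace-only runs from the source string before splitting it into tags. The function name `stripInterTagWhitespace` is made up for this example.

```javascript
// Hypothetical helper: drops whitespace-only runs (spaces, tabs, newlines) between tags.
// Text that contains anything other than whitespace, such as "stuff here", is left alone.
function stripInterTagWhitespace(html) {
  return html.replace(/>\s+</g, "><");
}

var source = "<div id='div1'>\n\t<div id='div2'>stuff here</div>\n</div>";
console.log(stripInterTagWhitespace(source));
// <div id='div1'><div id='div2'>stuff here</div></div>
```

This also removes spaces that matter for layout between inline elements (for example between two adjacent links), so it is only a fit where inter-tag whitespace is never meaningful.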