flounder

Members
  • Content Count

    22

Community Reputation

0 Neutral

About flounder

  • Rank
    Member

Profile Information

  • Gender
    Male
  • Location
    Surrey BC Canada
  1. Hi Jacques1, thanks for your help. The strange hex sequences were written to my output file after processing my curl input with simple_html_dom. Replacing

         $html = new simple_html_dom();
         $html->load($result);

     with

         $html = new simple_html_dom();
         header('Content-Type: text/html; charset=utf-8');
         $html->load(utf8_encode($result));

     solved my problem. All options now have the right text. Thank you VERY much!
  2. Thanks for your response. The pages I scrape don't have a DTD or a declared character encoding. Firefox displays the pages OK in Quirks mode. My code editor identifies the encoding as windows-1252. So I created a page with a few of the problem characters in it (è, ö, ü and ý) and saved it in windows-1252 encoding, attached. This works on my terminal:

         iconv -f WINDOWS-1252 -t UTF-8 input.html

     It outputs è ü ý ö to my screen. Server-side, however, each of these:

         $file = fopen("input.html","r");
         while(! feof($file)) {echo fgets($file);}

         $file = file('input.html');
         foreach ($file as $line_num => $line) {echo $line;}

         echo file_get_contents('input.html');

     returns � � � �

     As far as I can tell, all PHP file operations retrieve the contents of the file in ASCII, and therefore

         $utf8 = iconv('windows-1252', 'utf-8', $input);

     fails. I don't think it can be done programmatically server-side. Can anyone confirm this?

     input.html
  3. Hello all, With permission, I scraped a website using curl and simple_html_dom to retrieve 6342 links from 112 pages. While scraping, I converted the links to options for a select element. Most of the options display properly. Here's the problem: there are some ISO 8859-1 hexadecimal encoded characters in the HTML source files, which display as string literals inside options.

         $input = "<option>Cr\E8me</option>";
         $input = str_replace("\E8", "è", $input);

     does not work. How do I turn "<option>Cr\E8me</option>" into "<option>Crème</option>"? Any suggestions? TIA.
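
For reference, here is a minimal sketch of the conversion these posts are about. It is not the code from the thread. The sample strings and the variable names ($raw, $utf8, $literal, $fixed) are hypothetical. It assumes the scraped bytes are Windows-1252, as the code editor reported, and it covers two possible readings of the problem: the source holds the single byte 0xE8, or it really holds a backslash followed by two hex digits.

```php
<?php
// Hypothetical sample standing in for a scraped line: "Crème" in Windows-1252,
// where è is the single byte 0xE8.
$raw = "<option>Cr\xE8me</option>";

// PHP strings are plain byte sequences, so the bytes are converted here
// with an explicit source encoding before being served as UTF-8.
$utf8 = iconv('WINDOWS-1252', 'UTF-8', $raw);

// Assumed alternative case: the text really contains a backslash followed
// by two hex digits (the four characters \E8), not the byte 0xE8.
$literal = '<option>Cr\E8me</option>';

// Replace each backslash-hex pair with the UTF-8 form of that Windows-1252 byte.
// If a byte has no Windows-1252 mapping, the original text is left in place.
$fixed = preg_replace_callback(
    '/\\\\([0-9A-Fa-f]{2})/',
    function (array $m) {
        $char = iconv('WINDOWS-1252', 'UTF-8', chr(hexdec($m[1])));
        return $char === false ? $m[0] : $char;
    },
    $literal
);

echo $utf8, "\n", $fixed, "\n";
```

Both results are UTF-8, so they only display correctly if the page is also served as UTF-8, for example with the header('Content-Type: text/html; charset=utf-8') call from the first post. The regular expression is deliberately loose and would also rewrite any other backslash-plus-two-hex-digits sequence in the input, so it is best treated as a starting point.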