cbreemer

Members
  • Posts

    13
  1. Yes, I guess I'll need to keep looking. Thanks anyway.
  2. I don't of course. I was just wondering about you suggesting that pdf is "not a proper data format". A large part of the data I download (manuals, invoices, digital letters, even sheet music) is in pdf format.
  3. Thanks for your reply. I use PHP as the back-end for a web application I wrote to organize all my documents (which are almost exclusively pdf). In order to add search functionality I was looking into ways to convert pdf to text. So far I tried these PHP solutions:

       • pdfcrowd API ( https://pdfcrowd.com/api/pdf-to-text-php/ ): Looks reliable but is very slow (cloud based) and either severely request-limited or quite expensive.
       • pdftotext.phpclass ( https://github.com/christian-vigh-phpclasses/PdfToText ): Looks ok but I saw one text output where each character was on a line of its own.
       • Smalot Pdf Parser ( https://github.com/smalot/pdfparser ): Looks ok-ish but buggy. Lots of exceptions and errors, and 178 open issues, some of them recognized bugs that have been open for years.
       • pdftotext executable ( https://www.xpdfreader.com/download.html ) via shell_exec(): Clunky and slow of course, and it does not always produce output. I could improve on this by not using shell_exec() but it might still be too slow to search a large number of pdfs.

     I noticed inconsistent output between the four. Apparently there are different ways to parse a pdf into text. I'm starting to get the feeling that whatever I pick will come with its own set of errors and quirks. Just curious - you don't consider PDF a proper data format?
  4. Hey folks, Just wondering if anyone here has experience with extracting text from PDF files in PHP, and is willing to exchange some experiences ? I tried a couple of different options but they all have their own quirks, and no two seem to produce the same output. Cheers, Chris
  5. Yes I thought so too. I was just ruling out the distant possibility there was something anomalous in the path name. Indeed I am going off mime type detection, seeing how easily it can trip up. It's not a big deal, they are all my own files and there are only a handful of file types to deal with. So checking the extensions will be fine, as you suggest. I'm already implementing it. Thanks for your help !
  6. Thanks ! That is much like what I had to start with, except that I use scandir instead of glob (but I'm pretty sure it does not matter how exactly the filenames are obtained).
  7. I am traversing a directory tree and printing out the name and mime type of each file. Getting the mime type fails for about 600 of my 800+ files. Initially I used mime_content_type but I read suggestions that this was deprecated so I switched to using finfo_open(FILEINFO_MIME_TYPE), only to find I get exactly the same errors on the same files. See this snippet of output:

       Belasting\CBR\2010-05-22 Inkomstenbelasting 2009 Voorlopige aanslag.pdf -> application/pdf<br>
       Belasting\CBR\2011-05-27 Inkomstenbelasting 2010 Voorlopige aanslag.pdf -> <br />
       <b>Warning</b>: finfo_file(C:\Users\cbree\OneDrive\docs\Belasting\CBR\2011-05-27 Inkomstenbelasting 2010 Voorlopige aanslag.pdf): Failed to open stream: Invalid argument in <b>C:\Users\cbree\OneDrive\wwwroot\doc\find.php</b> on line <b>19</b><br />

     For the first file it works correctly, and the mime type is printed. For the second file, which is a very similar pdf with a very similar name, I get this error. The PHP code I use is:

       echo str_replace($base . "\\", "", $file);
       echo " -> ";
       $finfo = finfo_open(FILEINFO_MIME_TYPE);
       echo finfo_file($finfo, $file);
       finfo_close($finfo);
       echo "<br>\n";

     My PHP version is 8.3.3 on Windows 11. I am really stumped by this, any ideas appreciated !
  8. Thanks for that @mac_gyver ! I now remember seeing these tips posted before so I should probably have known. I can't get that to work just now but I'm sure I am doing it wrong and it looks like I first need to study the various error/logging functions in some depth. Meanwhile I'm quite satisfied with my hack, which relies on the observation (fact ?) that PHP error output always starts with <br />. It's a kludge, I know, but for a private hobby project it looks good enough until proven otherwise.
  9. Hi Freaks, I am using XMLHttpRequest in my JavaScript code to execute PHP scripts on the server. Works great but one thing puzzles me. Whenever my PHP code triggers a runtime exception (such as a mandatory parameter being empty), the request returns with HTTP status 200 OK. The same happens even when there is a syntax error in my PHP code. Why does it not return a 500 Internal Server Error in these cases ? Is there a way to get that behavior without having to check each and every command and explicitly set a 500 status ? Thanks for any ideas !
  10. Well hehe, I got it at last. And boy, this is embarrassing. First, inspired by ginerjm's first reply, I dispensed with the XMLHttpRequest and used a simple form with a POST action, and I saw the same thing again. Then I realized that mac_gyver's question was exactly the right thing to ask. Suddenly everything fell into place. See images. The first shows that the test has worked ok. The input data is echoed back to the form via PHP. But in the network tab I see test.php which appears to be a link. Thinking this would give some interesting information, I blithely click on it and get the screen with the error - image 2. Yep, that's what I do, when I see a link I click on it 🙄 I had long noticed the extra browser tab but somehow never realized this means that test.php has been executed a second time, this time without parameters. Of course that would produce these errors. Derp. So that's all it was - seeing errors when there were none. I feel like a right idiot now 😲 Thanks both of you for the reactions. They were really helpful.
  11. Haha yes, I understand what the message is trying to tell me. But it's just not true ! My request data is param1=foo&param2=bar so that should be ok, no ? Plus, it actually works (see my reply to mac_gyver). But for shits and giggles, I will try with an HTML form instead of Ajax. Thanks for the tip about these error reporting statements. I'll make sure to include these from now on. It did not make any difference in this case though.
  12. Thanks for the reply. Unless I grossly misunderstand the XMLHttpRequest mechanism, I don't think my code is making two requests. I see only one in the debugger as expected (see image). This shows, I believe, that everything is working correctly. My full HTML/JS code:

        <!DOCTYPE html>
        <html lang="en">
        <head>
        <script>
        function postit()
        {
            var xhr = new XMLHttpRequest();
            xhr.open('POST', "test.php", false);
            xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
            xhr.onreadystatechange = function ()
            {
                if (xhr.readyState === 4)
                {
                    ret = xhr.responseText;
                    if (xhr.status != 200)
                    {
                        ret = `\n${xhr.status} ${xhr.statusText}`;
                        alert(ret);
                        return;
                    }
                }
            }
            xhr.send("param1=foo&param2=bar");
            result.value = ret;
        }
        </script>
        </head>
        <body onload="postit()">
        <textarea id="result"></textarea>
        </body>
        </html>

      and the full PHP code:

        <?php
        $p1 = $_POST["param1"];
        $p2 = $_POST["param2"];
        echo "param1 = " . $p1 . "\n";
        echo "param2 = " . $p2 . "\n";
        ?>
  13. Hi folks, I'm new here and while I can get by in PHP I am far from fluent. I hope someone can explain a warning that has been nagging me for a while now. I'm doing an Ajax post from JavaScript to a PHP script, passing two variables like this (some code omitted for brevity):

        var xhr = new XMLHttpRequest();
        xhr.open('POST', "test.php", false);
        xhr.send("param1=foo&param2=bar");

      The PHP code picks up these values and echoes them back:

        <?php
        $p1 = $_POST["param1"];
        $p2 = $_POST["param2"];
        echo "param1 = " . $p1 . "\n";
        echo "param2 = " . $p2 . "\n";
        ?>

      This produces the following two warnings:

        Warning: Undefined array key "param1" in D:\web\_BH\test.php on line 2
        Warning: Undefined array key "param2" in D:\web\_BH\test.php on line 3

      Yet, it works ! The output of the echo statements is duly passed back in the request's response text, correctly showing the values foo and bar. So why the warnings ? Am I missing the bleeding obvious ? This is driving me nuts... it's not a problem but I so hate warnings and errors. Thanks for any ideas 😊
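
For anyone landing on the form-encoded POST from the posts above, here is a minimal sketch of the same request done asynchronously. It is only an illustration, not the code from the thread: it swaps the synchronous XMLHttpRequest for fetch, and buildFormBody and postForm are hypothetical helper names. The test.php URL, the param1/param2 names and the Content-Type header are taken from the posts. It assumes an environment where fetch and URLSearchParams exist (current browsers, recent Node versions).

```javascript
// Hypothetical helper: encode a plain object as
// application/x-www-form-urlencoded, the format PHP parses into $_POST.
function buildFormBody(params) {
  return new URLSearchParams(params).toString();
}

// Hypothetical helper: POST the encoded body and reject on any non-2xx
// status. Uses fetch rather than the synchronous XMLHttpRequest call
// shown in the posts.
async function postForm(url, params) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildFormBody(params),
  });
  const text = await response.text();
  if (!response.ok) {
    // Surface the status line, much like the alert() in the original code.
    throw new Error(`${response.status} ${response.statusText}`);
  }
  return text;
}

// buildFormBody({ param1: "foo", param2: "bar" }) → "param1=foo&param2=bar"
```

In a page this could be called as postForm("test.php", { param1: "foo", param2: "bar" }) and the resolved text written into the textarea. Note that the status check only helps if the server actually sends a non-200 status; as discussed in post 9, PHP may still answer 200 when it merely prints a warning.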