Olumide Posted June 4 Share Posted June 4 Dear gurus in the house, I am trying to create an online CBT examination such that admin should be able to upload questions directly from Ms Word which might involve equations, graphs, formatting like bold, italicize and other features. But I am very confused regards this because I am not a guru and I don't want to use ready made software like moodle, but I want to design mine from scratch as a web application. I have used pho with no framework and was able to extract questions from Microsoft word with auto grading but this is not extracting the images, equations and the formats like bold. I stumbled from there again to use python with fastapi but all to no avail. Please guru in the house help me by putting me through on the model to use to achieve this. The questions will uploaded based on class like Basic 1, 2 and so on. I know you are all intelligent and here is a forum that solve problems. Quote Link to comment Share on other sites More sharing options...
Barand Posted June 4 Share Posted June 4 You can save MSWord docs as html. In this case, equations are saved as image files in a subfolder EG 1 Quote Link to comment Share on other sites More sharing options...
Olumide Posted June 5 Author Share Posted June 5 On 6/4/2024 at 5:49 PM, Barand said: You can save MSWord docs as html. In this case, equations are saved as image files in a subfolder EG Thank you sir. Can you please explain this further. In my previous and abandoned work, I do upload images inside the Microsoft word but and also wrote a script to identify that, but it doesn't work. It only captured the text documents. Quote Link to comment Share on other sites More sharing options...
Barand Posted June 5 Share Posted June 5 53 minutes ago, Olumide said: Can you please explain this further. If you save "X.doc", which contains an equation, as "X.html" into a "documents" folder it creates as subfolder "X_files" containing the equation as an image documents | +- X.html +- X_files | +- image1.jpg Quote Link to comment Share on other sites More sharing options...
Olumide Posted June 5 Author Share Posted June 5 31 minutes ago, Barand said: If you save "X.doc", which contains an equation, as "X.html" into a "documents" folder it creates as subfolder "X_files" containing the equation as an image documents | +- X.html +- X_files | +- image1.jpg You are very intelligent sir. I saved the document as an html file and it created a folder as you have said. But my problem is, how do extract them? Here is my upload.php which extract text directly from Microsoft word but not extracting images or equation. <?php session_start(); if ($_SESSION['role'] !== 'teacher') { header("Location: login.php"); exit(); } require 'db.php'; require 'vendor/autoload.php'; // Load the PHPWord library use PhpOffice\PhpWord\IOFactory; $database = new Database(); $db = $database->getConnection(); if ($_SERVER['REQUEST_METHOD'] == 'POST') { $class_id = $_POST['class_id']; $subject_id = $_POST['subject_id']; $file = $_FILES['file']; if ($file['type'] == 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') { $upload_dir = 'uploads/'; $file_path = $upload_dir . basename($file['name']); if (move_uploaded_file($file['tmp_name'], $file_path)) { $phpWord = IOFactory::load($file_path, 'Word2007'); if ($phpWord) { echo "File loaded successfully.<br>"; } else { echo "Failed to load file.<br>"; exit(); } $questions = []; $currentQuestion = null; foreach ($phpWord->getSections() as $section) { foreach ($section->getElements() as $element) { if (method_exists($element, 'getText')) { $text = trim($element->getText()); // Convert bold text $text = preg_replace('/\*\*(.*?)\*\*/', '<strong>$1</strong>', $text); // Convert italicized text $text = preg_replace('/__(.*?)__/', '<em>$1</em>', $text); // Convert quoted text to blockquote $text = preg_replace('/\"(.*?)\"/', '<blockquote>$1</blockquote>', $text); // Convert subscript text $text = preg_replace('/~(.*?)~/', '<sub>$1</sub>', $text); // Convert superscript text $text = preg_replace('/\^(.*?)\^/', '<sup>$1</sup>', $text); if (strpos($text, 'Question:') === 0) { if ($currentQuestion) { $questions[] = $currentQuestion; } $currentQuestion = [ 'question_text' => substr($text, 10), 'options' => [], 'correct_option' => '' ]; } elseif (strpos($text, 'A)') === 0 || strpos($text, 'B)') === 0 || strpos($text, 'C)') === 0 || strpos($text, 'D)') === 0) { if ($currentQuestion) { $optionKey = strtolower($text[0]); $currentQuestion['options'][$optionKey] = substr($text, 3); } } elseif (strpos($text, 'Answer:') === 0) { if ($currentQuestion) { $currentQuestion['correct_option'] = trim(substr($text, 7)); } } } elseif (method_exists($element, 'getDrawing')) { // Handle image extraction $drawing = $element->getDrawing(); $imagePath = $drawing->getPath(); $imageName = basename($imagePath); $newImagePath = $upload_dir . $imageName; copy($imagePath, $newImagePath); // Here you can store $newImagePath in the database for the question } } } if ($currentQuestion) { $questions[] = $currentQuestion; } foreach ($questions as $question) { $stmt = $db->prepare("INSERT INTO questions (question_text, question_type, option_a, option_b, option_c, option_d, correct_option, subject_id) VALUES (?, 'Multiple Choice', ?, ?, ?, ?, ?, ?)"); if ($stmt->execute([ $question['question_text'], $question['options']['a'] ?? '', $question['options']['b'] ?? '', $question['options']['c'] ?? '', $question['options']['d'] ?? '', $question['correct_option'], $subject_id ])) { echo "Question inserted successfully: " . $question['question_text'] . "<br>"; } else { echo "Failed to insert question: " . $question['question_text'] . "<br>"; print_r($stmt->errorInfo()); } } echo "File uploaded and questions added successfully."; } else { echo "Failed to upload file."; } } else { echo "Invalid file type. Please upload a .docx file."; } } else { $classes = $db->query("SELECT * FROM classes")->fetchAll(PDO::FETCH_ASSOC); ?> <!DOCTYPE html> <html> <head> <title>Upload Questions</title> </head> <body> <h1>Upload Questions</h1> <form action="upload.php" method="post" enctype="multipart/form-data"> <label for="class">Class:</label> <select name="class_id" id="class" onchange="fetchSubjects(this.value)"> <option value="">Select Class</option> <?php foreach ($classes as $class): ?> <option value="<?= $class['class_id'] ?>"><?= $class['class_name'] ?></option> <?php endforeach; ?> </select> <label for="subject">Subject:</label> <select name="subject_id" id="subject"> <option value="">Select Subject</option> <!-- Subjects will be populated based on selected class --> </select> <label for="file">Choose a file:</label> <input type="file" name="file" id="file" accept=".docx"> <button type="submit">Upload</button> </form> <script> function fetchSubjects(classId) { fetch(`fetch_subjects.php?class_id=${classId}`) .then(response => response.json()) .then(data => { const subjectSelect = document.getElementById('subject'); subjectSelect.innerHTML = '<option value="">Select Subject</option>'; data.forEach(subject => { const option = document.createElement('option'); option.value = subject.subject_id; option.textContent = subject.subject_name; subjectSelect.appendChild(option); }); }); } </script> </body> </html> <?php } ?> Quote Link to comment Share on other sites More sharing options...
Olumide Posted June 6 Author Share Posted June 6 I keep getting this error Warning: DOMDocument::loadXML(): Namespace prefix m on oMath is not defined in Entity, line: 1 in C:\wamp64\www\cbt\cbt_exam\vendor\phpoffice\math\src\Math\Reader\OfficeMathML.php on line 39 Call Stack #TimeMemoryFunctionLocation 10.0005449024{main}( )...\upload.php:0 20.0080581080convertDocxToHtml( $filename = 'uploads/Question_me2.docx' )...\upload.php:30 30.0087581080PhpOffice\PhpWord\IOFactory::load( $filename = 'uploads/Question_me2.docx', $readerName = ??? )...\upload.php:120 40.0100581872PhpOffice\PhpWord\Reader\Word2007->load( $docFile = 'uploads/Question_me2.docx' )...\IOFactory.php:89 50.0479598432PhpOffice\PhpWord\Reader\Word2007->readPart( $phpWord = class PhpOffice\PhpWord\PhpWord { private $sections = [0 => class PhpOffice\PhpWord\Element\Section { ... }]; private $collections = ['Bookmarks' => class PhpOffice\PhpWord\Collection\Bookmarks { ... }, 'Titles' => class PhpOffice\PhpWord\Collection\Titles { ... }, 'Footnotes' => class PhpOffice\PhpWord\Collection\Footnotes { ... }, 'Endnotes' => class PhpOffice\PhpWord\Collection\Endnotes { ... }, 'Charts' => class PhpOffice\PhpWord\Collection\Charts { ... }, 'Comments' => class PhpOffice\PhpWord\Collection\Comments { ... }]; private $metadata = ['DocInfo' => class PhpOffice\PhpWord\Metadata\DocInfo { ... }, 'Settings' => class PhpOffice\PhpWord\Metadata\Settings { ... }, 'Compatibility' => class PhpOffice\PhpWord\Metadata\Compatibility { ... }] }, $relationships = ['main' => ['rId1' => [...], 'rId2' => [...], 'rId3' => [...]], 'document' => ['rId1' => [...], 'rId2' => [...], 'rId3' => [...], 'rId4' => [...], 'rId5' => [...], 'rId6' => [...]]], $commentRefs = [], $partName = 'Document', $docFile = 'uploads/Question_me2.docx', $xmlFile = 'word/document.xml' )...\Word2007.php:89 and here is the OfficeMathML.php <?php namespace PhpOffice\Math\Reader; use DOMDocument; use DOMNode; use DOMXPath; use PhpOffice\Math\Element; use PhpOffice\Math\Exception\InvalidInputException; use PhpOffice\Math\Exception\NotImplementedException; use PhpOffice\Math\Math; class OfficeMathML implements ReaderInterface { /** @var DOMDocument */ protected $dom; /** @var Math */ protected $math; /** @var DOMXpath */ protected $xpath; /** @var string[] */ protected $operators = ['+', '-', '/', '∗']; public function read(string $content): ?Math { $nsMath = 'xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"'; $nsWord = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'; $content = str_replace( $nsMath, $nsMath . ' ' . $nsWord, $content ); $this->dom = new DOMDocument(); $this->dom->loadXML($content); $this->math = new Math(); $this->parseNode(null, $this->math); return $this->math; } /** * @see https://devblogs.microsoft.com/math-in-office/officemath/ * @see https://learn.microsoft.com/fr-fr/archive/blogs/murrays/mathml-and-ecma-math-omml * * @param Math|Element\AbstractGroupElement $parent */ protected function parseNode(?DOMNode $nodeRowElement, $parent): void { $this->xpath = new DOMXpath($this->dom); foreach ($this->xpath->query('*', $nodeRowElement) ?: [] as $nodeElement) { $element = $this->getElement($nodeElement); $parent->add($element); if ($element instanceof Element\AbstractGroupElement) { $this->parseNode($nodeElement, $element); } } } protected function getElement(DOMNode $nodeElement): Element\AbstractElement { switch ($nodeElement->nodeName) { case 'm:f': // Numerator $nodeNumerator = $this->xpath->query('m:num/m:r/m:t', $nodeElement); if ($nodeNumerator && $nodeNumerator->length == 1) { $value = $nodeNumerator->item(0)->nodeValue; if (is_numeric($value)) { $numerator = new Element\Numeric(floatval($value)); } else { $numerator = new Element\Identifier($value); } } else { throw new InvalidInputException(sprintf( '%s : The tag `%s` has no numerator defined', __METHOD__, $nodeElement->nodeName )); } // Denominator $nodeDenominator = $this->xpath->query('m:den/m:r/m:t', $nodeElement); if ($nodeDenominator && $nodeDenominator->length == 1) { $value = $nodeDenominator->item(0)->nodeValue; if (is_numeric($value)) { $denominator = new Element\Numeric(floatval($value)); } else { $denominator = new Element\Identifier($value); } } else { throw new InvalidInputException(sprintf( '%s : The tag `%s` has no denominator defined', __METHOD__, $nodeElement->nodeName )); } return new Element\Fraction($numerator, $denominator); case 'm:r': $nodeText = $this->xpath->query('m:t', $nodeElement); if ($nodeText && $nodeText->length == 1) { $value = trim($nodeText->item(0)->nodeValue); if (in_array($value, $this->operators)) { return new Element\Operator($value); } if (is_numeric($value)) { return new Element\Numeric(floatval($value)); } return new Element\Identifier($value); } throw new InvalidInputException(sprintf( '%s : The tag `%s` has no tag `m:t` defined', __METHOD__, $nodeElement->nodeName )); case 'm:oMath': return new Element\Row(); default: throw new NotImplementedException(sprintf( '%s : The tag `%s` is not implemented', __METHOD__, $nodeElement->nodeName )); } } } And here is my upload.php <?php session_start(); if ($_SESSION['role'] !== 'teacher') { header("Location: login.php"); exit(); } require 'db.php'; require 'vendor/autoload.php'; // Load the PHPWord library use PhpOffice\PhpWord\IOFactory; use PhpOffice\PhpWord\Settings; use PhpOffice\PhpWord\Shared\Html; $database = new Database(); $db = $database->getConnection(); if ($_SERVER['REQUEST_METHOD'] == 'POST') { $class_id = $_POST['class_id']; $subject_id = $_POST['subject_id']; $file = $_FILES['file']; if ($file['type'] == 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') { $upload_dir = 'uploads/'; $file_path = $upload_dir . basename($file['name']); if (move_uploaded_file($file['tmp_name'], $file_path)) { try { // Convert .docx file to HTML $htmlFilePath = convertDocxToHtml($file_path); if ($htmlFilePath) { echo "File converted to HTML successfully.<br>"; } else { echo "Failed to convert file to HTML.<br>"; exit(); } // Parse the HTML file $htmlContent = file_get_contents($htmlFilePath); $questions = parseHtmlToQuestions($htmlContent); foreach ($questions as $question) { $stmt = $db->prepare("INSERT INTO questions (question_text, question_type, option_a, option_b, option_c, option_d, correct_option, subject_id) VALUES (?, 'Multiple Choice', ?, ?, ?, ?, ?, ?)"); if ($stmt->execute([ $question['question_text'], $question['options']['a'] ?? '', $question['options']['b'] ?? '', $question['options']['c'] ?? '', $question['options']['d'] ?? '', $question['correct_option'], $subject_id ])) { echo "Question inserted successfully: " . htmlspecialchars($question['question_text']) . "<br>"; } else { echo "Failed to insert question: " . htmlspecialchars($question['question_text']) . "<br>"; print_r($stmt->errorInfo()); } } echo "File uploaded and questions added successfully."; } catch (Exception $e) { echo "Error processing file: " . htmlspecialchars($e->getMessage()); } } else { echo "Failed to upload file."; } } else { echo "Invalid file type. Please upload a .docx file."; } } else { $classes = $db->query("SELECT * FROM classes")->fetchAll(PDO::FETCH_ASSOC); ?> <!DOCTYPE html> <html> <head> <title>Upload Questions</title> </head> <body> <h1>Upload Questions</h1> <form action="upload.php" method="post" enctype="multipart/form-data"> <label for="class">Class:</label> <select name="class_id" id="class" onchange="fetchSubjects(this.value)"> <option value="">Select Class</option> <?php foreach ($classes as $class): ?> <option value="<?= htmlspecialchars($class['class_id']) ?>"><?= htmlspecialchars($class['class_name']) ?></option> <?php endforeach; ?> </select> <label for="subject">Subject:</label> <select name="subject_id" id="subject"> <option value="">Select Subject</option> <!-- Subjects will be populated based on selected class --> </select> <label for="file">Choose a file:</label> <input type="file" name="file" id="file" accept=".docx"> <button type="submit">Upload</button> </form> <script> function fetchSubjects(classId) { fetch(`fetch_subjects.php?class_id=${classId}`) .then(response => response.json()) .then(data => { const subjectSelect = document.getElementById('subject'); subjectSelect.innerHTML = '<option value="">Select Subject</option>'; data.forEach(subject => { const option = document.createElement('option'); option.value = subject.subject_id; option.textContent = subject.subject_name; subjectSelect.appendChild(option); }); }); } </script> </body> </html> <?php } function convertDocxToHtml($filename) { $phpWord = IOFactory::load($filename); $htmlWriter = IOFactory::createWriter($phpWord, 'HTML'); $htmlFilePath = str_replace('.docx', '.html', $filename); $htmlWriter->save($htmlFilePath); return $htmlFilePath; } function parseHtmlToQuestions($htmlContent) { $questions = []; $currentQuestion = null; $dom = new DOMDocument; @$dom->loadHTML($htmlContent); $paragraphs = $dom->getElementsByTagName('p'); $currentPassage = ''; foreach ($paragraphs as $paragraph) { $text = trim($paragraph->textContent); $nodeContent = ''; foreach ($paragraph->childNodes as $node) { if ($node->nodeName === 'b' || $node->nodeName === 'strong') { $nodeContent .= '<b>' . $node->textContent . '</b>'; } elseif ($node->nodeName === 'i' || $node->nodeName === 'em') { $nodeContent .= '<i>' . $node->textContent . '</i>'; } elseif ($node->nodeName === 'img') { $src = $node->getAttribute('src'); $nodeContent .= '<img src="' . $src . '">'; } else { $nodeContent .= $node->textContent; } } // Handle equations (assuming they are marked with a specific tag or format) if (strpos($text, 'Equation:') === 0) { $equation = substr($text, 9); $currentQuestion['equation'] = $equation; // Store equations separately } if (strpos($text, 'Passage:') === 0) { if ($currentPassage) { $questions[] = ['passage' => $currentPassage, 'questions' => []]; } $currentPassage = substr($text, 8); } elseif (strpos($text, 'Question:') === 0) { if ($currentQuestion) { $questions[] = $currentQuestion; } $currentQuestion = [ 'question_text' => $nodeContent, 'options' => [], 'correct_option' => '', 'equation' => '', 'images' => [] ]; } elseif (preg_match('/^[A-D]\)/', $text)) { if ($currentQuestion) { $optionKey = strtolower($text[0]); $currentQuestion['options'][$optionKey] = substr($text, 3); } } elseif (strpos($text, 'Answer:') === 0) { if ($currentQuestion) { $currentQuestion['correct_option'] = trim(substr($text, 7)); } } } if ($currentQuestion) { $questions[] = $currentQuestion; } return $questions; } ?> Quote Link to comment Share on other sites More sharing options...
Barand Posted June 6 Share Posted June 6 Another option, maybe Create question in MSWord Save page as html doc Display it in an <iframe> on your page 1 Quote Link to comment Share on other sites More sharing options...
Olumide Posted June 22 Author Share Posted June 22 On 6/6/2024 at 2:28 PM, Barand said: Another option, maybe Create question in MSWord Save page as html doc Display it in an <iframe> on your page I appreciate you, but that does not give me the options. Though, I pending the project for now as it is taking more time and still no solution. I will still get back on it. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.