
Parsing an XML file (problems with spliting string)



Hi there!

 

I am parsing an XML file. When I do a var_dump of the variable where I store the request result, I get the following:

 

SimpleXMLElement Object ( [OperationRequest] => SimpleXMLElement Object ( [RequestId] => f97fc8ad-5b22-4b6d-afc1-5bd39688ccc2 ) [GetAssignmentsForHITResult] => SimpleXMLElement Object ( [Request] => SimpleXMLElement Object ( [isValid] => True ) [NumResults] => 1 [TotalNumResults] => 1 [PageNumber] => 1 [Assignment] => SimpleXMLElement Object ( [AssignmentId] => 17VV0P0347QWX375EKBKCLRW6OGIPO [WorkerId] => A2E4B4CF25RBJ0 [HITId] => 195QQH69ZJBV49I6YWTTTC1XMJKWCQ [AssignmentStatus] => Submitted [AutoApprovalTime] => 2010-03-24T22:06:23Z [AcceptTime] => 2010-02-22T22:05:09Z [submitTime] => 2010-02-22T22:06:23Z [Answer] => primaryJob nothing reasonUsing Entertainment|Income|moremoney mobilPhone iphone3G educationLevel levelPhD changeLife the jeans of henry sucxx educationLevelFather levelPhD sex female peopleHome 8 oftenTurk MoreOnceWeek fatherPrimaryJob yaah motherPrimaryJob idontknow monthlyIncome 12313 own MobilePhone|Bicycle age 60 educationLevelMother levelPhD existHome Television|Refrigerator countryVisit oneVisit firstTurk halfyear locationTurk School| ) ) ) 

 

I am trying to parse the [Answer] content, so I do the following:

$answer = $xmlResponse->GetAssignmentsForHITResult->Assignment->Answer;

 

When I print the $answer variable with var_dump, I get the following:

object(SimpleXMLElement)#13 (1) { [0]=> string(2176) " primaryJob nothing reasonUsing Entertainment|Income|moremoney mobilPhone iphone3G educationLevel levelPhD changeLife the jeans of henry sucxx educationLevelFather levelPhD sex female peopleHome 8 oftenTurk MoreOnceWeek fatherPrimaryJob yaah motherPrimaryJob idontknow monthlyIncome 12313 own MobilePhone|Bicycle age 60 educationLevelMother levelPhD existHome Television|Refrigerator countryVisit oneVisit firstTurk halfyear locationTurk School| " } 

 

To get the string itself, I do the following:

 

$test = $answer[0];

echo $test;

 

And what is shown on the screen is:

primaryJob nothing reasonUsing Entertainment|Income|moremoney mobilPhone iphone3G educationLevel levelPhD changeLife the jeans of henry sucxx educationLevelFather levelPhD sex female peopleHome 8 oftenTurk MoreOnceWeek fatherPrimaryJob yaah motherPrimaryJob idontknow monthlyIncome 12313 own MobilePhone|Bicycle age 60 educationLevelMother levelPhD existHome Television|Refrigerator countryVisit oneVisit firstTurk halfyear locationTurk School|
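A minimal sketch of what this step does, assuming the Answer payload arrives as escaped text inside the response (the page source posted later in the thread points that way). The `$raw` fixture is a hypothetical, heavily trimmed stand-in for the GetAssignmentsForHIT response. Casting with `(string)` is the usual SimpleXML idiom for reading a node's text and gives the same result as echoing `$answer[0]`. The point to notice is that the resulting string still contains the inner XML markup; a browser just hides the tags when it is echoed as HTML.

```php
<?php
// Hypothetical, trimmed-down stand-in for the GetAssignmentsForHIT response.
// Assumption: the Answer payload is escaped text (entities), so SimpleXML
// exposes it as a plain string rather than as child elements.
$raw = <<<XML
<GetAssignmentsForHITResponse>
  <GetAssignmentsForHITResult>
    <Assignment>
      <Answer>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"&gt;
&lt;Answer&gt;&lt;QuestionIdentifier&gt;age&lt;/QuestionIdentifier&gt;&lt;FreeText&gt;23&lt;/FreeText&gt;&lt;/Answer&gt;
&lt;/QuestionFormAnswers&gt;</Answer>
    </Assignment>
  </GetAssignmentsForHITResult>
</GetAssignmentsForHITResponse>
XML;

$xmlResponse = simplexml_load_string($raw);
$answer = $xmlResponse->GetAssignmentsForHITResult->Assignment->Answer;

// (string) returns the node's text content, entities decoded.
$answerString = (string) $answer;

// The text still contains the inner XML; echoed as HTML, a browser hides the tags.
echo $answerString, "\n";
```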

 

Now, I want to split the results by spaces so I do the following:

 

$spliting = split(" ",$prova);

echo $spliting[1];

 

To my surprise, instead of getting one of the words of the string, I get this:

version="1.0"

And if I print the next position, I get this:

encoding="UTF-8"?>

 

Any ideas why this might be happening? It is strange: I have the string, but when I split it I get these strange tags... I don't know why.

 

Thank you
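The symptom can be reproduced with a shortened, hypothetical copy of the Answer text. If the string actually starts with the inner XML declaration, index 0 is `<?xml` and index 1 is `version="1.0"`, exactly as reported. The sketch uses `explode()`, since `split()` was deprecated in PHP 5.3 and removed in PHP 7.0, and wraps each piece in `htmlspecialchars()` so the tags stay visible in a browser.

```php
<?php
// Hypothetical, shortened copy of the Answer text: it starts with the inner
// XML declaration, which the browser hid when the string was echoed as HTML.
$answerString = '<?xml version="1.0" encoding="UTF-8"?> <QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"> <Answer>';

// explode() replaces split(), which was deprecated in PHP 5.3 and removed in PHP 7.0.
$parts = explode(' ', $answerString);

// htmlspecialchars() keeps the tags visible when the output is viewed in a browser.
foreach ($parts as $i => $part) {
    echo $i, ': ', htmlspecialchars($part), "\n";
}
```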

 

 

 

 

 

My guess is that you don't want to be using $prova in this line: $spliting = split(" ",$prova); as it looks like the original XML string. Using the variable names in your post, you want to split $test instead.


Hi there,

 

I make a request using a library and get an XML response from a website. The first thing I do is:

$xmlResponse = $mturk->getAssignmentsForHIT($hitID);  //($mturk is an object from a created class that interacts with the API of a website)

 

var_dump($xmlResponse);

This gives me the following output:

 

object(SimpleXMLElement)#2 (2) { ["OperationRequest"]=> object(SimpleXMLElement)#5 (1) { ["RequestId"]=> string(36) "4c0d2fe4-c961-4f9a-9dd5-a8402a42fa62" } ["GetAssignmentsForHITResult"]=> object(SimpleXMLElement)#3 (5) { ["Request"]=> object(SimpleXMLElement)#4 (1) { ["IsValid"]=> string(4) "True" } ["NumResults"]=> string(1) "1" ["TotalNumResults"]=> string(1) "1" ["PageNumber"]=> string(1) "1" ["Assignment"]=> object(SimpleXMLElement)#6 (8) { ["AssignmentId"]=> string(30) "17VV0P0347QWX375EKBKCLRW6OGIPO" ["WorkerId"]=> string(14) "A2E4B4CF25RBJ0" ["HITId"]=> string(30) "195QQH69ZJBV49I6YWTTTC1XMJKWCQ" ["AssignmentStatus"]=> string(9) "Submitted" ["AutoApprovalTime"]=> string(20) "2010-03-24T22:06:23Z" ["AcceptTime"]=> string(20) "2010-02-22T22:05:09Z" ["SubmitTime"]=> string(20) "2010-02-22T22:06:23Z" ["Answer"]=> string(2176) " primaryJob nothing reasonUsing Entertainment|Income|moremoney mobilPhone iphone3G educationLevel levelPhD changeLife the jeans of henry sucxx educationLevelFather levelPhD sex female peopleHome 8 oftenTurk MoreOnceWeek fatherPrimaryJob yaah motherPrimaryJob idontknow monthlyIncome 12313 own MobilePhone|Bicycle age 60 educationLevelMother levelPhD existHome Television|Refrigerator countryVisit oneVisit firstTurk halfyear locationTurk School| " } } }

 

Then, to parse the Answer part, I do the following:

 

$answer = $xmlResponse->GetAssignmentsForHITResult->Assignment->Answer;

var_dump($answer);

 

Which gives me the following:

 

object(SimpleXMLElement)#13 (1) { [0]=> string(2176) " primaryJob nothing reasonUsing Entertainment|Income|moremoney mobilPhone iphone3G educationLevel levelPhD changeLife the jeans of henry sucxx educationLevelFather levelPhD sex female peopleHome 8 oftenTurk MoreOnceWeek fatherPrimaryJob yaah motherPrimaryJob idontknow monthlyIncome 12313 own MobilePhone|Bicycle age 60 educationLevelMother levelPhD existHome Television|Refrigerator countryVisit oneVisit firstTurk halfyear locationTurk School| " }

 

So, to get the string, I do the following:

 

$answerString = $answer[0];

echo $answerString;

 

This outputs the following:

 

primaryJob nothing reasonUsing Entertainment|Income|moremoney mobilPhone iphone3G educationLevel levelPhD changeLife the jeans of henry sucxx educationLevelFather levelPhD sex female peopleHome 8 oftenTurk MoreOnceWeek fatherPrimaryJob yaah motherPrimaryJob idontknow monthlyIncome 12313 own MobilePhone|Bicycle age 60 educationLevelMother levelPhD existHome Television|Refrigerator countryVisit oneVisit firstTurk halfyear locationTurk School|

 

The problem comes when I try to split this by spaces, by doing:

$splitAnswer = split(' ',$answerString);

var_dump($splitAnswer);

 

I get the following answer:

 

array(8) { [0]=> string(5) " string(613) "xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"> primaryJob nothing reasonUsing Entertainment|Income|moremoney mobilPhone iphone3G educationLevel levelPhD changeLife the" [4]=> string(5) "jeans" [5]=> string(2) "of" [6]=> string(5) "henry" [7]=> string(1487) "sucxx educationLevelFather levelPhD sex female peopleHome 8 oftenTurk MoreOnceWeek fatherPrimaryJob yaah motherPrimaryJob idontknow monthlyIncome 12313 own MobilePhone|Bicycle age 60 educationLevelMother levelPhD existHome Television|Refrigerator countryVisit oneVisit firstTurk halfyear locationTurk School| " }

 

I don't know why this gets all messed up. When I echo the answer it seems to be just a plain string, with nothing like "xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">" in it, but when I split it, that appears...

 

Some suggestions?

 

Thank you.

 

 

Hi again,

 

I have tried using explode instead of split, and the same thing happens.

 

Does anyone have suggestions as to why this might be happening, and how I can split the Answer section by spaces?

 

Thank you...
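If only the visible words are wanted, one quick workaround (not something suggested in the thread) is to drop the markup with `strip_tags()` and then split on runs of whitespace with `preg_split()`. A minimal sketch on a shortened, hypothetical Answer string follows. It loses the link between each answer and its question, so parsing the inner XML is usually the better option.

```php
<?php
// Hypothetical, shortened copy of the Answer text from the thread.
$answerString = '<?xml version="1.0" encoding="UTF-8"?> <QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"> <Answer> <QuestionIdentifier>changeLife</QuestionIdentifier> <FreeText>the jeans of henry sucxx</FreeText> </Answer> </QuestionFormAnswers>';

// strip_tags() removes the markup, including the XML declaration.
$text = strip_tags($answerString);

// Split on any run of whitespace and skip empty pieces.
$words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```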

Are you sure that the values you posted above are exactly the right values? Everything being mushed together (var_dump output should be nicely formatted) suggests you're outputting as HTML, which will cause HTML-like tags (i.e. XML) not to be displayed.

 

Output the var_dumps as plain text (send header('Content-Type: text/plain; charset=utf-8'); before var_dumping).

 

If that's not relevant, please post a sample of the XML as previously requested. Use echo $xmlResponse->asXml().
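A minimal sketch of the plain-text suggestion, using a hypothetical stand-in string. The `header()` call has to happen before any other output. The dump is captured with output buffering here only so it can be inspected afterwards; in a real page a direct `var_dump()` is enough.

```php
<?php
// Must be sent before any other output. It tells the browser to show the
// response as plain text, so tags inside strings are displayed, not rendered.
header('Content-Type: text/plain; charset=utf-8');

// Hypothetical stand-in for the Answer text.
$answerString = '<?xml version="1.0" encoding="UTF-8"?><QuestionFormAnswers/>';

// Output buffering is used only so the dump can be checked afterwards;
// a plain var_dump($answerString); does the same job in a page.
ob_start();
var_dump($answerString);
$dump = ob_get_clean();
echo $dump;
```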

Hi there,

 

I thought var_dump would show the XML content, but it seems that with asXML() I get what I wanted. This is what I get:

 

84b921ab-1a2a-4057-8992-02f95ddd14dcTrue11117VV0P0347QWX375EKBKCLRW6OGIPOA2E4B4CF25RBJ0195QQH69ZJBV49I6YWTTTC1XMJKWCQSubmitted2010-03-24T22:06:23Z2010-02-22T22:05:09Z2010-02-22T22:06:23Z<?xml version="1.0" encoding="UTF-8"?> <QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"> <Answer> <QuestionIdentifier>primaryJob</QuestionIdentifier> <FreeText>nothing</FreeText> </Answer> <Answer> <QuestionIdentifier>reasonUsing</QuestionIdentifier> <FreeText>Entertainment|Income|moremoney</FreeText> </Answer> <Answer> <QuestionIdentifier>mobilPhone</QuestionIdentifier> <FreeText>iphone3G</FreeText> </Answer> <Answer> <QuestionIdentifier>educationLevel</QuestionIdentifier> <FreeText>levelPhD</FreeText> </Answer> <Answer> <QuestionIdentifier>changeLife</QuestionIdentifier> <FreeText>the jeans of henry sucxx</FreeText> </Answer> <Answer> <QuestionIdentifier>educationLevelFather</QuestionIdentifier> <FreeText>levelPhD</FreeText> </Answer> <Answer> <QuestionIdentifier>sex</QuestionIdentifier> <FreeText>female</FreeText> </Answer> <Answer> <QuestionIdentifier>peopleHome</QuestionIdentifier> <FreeText>8</FreeText> </Answer> <Answer> <QuestionIdentifier>oftenTurk</QuestionIdentifier> <FreeText>MoreOnceWeek</FreeText> </Answer> <Answer> <QuestionIdentifier>fatherPrimaryJob</QuestionIdentifier> <FreeText>yaah</FreeText> </Answer> <Answer> <QuestionIdentifier>motherPrimaryJob</QuestionIdentifier> <FreeText>idontknow</FreeText> </Answer> <Answer> <QuestionIdentifier>monthlyIncome</QuestionIdentifier> <FreeText>12313</FreeText> </Answer> <Answer> <QuestionIdentifier>own</QuestionIdentifier> <FreeText>MobilePhone|Bicycle</FreeText> </Answer> <Answer> <QuestionIdentifier>age</QuestionIdentifier> <FreeText>60</FreeText> </Answer> <Answer> <QuestionIdentifier>educationLevelMother</QuestionIdentifier> <FreeText>levelPhD</FreeText> </Answer> <Answer> <QuestionIdentifier>existHome</QuestionIdentifier> 
<FreeText>Television|Refrigerator</FreeText> </Answer> <Answer> <QuestionIdentifier>countryVisit</QuestionIdentifier> <FreeText>oneVisit</FreeText> </Answer> <Answer> <QuestionIdentifier>firstTurk</QuestionIdentifier> <FreeText>halfyear</FreeText> </Answer> <Answer> <QuestionIdentifier>locationTurk</QuestionIdentifier> <FreeText>School|</FreeText> </Answer> </QuestionFormAnswers>

 

The first part is data related to my request (assignment ID, submission time), and I don't know whether it will get in the way when parsing the XML, but I have tried this:

$xmlResponse = $mturk->getAssignmentsForHIT($hitID);

$XMLToParse =  $xmlResponse->asXml();

 

echo $XMLToParse;

 

echo $XMLToparse -> Answer[0]->QuestionIdentifier;

 

And it doesn't show the QuestionIdentifier of the first Answer. Is that because the XML has some extra content added, so it is not possible to access its elements?

 

Thank you.
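Two things likely explain the empty output: `asXml()` returns a plain string, not a SimpleXMLElement, and PHP variable names are case-sensitive, so `$XMLToparse` is a different (undefined) variable from `$XMLToParse`. Beyond that, the QuestionFormAnswers document is text inside the outer `<Answer>` element, so it has to be parsed as a second document. A minimal sketch, assuming `$answerString` holds that text as in the earlier posts (the heredoc below is a shortened, hypothetical stand-in):

```php
<?php
// Hypothetical, shortened stand-in for $answerString, the text of the outer <Answer> element.
$answerString = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">
  <Answer><QuestionIdentifier>primaryJob</QuestionIdentifier><FreeText>nothing</FreeText></Answer>
  <Answer><QuestionIdentifier>reasonUsing</QuestionIdentifier><FreeText>Entertainment|Income|moremoney</FreeText></Answer>
</QuestionFormAnswers>
XML;

// Parse the inner document on its own. trim() guards against whitespace
// before the XML declaration, which libxml would reject.
$answers = simplexml_load_string(trim($answerString));
if ($answers === false) {
    throw new RuntimeException('The Answer text is not well-formed XML');
}

// Children in the root's default namespace are reachable by plain property access.
$firstId = (string) $answers->Answer[0]->QuestionIdentifier;
$firstText = (string) $answers->Answer[0]->FreeText;
echo $firstId, ': ', $firstText, "\n";
```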

 

 

Also, when I look at the page source, I see this:

 

<GetAssignmentsForHITResponse><OperationRequest><RequestId>803b2aab-e4db-4bb8-a6b2-1bf37100db51</RequestId></OperationRequest><GetAssignmentsForHITResult><Request><IsValid>True</IsValid></Request><NumResults>2</NumResults><TotalNumResults>2</TotalNumResults><PageNumber>1</PageNumber><Assignment><AssignmentId>186PL25AX3RGKLPMV58QHQVJR45ZT9</AssignmentId><WorkerId>AG3SJ6UEP6J7V</WorkerId><HITId>1UZJHZKE5WQAINEAZS0X6G357ZQZ64</HITId><AssignmentStatus>Submitted</AssignmentStatus><AutoApprovalTime>2010-03-29T22:24:08Z</AutoApprovalTime><AcceptTime>2010-02-27T22:23:25Z</AcceptTime><SubmitTime>2010-02-27T22:24:08Z</SubmitTime><Answer><?xml version="1.0" encoding="UTF-8"?>

<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">

<Answer>

<QuestionIdentifier>age</QuestionIdentifier>

<FreeText>23</FreeText>

</Answer>

<Answer>

<QuestionIdentifier>fatherPrimaryJob</QuestionIdentifier>

<FreeText>fatherjJob</FreeText>

</Answer>

<Answer>

<QuestionIdentifier>locationTurk</QuestionIdentifier>

<FreeText>Home|</FreeText>

</Answer>

<Answer>

 

I don't know why it doesn't appear with the brackets, and therefore I can't access them...
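The tags are missing from the rendered page because the browser treats them as HTML; once the Answer text is parsed as XML they are accessible. A sketch that turns every answer into an array keyed by QuestionIdentifier, using values taken from the samples in this thread as a hypothetical excerpt. It assumes, judging only from those sample values, that `|` separates multiple selections; a trailing `|` as in `Home|` would otherwise leave an empty entry, so empty pieces are filtered out.

```php
<?php
// Hypothetical excerpt of the QuestionFormAnswers text, built from samples in the thread.
$answerString = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">
  <Answer><QuestionIdentifier>age</QuestionIdentifier><FreeText>23</FreeText></Answer>
  <Answer><QuestionIdentifier>fatherPrimaryJob</QuestionIdentifier><FreeText>fatherjJob</FreeText></Answer>
  <Answer><QuestionIdentifier>locationTurk</QuestionIdentifier><FreeText>Home|</FreeText></Answer>
  <Answer><QuestionIdentifier>own</QuestionIdentifier><FreeText>MobilePhone|Bicycle</FreeText></Answer>
</QuestionFormAnswers>
XML;

$answers = simplexml_load_string(trim($answerString));
if ($answers === false) {
    throw new RuntimeException('The Answer text is not well-formed XML');
}

$byQuestion = array();
foreach ($answers->Answer as $answer) {
    $id = (string) $answer->QuestionIdentifier;
    // Assumption: "|" separates multiple selections. Filtering with strlen
    // drops the empty piece left by a trailing "|" but keeps values like "0".
    $byQuestion[$id] = array_values(array_filter(explode('|', (string) $answer->FreeText), 'strlen'));
}

print_r($byQuestion);
```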

 

 
