Friends,
Wishes for 2016!
Please help me with this request. I need to get it sorted for my PhD thesis.
I am using MySQL and PHP (XAMPP on localhost, for research).
I have loaded more than half a million corporate news items for India, covering the past 4 years, into a news_content field. Some samples from that field are below:
1) With reference to the earlier letter dated December 18, 2015 in connection with the Scheme of Amalgamation between Digjam Limited and Digjam Textiles Limited ('the Companies') and their respective creditors and shareholders. Digjam Ltd has informed BSE that as directed by the Hon'ble High Court of Gujarat vide their Order dated December 18, 2015, the enclosed newspaper notice is being published in the Newspapers.
2) BEML Ltd has informed BSE regarding a Press Release, titled "BEML bags Export Award".
3) KEI Industries Ltd has informed BSE regarding "Bagging of Orders / Notification of Awards (NOA) valuing Rs. 384.53 Crores (Ex-works) from Power Grid Corporation of India Limited (PGCIL)".
4) With reference to the earlier Press Release dated December 26, 2015 regarding "Srikalahasthi Pipes Limited bags orders of Rs.1047 Crores during December, 2015". Srikalahasthi Pipes Ltd has now informed BSE that the value of the orders received was mentioned as Rs. 1,047 Crores in the caption of the Press Release instead of Rs. 1,053 Crores. Srikalahasthi Pipes Ltd has now submitted to BSE a copy of the Revised Press Release titled "Srikalahasthi Pipes Limited bags orders of Rs. 1053 Crores during December, 2015".
5) Steel Strips Wheels Ltd has informed BSE regarding "SSWL bags exclusive nomination for Mahindra’s Puddling and Vineyard tractor range".
6) Star Delta Transformers Ltd has informed BSE that the Extra Ordinary General Meeting (EGM) of the Company will be held on January 23, 2016.
For my research, I have to dynamically categorize most of the news items based on keywords. But it's not as simple as I thought, because if I use only one keyword I will miscategorize a lot of news. For example, if I use the word "order" on the 6 items above, only items 2, 3 and 4 talk about getting a new "Order". The rest are about either a Court Order or an "Extra Ordinary General" meeting, so they are false positives.
Another issue is that I want to use multiple keywords, separated by commas, for each category, so that an item is categorized if any one of the keywords is found.
I should also be able to define negative keywords, which must not be present in the news item.
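To make the requirement concrete, here is a minimal sketch of the matching logic described above. It is written in Python purely for illustration; the same idea should port to PHP using preg_match with the same \b word-boundary pattern. The function names, the category names and the keyword lists are all made up for this example and are not part of any existing code.

```python
import re

def parse_keywords(csv_text):
    """Split a comma-separated keyword string into a list, dropping blanks."""
    return [kw.strip() for kw in csv_text.split(",") if kw.strip()]

def contains_any(text, keywords):
    """Return True if any keyword appears in text as a whole word or phrase."""
    for kw in keywords:
        # \b anchors the keyword at word boundaries; matching ignores case.
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False

def categorize(text, rules):
    """Return every category whose positive keywords match and whose
    negative keywords are all absent."""
    matched = []
    for category, (positive_csv, negative_csv) in rules.items():
        positives = parse_keywords(positive_csv)
        negatives = parse_keywords(negative_csv)
        if contains_any(text, positives) and not contains_any(text, negatives):
            matched.append(category)
    return matched

# Hypothetical rules: category -> (positive keywords, negative keywords).
RULES = {
    "new_order": ("order, orders", "court, general meeting"),
    "meeting": ("EGM, general meeting", ""),
}

# Shortened versions of sample items 1 and 3 from the list above.
court_item = "as directed by the Hon'ble High Court of Gujarat vide their Order dated December 18, 2015"
order_item = "KEI Industries Ltd has informed BSE regarding Bagging of Orders / Notification of Awards (NOA)"

print(categorize(court_item, RULES))  # "Order" matches, but "court" vetoes it
print(categorize(order_item, RULES))
```

Because matching is on whole words, plural forms such as "orders" have to be listed explicitly; that is a deliberate simplification here, and stemming would be one alternative.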
So, it's now making my head spin. I am not able to think through how to sort this out.
How can I implement a solution for this in PHP and MySQL?
Any help, please?
Regards,
Natasha