moose-en-a-gant Posted February 18, 2015

I'm curious how I should go about creating an array or list of common everyday words, each tagged with an identifier such as noun, pronoun, adverb, or verb.

Do I have to scrape grammar or dictionary websites, or does something like this already exist? Is there a better approach?

Link to comment: https://forums.phpfreaks.com/topic/294717-a-database-of-nounsadverbsverbs/
Psycho Posted February 19, 2015

There is not enough information about what you are trying to accomplish to give a useful response. What do you consider "common everyday" words? How are you planning to create that list? Some words can be used as more than one grammar type:

Did you feed the dog?
Did you buy the feed for the horses?
We need a post for that sign.
The bank will post that transaction tomorrow.

etc., etc.

Link to comment: https://forums.phpfreaks.com/topic/294717-a-database-of-nounsadverbsverbs/#findComment-1506099
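One way to picture the point about words with more than one grammar type is a lookup where every word maps to a list of types rather than a single one. The sketch below is only an illustration: the `$lexicon` contents and the `wordTypes()` and `isAmbiguous()` helpers are made up for this example and do not come from any existing library or word list.

```php
<?php
// Hypothetical lexicon: each word maps to every grammar type recorded for it.
$lexicon = [
    'feed' => ['noun', 'verb'],
    'post' => ['noun', 'verb'],
    'the'  => ['definite_article'],
];

/**
 * Return all grammar types recorded for a word, or an empty array if unknown.
 */
function wordTypes(array $lexicon, string $word): array
{
    return $lexicon[strtolower($word)] ?? [];
}

/**
 * True when a word has more than one recorded type, so the surrounding
 * sentence is needed to decide which one applies.
 */
function isAmbiguous(array $lexicon, string $word): bool
{
    return count(wordTypes($lexicon, $word)) > 1;
}

var_dump(isAmbiguous($lexicon, 'feed')); // more than one type recorded
var_dump(isAmbiguous($lexicon, 'the'));  // only one type recorded
```

A flat word-to-type list cannot represent "feed" or "post" correctly, which is why a one-to-many structure is a safer starting point.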
moose-en-a-gant Posted February 19, 2015 Author

I see your point. I'm creating a learning application where I can go to a website, select all, copy, and drop the text into a text input. A summary is then generated once I have "taught" it, so to speak, by providing samples (for example, a paragraph and then the words I pulled out of it).

I believe the English language in general has a structure (of course), so you could build pattern recognition that finds the summary or main point of an entire webpage. I would then write a cURL script or some other scraper that searches multiple websites for the same topic and builds a collection of summaries, which would ideally be read back to me haha <- gotta hire a narrator

I'm wondering about how strings are parsed, e.g. left to right, top to bottom. I probably can't explode the text into an array of words with identifiers and then isolate words and find word frequencies and such without going from left to right through every word.

Link to comment: https://forums.phpfreaks.com/topic/294717-a-database-of-nounsadverbsverbs/#findComment-1506102
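On the word-frequency question: every word does have to be visited once, but PHP's built-in functions can do that pass for you. This is a minimal sketch, assuming plain English text; `wordFrequencies()` and the sample sentence are illustrative names and data, not part of the application described above.

```php
<?php
/**
 * Count how often each word occurs in a block of text.
 * Returns an array of word => count, most frequent first.
 */
function wordFrequencies(string $text): array
{
    // Lowercase first so "The" and "the" are counted together.
    // Format 1 makes str_word_count() return the words as an array.
    $words = str_word_count(strtolower($text), 1);

    // Tally identical words.
    $counts = array_count_values($words);

    // Sort by count, highest first, keeping the word keys.
    arsort($counts);

    return $counts;
}

$sample = 'Did you feed the dog? Did you buy the feed for the horses?';
$counts = wordFrequencies($sample);

print_r($counts);
```

Splitting and counting are kept in one function here for brevity; attaching grammar types to each word would be a separate lookup step after the split.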
Barand Posted March 8, 2015

Psycho said: "Some words can be used as more than one grammar type [...]"

Then there is

Time flies like an arrow
Fruit flies like a banana

and

The weary ploughman plods his homeward way,
The ploughman, weary, plods his homeward way,
His homeward way the weary ploughman plods,
His homeward way the ploughman weary plods,
The weary ploughman homeward plods his way,
The ploughman, weary, homeward plods his way,
His way, the weary ploughman homeward plods,
His way, the ploughman, weary, homeward plods,
The ploughman, homeward, plods his weary way,
His way the ploughman, homeward, weary plods,
His homeward weary way the ploughman plods,
Weary, the ploughman homeward plods his way,
Weary, the ploughman plods his homeward way,
Homeward, his way the weary ploughman plods,
Homeward, his way the ploughman, weary, plods,
Homeward, his weary way, the ploughman plods,
The ploughman, homeward, weary plods his way,
The ploughman, weary, homeward plods his way,
His weary way, the ploughman homeward plods,
His weary way, the homeward ploughman plods,
Homeward the plowman plods his weary way,
Homeward the weary ploughman plods his way,
The weary ploughman, his way, homeward plods,
The ploughman, weary, his way homeward plods,
The ploughman plods his weary, homeward way,
Weary, the ploughman, his way homeward plods,
Weary, his homeward way the ploughman plods.

Link to comment: https://forums.phpfreaks.com/topic/294717-a-database-of-nounsadverbsverbs/#findComment-1507862
IThinkMyBrainHurts Posted April 8, 2015

I started a project like this a year or so ago. Basically, I've written an interface for adding words and their type(s), and I added the entries manually. IMHO scraping would be either illegal or just not in the spirit of things. One way I add words is by parsing sample text and identifying the unknown words, which I then work through.

As for the DB, there are two tables (actually three): one for the words (id,word,word2,added,user,status) and another for the lists (id,word_id,type,added,user,status). The third is for training content. The word itself goes in the word column of the words table, and each word type selected for that word gets its own row in the lists table.

* I have separate entries for plurals, etc. (even though it can recognise plurals, prefixes, etc.)
* word2 is an ordered version of the word for quicker anagram solving.

Here's a list of word types I'm using so far (there is another list which groups these):

$wordtypes=array("adjective","adjective_continent","adjective_personality_negative","adjective_personality_positive","adverb","adverb_completeness","adverb_frequency","adverb_how","adverb_manner","adverb_place","adverb_purpose","adverb_time","adverb_time_frequency","adverb_time_frequency_indef","adverb_time_point","adverb_time_relationship","adverb_what_extent","adverb_when","adverb_where","contraction_informal","interjection","noun","noun_continent","noun_country","noun_fruit","noun_names_boys_eng","noun_names_girls_eng","noun_names_unisex_eng","noun_surname_eng","prefix","preposition","pronoun","question_words","stopword","suffix_derivational","suffix_inflectional","verb","verb_regular", "plural","noun_phrase","verb_participle","verb_transitive","verb_intransitive","conjunction","definite_article","indefinite_article","nominative");

I can't currently tell you exactly how many words I have because I re-installed my OS recently and haven't got around to re-installing my word DB yet, but I believe I have around 10,000 words. It may not seem like many (and it's not), but it is enough to parse most children's books, which was my reason for doing this.

One helpful (even though baffling at first) book I have is:
http://www.amazon.co.uk/Finite-state-Language-Processing-Speech-Communication/dp/0262181827/ref=sr_1_1?&keywords=finite-state+language+processing

but a great one for the shelf is:
http://www.amazon.co.uk/Structure-Magic-About-Language-Therapy/dp/0831400447/ref=sr_1_1?keywords=the+structure+of+magic

The latter book has nothing to do with computer programming; it is about NLP in the neuro-linguistic programming sense. Both will help with understanding the structure of sentences.

May I also point out that English may be one of the harder languages because of all the beautiful ambiguities.

Link to comment: https://forums.phpfreaks.com/topic/294717-a-database-of-nounsadverbsverbs/#findComment-1508504
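Two ideas from the post above lend themselves to a short sketch: the "ordered version" of a word stored in word2, and finding unknown words in sample text. The code below is one plausible reading, not the poster's actual implementation; `anagramKey()`, `unknownWords()`, and the `$known` list are hypothetical names and data introduced for illustration, and a real version would query the words table instead of a hard-coded array.

```php
<?php
/**
 * Build an "ordered" form of a word by sorting its letters alphabetically.
 * Anagrams share the same key, so a lookup on this value finds them all.
 * (Assumed interpretation of the word2 column.)
 */
function anagramKey(string $word): string
{
    $letters = str_split(strtolower($word));
    sort($letters);
    return implode('', $letters);
}

/**
 * Return the distinct words in $text that are not in $knownWords,
 * in the order they first appear.
 */
function unknownWords(string $text, array $knownWords): array
{
    // Split into lowercase words and drop repeats.
    $seen = array_unique(str_word_count(strtolower($text), 1));
    $known = array_map('strtolower', $knownWords);

    // array_diff() keeps original keys, so reindex the result.
    return array_values(array_diff($seen, $known));
}

// Hypothetical stand-in for the contents of the words table.
$known = ['the', 'dog', 'feed', 'did', 'you'];

$missing = unknownWords('Did you feed the weary ploughman?', $known);
print_r($missing);

// "post" and "stop" produce the same key.
var_dump(anagramKey('post') === anagramKey('stop'));
```

Storing the sorted key alongside the word means anagram lookups become a simple equality match on one column rather than a comparison against every word.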