

Principle of maximum entropy
August 21, 2008

Something to investigate: the principle of maximum entropy, which the CodePlex project relies on.
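As a starting point, here is a minimal sketch of the quantity the principle is built on. It is only an illustration and not code from the project: the function name and the three example distributions are made up. It computes the Shannon entropy of a few distributions over four outcomes; with no other constraints, the uniform one has the highest entropy, which is the distribution the principle would pick.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with zero probability contribute nothing to the sum.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Three made-up distributions over the same four outcomes.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

for name, dist in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(f"{name:8s} {entropy(dist):.3f} bits")
```

The more the distribution commits to one outcome, the lower the entropy gets, so choosing the highest-entropy distribution that still fits the known facts amounts to assuming as little as possible beyond them.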


Penn Treebank Project
August 21, 2008

Here's an interesting project, the Penn Treebank Project. In its own words:

The Penn Treebank Project annotates naturally occurring text for linguistic structure. Most notably, we produce skeletal parses showing rough syntactic and semantic information -- a bank of linguistic trees. We also annotate text with part-of-speech tags, and for the Switchboard corpus of telephone conversations, dysfluency annotation. We are located in the LINC Laboratory of the Computer and Information Science Department at the University of Pennsylvania.

Also, the part-of-speech tags:

CC    Coordinating conjunction  RP    Particle
CD    Cardinal number           SYM   Symbol
DT    Determiner                TO    to
EX    Existential there         UH    Interjection
FW    Foreign word              VB    Verb, base form
IN    Preposition/subordinate   VBD   Verb, past tense
      conjunction
JJ    Adjective                 VBG   Verb, gerund/present
                                      participle
JJR   Adjective, comparative    VBN   Verb, past participle
JJS   Adjective, superlative    VBP   Verb, non-3rd
                                      ps. sing. present
LS    List item marker          VBZ   Verb, 3rd ps. sing. present
MD    Modal                     WDT   wh-determiner
NN    Noun, singular or mass    WP    wh-pronoun
NNP   Proper noun, singular     WP$   Possessive wh-pronoun
NNPS  Proper noun, plural       WRB   wh-adverb
NNS   Noun, plural              ``    Left open double quote
PDT   Predeterminer             ,     Comma
POS   Possessive ending         ''    Right close double quote
PRP   Personal pronoun          .     Sentence-final punctuation
PRP$  Possessive pronoun        :     Colon, semi-colon
RB    Adverb                    $     Dollar sign
RBR   Adverb, comparative       #     Pound sign
RBS   Adverb, superlative       -LRB- Left parenthesis *
                                -RRB- Right parenthesis *

* The Penn Treebank uses the ( and ) symbols,
  but these are used elsewhere by the OpenNLP parser.
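To make the table a bit more concrete, here is a rough sketch of how the tags might be used in code. Everything in it is hypothetical (the dictionary, the helper names, the sample sentence); it is not the OpenNLP API. It assumes that literal parentheses in the input are swapped for -LRB- and -RRB-, which is my reading of the note above.

```python
# A handful of entries copied from the tag table; extend as needed.
PENN_TAGS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "VBZ": "Verb, 3rd ps. sing. present",
    "-LRB-": "Left parenthesis",
    "-RRB-": "Right parenthesis",
}

# Assumption: literal brackets are replaced by these tokens.
BRACKET_TOKENS = {"(": "-LRB-", ")": "-RRB-"}

def escape_brackets(tokens):
    """Replace literal parentheses with their -LRB-/-RRB- equivalents."""
    return [BRACKET_TOKENS.get(tok, tok) for tok in tokens]

def describe(tagged):
    """Attach the human-readable meaning to each (word, tag) pair."""
    return [(word, tag, PENN_TAGS.get(tag, "unknown tag")) for word, tag in tagged]

print(escape_brackets(["The", "dog", "(", "Rex", ")", "barks"]))

# Hand-tagged sample; a real tagger would produce these pairs.
tagged = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]
for word, tag, meaning in describe(tagged):
    print(f"{word}/{tag}: {meaning}")
```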

This is all stuff I need to get my head around.


Interesting language parsing article on CodeProject
August 21, 2008

http://www.codeproject.com/KB/recipes/englishparsing.aspx

This looks like quite an interesting article about natural language parsing.
