tc
Class TCCorpusLoader

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by tc.TCCorpusLoader
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
Direct Known Subclasses:
TCCorpusLoaderReuters, TCCorpusLoaderSelfmade

public abstract class TCCorpusLoader
extends org.xml.sax.helpers.DefaultHandler

All classes that load corpora must extend this class.
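The extension pattern can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the tc code: SketchCorpusLoader and SketchSelfmadeLoader are hypothetical names, the doc/label XML format is invented for the example, and a plain list of label lists stands in for TCCorpus. As with TCCorpusLoader, the base class extends DefaultHandler, and each subclass overrides the SAX callbacks for its own corpus format.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for TCCorpusLoader: a SAX handler that subclasses specialize.
abstract class SketchCorpusLoader extends DefaultHandler {
    // labels of the document currently being parsed (mirrors the "labels" field)
    protected final List<String> labels = new ArrayList<>();
    // one label list per finished document (stands in for the real TCCorpus)
    protected final List<List<String>> documents = new ArrayList<>();

    // Parses the XML source; the parser calls back into the subclass's handlers.
    public List<List<String>> load(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), this);
        return documents;
    }
}

// Hypothetical subclass for an invented format: <doc><label>...</label></doc>.
class SketchSelfmadeLoader extends SketchCorpusLoader {
    private final StringBuilder text = new StringBuilder();
    private boolean inLabel = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("doc")) {
            labels.clear();             // a new document starts: reset its labels
        } else if (qName.equals("label")) {
            inLabel = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inLabel) {
            text.append(ch, start, length); // SAX may deliver text in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("label")) {
            labels.add(text.toString().trim());
            inLabel = false;
        } else if (qName.equals("doc")) {
            documents.add(new ArrayList<>(labels)); // copy, since labels is reused
        }
    }
}
```

For example, loading the string <corpus><doc><label>earn</label><label>acq</label></doc><doc><label>grain</label></doc></corpus> yields [[earn, acq], [grain]]. Copying the label list at the end of each document is needed because the handler reuses a single labels list across documents.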


Field Summary
(package private)  TCCorpus corpus
          the corpus to be returned
(package private)  java.util.ArrayList labels
          contains the labels for the examined document
(package private)  int numberOfTestDocumentsWithLabel
          the number of test documents which belong to at least one known category
(package private)  int stemmerMethod
          the stemming method (0: no stemming; 1: Porter; 2: Lancaster)
(package private)  java.lang.String stopWordsString
          a string holding all the stop words
(package private)  java.util.ArrayList testSet
          the test set which will be generated for evaluation
(package private)  boolean useStopWordRemoval
          indicates whether stop-word removal should be applied
 
Constructor Summary
TCCorpusLoader()
           
 
Method Summary
 int getNumberOfTestDocumentsWithLabel()
           
 java.util.ArrayList getTestSet()
           
 TCCorpus loadCorpus()
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

labels

java.util.ArrayList labels
contains the labels for the examined document


testSet

java.util.ArrayList testSet
the test set which will be generated for evaluation


corpus

TCCorpus corpus
the corpus to be returned


stemmerMethod

int stemmerMethod
the stemming method (0: no stemming; 1: Porter; 2: Lancaster)
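The integer codes of stemmerMethod can be given names and validated in one place. The following is a minimal sketch assuming only the three codes listed above; the StemmerMethod enum is hypothetical and not part of the tc package, and it does not implement any stemming itself.

```java
// Hypothetical enum naming the integer codes used by the stemmerMethod field.
enum StemmerMethod {
    NONE(0), PORTER(1), LANCASTER(2);

    private final int code;

    StemmerMethod(int code) {
        this.code = code;
    }

    // The integer value stored in stemmerMethod.
    int code() {
        return code;
    }

    // Maps a stemmerMethod value to its constant; rejects codes outside 0..2.
    static StemmerMethod fromCode(int code) {
        for (StemmerMethod m : values()) {
            if (m.code == code) {
                return m;
            }
        }
        throw new IllegalArgumentException("unknown stemmer method: " + code);
    }
}
```

Rejecting unknown codes early avoids silently falling back to one stemmer when an unsupported value is configured.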


useStopWordRemoval

boolean useStopWordRemoval
indicates whether stop-word removal should be applied


stopWordsString

java.lang.String stopWordsString
a string holding all the stop words
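One way the useStopWordRemoval flag and stopWordsString could work together is sketched below. The sketch rests on two assumptions the documentation does not state: that the words in stopWordsString are separated by whitespace, and that matching is case-insensitive. The StopWordFilter class is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; assumes stopWordsString separates words by whitespace.
class StopWordFilter {
    private final boolean useStopWordRemoval;
    private final Set<String> stopWords = new HashSet<>();

    StopWordFilter(boolean useStopWordRemoval, String stopWordsString) {
        this.useStopWordRemoval = useStopWordRemoval;
        for (String w : stopWordsString.trim().split("\\s+")) {
            if (!w.isEmpty()) { // an empty input string splits into one empty token
                stopWords.add(w.toLowerCase(Locale.ROOT)); // assumption: case-insensitive
            }
        }
    }

    // Returns a copy of the tokens when removal is disabled; otherwise drops stop words.
    List<String> filter(List<String> tokens) {
        if (!useStopWordRemoval) {
            return new ArrayList<>(tokens);
        }
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Parsing the string into a HashSet once keeps each lookup constant-time, instead of searching the raw string for every token.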


numberOfTestDocumentsWithLabel

int numberOfTestDocumentsWithLabel
the number of test documents which belong to at least one known category
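The counting rule behind numberOfTestDocumentsWithLabel can be sketched as follows, assuming (this is not stated in the documentation) that a category is "known" if it belongs to a given set, for example the categories seen during training. The LabelStats class and its method are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the counting rule, not the tc implementation.
class LabelStats {
    // Counts test documents whose labels include at least one known category.
    static int countWithKnownLabel(List<List<String>> testLabels, Set<String> knownCategories) {
        int count = 0;
        for (List<String> labels : testLabels) {
            // disjoint() is false when the two collections share any element
            if (!Collections.disjoint(labels, knownCategories)) {
                count++;
            }
        }
        return count;
    }
}
```

With known categories {earn, acq}, the test documents [earn], [xyz], [] and [xyz, acq] give a count of 2: documents without labels or with only unknown labels are not counted.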

Constructor Detail

TCCorpusLoader

public TCCorpusLoader()

Method Detail

getTestSet

public java.util.ArrayList getTestSet()
returns the test set generated for evaluation

getNumberOfTestDocumentsWithLabel

public int getNumberOfTestDocumentsWithLabel()
returns the number of test documents which belong to at least one known category

loadCorpus

public TCCorpus loadCorpus()