tc
Class TCCategory

java.lang.Object
  extended by tc.TCCategory
All Implemented Interfaces:
java.lang.Comparable

public class TCCategory
extends java.lang.Object
implements java.lang.Comparable

This class represents a category. It has a vector of documents belonging to the category and methods to calculate the naive Bayes value, the kNN value for its documents, ...


Field Summary
private  java.util.ArrayList allTerms
          all the unique terms belonging to this category
(package private)  int categoryIndex
          the index of the category (need not be provided)
(package private)  java.lang.String categoryLabel
          the label of the category
(package private)  int numberOfDocuments
          the number of documents belonging to this category
(package private)  int numberOfProcessedWords
          number of processed words for this category
(package private)  double numberOfWordsPerDocument
          the average number of words per document
 
Constructor Summary
TCCategory(int cIndex, java.lang.String cLabel)
           
 
Method Summary
 void addDocument(TCDocument newDocument)
          adds a new document to the category
 int compareTo(java.lang.Object arg0)
          Categories are compared by the number of documents belonging to them
 double computeNaiveBayesBernoulli(java.util.ArrayList corpusAllTermsOfInterest, TCDocument documentToClassify)
          computes the naive Bayes (Bernoulli) value for this category, given all the terms of interest in the entire corpus and the document to be classified by the naive Bayes algorithm
 boolean equals(java.lang.Object arg0)
          Two categories are equal if they have the same label; the label of the category is the unique "index"
 java.util.ArrayList getAllTerms()
          returns the ArrayList with all unique terms in the category
 java.lang.String getCategoryLabel()
          returns the category label (the unique index of the category)
 int getNumberOfDocuments()
          returns the number of documents
 int getNumberOfProcessedWords()
           
 double getNumberOfWordsPerDocument()
           
 void removeTerm(TCTerm term)
          removes the given term from the category and from all documents belonging to the category (recursive) (NOT USED ANYWHERE)
 void setWordsPerDocument(double wpd)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

categoryLabel

java.lang.String categoryLabel
the label of the category


categoryIndex

int categoryIndex
the index of the category (need not be provided)


allTerms

private java.util.ArrayList allTerms
all the unique terms belonging to this category


numberOfDocuments

int numberOfDocuments
the number of documents belonging to this category


numberOfProcessedWords

int numberOfProcessedWords
number of processed words for this category


numberOfWordsPerDocument

double numberOfWordsPerDocument
the average number of words per document

Constructor Detail

TCCategory

public TCCategory(int cIndex,
                  java.lang.String cLabel)
Method Detail

getCategoryLabel

public java.lang.String getCategoryLabel()
returns the category label (the unique index of the category)

Returns:
the category label

getAllTerms

public java.util.ArrayList getAllTerms()
returns the ArrayList with all unique terms in the category

Returns:
all unique terms in the category

getNumberOfDocuments

public int getNumberOfDocuments()
returns the number of documents

Returns:
the number of documents belonging to this category

getNumberOfWordsPerDocument

public double getNumberOfWordsPerDocument()

getNumberOfProcessedWords

public int getNumberOfProcessedWords()

setWordsPerDocument

public void setWordsPerDocument(double wpd)

computeNaiveBayesBernoulli

public double computeNaiveBayesBernoulli(java.util.ArrayList corpusAllTermsOfInterest,
                                         TCDocument documentToClassify)
computes the naive Bayes (Bernoulli) value for this category, given all the terms of interest in the entire corpus and the document to be classified by the naive Bayes algorithm

Parameters:
corpusAllTermsOfInterest - all the unique terms in the corpus
documentToClassify - the document to classify
Returns:
the naive bayes log-likelihood for the given document
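
The Bernoulli variant of naive Bayes scores a document by visiting every term of interest, adding log P(t|c) when the term occurs in the document and log(1 - P(t|c)) when it does not. The following is a minimal sketch of that idea and not the code of TCCategory itself. The class name BernoulliSketch, the use of plain string sets in place of TCTerm and TCDocument, the Laplace smoothing and the omission of the class prior are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: each document is modelled as the set of terms it contains.
class BernoulliSketch {

    // Returns the Bernoulli naive Bayes log-likelihood of a document for one category.
    static double logLikelihood(List<String> termsOfInterest,
                                List<Set<String>> categoryDocuments,
                                Set<String> documentToClassify) {
        double logLikelihood = 0.0;
        int numberOfDocuments = categoryDocuments.size();

        for (String term : termsOfInterest) {
            // Count the category's documents that contain the term.
            int containing = 0;
            for (Set<String> document : categoryDocuments) {
                if (document.contains(term)) {
                    containing++;
                }
            }

            // P(term | category) with Laplace smoothing (an assumption of this sketch).
            double p = (containing + 1.0) / (numberOfDocuments + 2.0);

            // Present terms contribute log p, absent terms contribute log(1 - p).
            logLikelihood += documentToClassify.contains(term)
                    ? Math.log(p)
                    : Math.log(1.0 - p);
        }
        return logLikelihood;
    }
}
```

Absent terms contribute to the score as well, which is why the method needs the terms of interest of the whole corpus and not only those of the document being classified.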

removeTerm

public void removeTerm(TCTerm term)
removes the given term from the category and from all documents belonging to the category (recursive) (NOT USED ANYWHERE)

Parameters:
term - the term to remove
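
The removal described above touches two levels: the category's own list of unique terms and every document that belongs to the category. A rough sketch of that propagation is given here. The class RemoveTermSketch and its fields are hypothetical, and strings stand in for TCTerm and TCDocument, whose real structure is not shown in this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a category that keeps its documents as lists of terms.
class RemoveTermSketch {
    final List<String> allTerms = new ArrayList<>();
    final List<List<String>> documents = new ArrayList<>();

    // Removes the term from the category and from each of its documents.
    void removeTerm(String term) {
        allTerms.remove(term);
        for (List<String> document : documents) {
            // Drop every occurrence of the term inside the document.
            document.removeIf(term::equals);
        }
    }
}
```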

equals

public boolean equals(java.lang.Object arg0)
Two categories are equal if they have the same label; the label of the category is the unique "index"


compareTo

public int compareTo(java.lang.Object arg0)
Categories are compared by the number of documents belonging to them

Specified by:
compareTo in interface java.lang.Comparable
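
The two contracts above (equality by label, ordering by document count) can be illustrated with a simplified stand-in. This is a sketch under assumptions, not the class's actual code: CategorySketch is a hypothetical name, the ascending sort direction is a guess, and the generic Comparable<CategorySketch> replaces the raw Comparable used by TCCategory. The sketch also overrides hashCode to stay consistent with equals, whereas the documented class inherits hashCode from java.lang.Object.

```java
// Hypothetical, simplified stand-in for TCCategory.
class CategorySketch implements Comparable<CategorySketch> {
    final String categoryLabel;
    final int numberOfDocuments;

    CategorySketch(String categoryLabel, int numberOfDocuments) {
        this.categoryLabel = categoryLabel;
        this.numberOfDocuments = numberOfDocuments;
    }

    // Equality is decided by the label alone.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CategorySketch)) {
            return false;
        }
        return categoryLabel.equals(((CategorySketch) other).categoryLabel);
    }

    // Kept consistent with equals (an addition of this sketch).
    @Override
    public int hashCode() {
        return categoryLabel.hashCode();
    }

    // Ordering is decided by the number of documents (ascending is assumed).
    @Override
    public int compareTo(CategorySketch other) {
        return Integer.compare(numberOfDocuments, other.numberOfDocuments);
    }
}
```

With these definitions, compareTo can return 0 for two categories that are not equal, so the ordering is not consistent with equals. That matters if categories are stored in sorted collections such as java.util.TreeSet.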

addDocument

public void addDocument(TCDocument newDocument)
                 throws java.lang.CloneNotSupportedException
adds a new document to the category

Parameters:
newDocument - the new document
Throws:
java.lang.CloneNotSupportedException
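
How addDocument updates the category's counters is not described in this page. The sketch below shows one plausible way the fields listed under Field Detail could be kept in sync when a document is added. Everything in it is an assumption: the class AddDocumentSketch is hypothetical, a document is modelled as a list of words, the average is recomputed on every call, and the cloning that gives rise to CloneNotSupportedException is not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the bookkeeping side of a category.
class AddDocumentSketch {
    final List<String> allTerms = new ArrayList<>();
    int numberOfDocuments;
    int numberOfProcessedWords;
    double numberOfWordsPerDocument;

    // Adds a document (modelled as a list of words) and updates the counters.
    void addDocument(List<String> words) {
        numberOfDocuments++;
        numberOfProcessedWords += words.size();

        // Keep the list of terms free of duplicates.
        for (String word : words) {
            if (!allTerms.contains(word)) {
                allTerms.add(word);
            }
        }

        // Average number of words per document (assumed definition).
        numberOfWordsPerDocument = (double) numberOfProcessedWords / numberOfDocuments;
    }
}
```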