java.lang.Object
  tc.TCExperiment
This class represents an experiment with one corpus. It holds the properties of the experiment (weighting function, stemming, stop-word removal, dimensionality, classifier properties, etc.) and provides methods to load and test the corpus, fill tables, and create graphs for the GUI.
| Field Summary | |
(package private) java.util.ArrayList |
batchEvaluationResults
holds all the batch evaluations made so far |
(package private) TCCorpusLoader |
corpusLoader
the loader for the corpus |
(package private) java.lang.String |
corpusName
holds the name of the corpus |
(package private) TCCorpusDistributionsData |
distributionsDataCorpus
contains the mean and standard deviation of some gaussian distributions |
(package private) TCCorpusDistributionsData |
distributionsDataForTermsOfInterest
contains the mean and standard deviation of some gaussian distributions |
(package private) int |
featureSpaceDimension
the current feature space dimension (dimension of the feature term vector) |
(package private) int |
kNN_k
number of nearest neighbors used for the kNN algorithm |
(package private) double |
kNNThreshold
the threshold for the kNN classifier |
(package private) double |
naiveBayesThreshold
the threshold for the naive bayes classifier |
(package private) javax.swing.JFrame |
parentFrame
the parent frame (used for progress monitoring etc...) |
(package private) java.util.ArrayList[] |
precisionRecallResults
holds all the (recall, precision) values for the precision and recall graph |
(package private) int |
stemmerMethod
indicates which stemming method should be used |
(package private) java.lang.String |
stopWordsString
the string containing all the words considered as stop words |
(package private) TCCorpus |
testCorpus
the actual corpus used for this experiment |
(package private) boolean |
useStopWordRemoval
indicates if stop words should be removed |
(package private) int |
weightFunctionIndex
the index of the weighting function currently used |
(package private) int |
weightFunctionSumIndex
the method of summation of the weighting function currently in use |
| Constructor Summary | |
TCExperiment()
constructor. |
|
| Method Summary | |
java.lang.String |
classifyDocumentAndProvideReport(java.lang.String path,
int classifierIndex)
classifies a single document and provides a textual report |
TCAccuracyResult |
computeClassifierAccuracyKNN(double stepSize)
computes the accuracy (break even, 11pt precision, precision-recall values, etc.) for the kNN classifier. |
TCAccuracyResult |
computeClassifierAccuracyNBB(double stepSize)
computes the accuracy (break even, precision-recall values) for the naive bayes classifier. |
boolean |
computeWeightValues(int minimumDFTFThreshold,
javax.swing.ProgressMonitor progressMonitor)
computes the weight values for the experiment corpus |
Graph2D |
createEvaluationGraph(int evaluationNumber)
creates the batch evaluation graph where different settings (e.g. |
Graph2D |
createGDGraph(int distributionIndex)
creates the gaussian distributions graph for later use |
Graph2D |
createPrecisionRecallGraph(int classifierIndex)
creates the precision<->recall graph based on an ArrayList with evaluation results for many different thresholds |
TCEvaluationResult |
evaluateClassification(int classifierIndex)
Evaluates the kNN or NBB classifier with the current settings and calculates recall and precision. |
void |
evaluateClassificationBatch()
Evaluates (precision and/or recall) one or more classifiers for different types of settings (weighting function, properties of the classification algorithm, number of unique terms, etc.) and fills a TCBatchResult object, which can be saved to disk or used for plotting the curves. |
TCEvaluationResult |
evaluateKNNClassificationForThreshold(double threshold,
boolean display)
Evaluates the kNN classifier with the current settings (and one single threshold) and calculates recall and precision. |
TCEvaluationResult |
evaluateNBBClassificationForThreshold(double threshold,
boolean display)
Evaluates the naive Bayes Bernoulli classifier with the current settings (and one single threshold) and calculates recall and precision. |
void |
exportDocumentVectors(java.lang.String path,
int documentType)
|
TCTableData |
fillBatchEvaluationDataTable()
creates the table with the batch evaluation data |
TCTableData |
fillCategoryDataTable()
assembles the data of every category (number of documents, etc.) |
TCTableData |
fillDocumentDataTable(int property)
fills the table with properties of all the documents |
TCTableData |
fillTermDataTable()
shows the properties of all the terms in the terms of interest arraylist (which are the terms after the dimensionality reduction step) |
int |
getNumberOfCategories()
returns the number of categories in the experiment corpus |
int |
getNumberOfDocuments()
returns the number of documents in the experiment corpus |
int |
getNumberOfLabeledDocuments()
returns the number of documents which were assigned to at least one category |
int |
getNumberOfProcessedWords()
returns the number of processed words (the number of words which were sorted into the term vectors by the loadCorpus algorithms) |
int |
getNumberOfTestDocuments()
returns the number of test documents currently loaded |
int |
getNumberOfTestDocumentsWithLabel()
returns the number of test documents with at least one label (which are assigned to at least one category) |
int |
getNumberOfUniqueTerms()
returns the number of distinct (unique) terms in the corpus |
double |
getNumberOfWordsPerDocument()
returns the average number of words per document in the corpus |
java.lang.String |
getStopWordsString()
|
int |
getWeightFunctionIndex()
returns the weight function index currently in use |
int |
getWeightFunctionSumMethodIndex()
returns the index of the weight function summation method |
void |
loadBatchEvaluationResult(java.lang.String path)
|
void |
loadClassificatorScoring(java.lang.String path,
int classifierIndex)
loads a scoring file. |
void |
loadCorpus(boolean useSWR,
int stemMethod)
loads one of the implemented corpora. |
void |
loadStopWordsString()
loads the file with all the stop words and saves it in a String for later use |
double |
reduceFeatureSpaceDimension(int fSD,
int minimumDFTFFrequency)
reduces the feature space and calculates the distributions and the tf-idf values for all the documents in the corpus |
void |
saveBatchEvaluationResult(java.lang.String path,
int resultIndex)
saves a batch evaluation to the given path |
void |
saveClassificatorScoring(java.lang.String path,
int classifierIndex)
saves the classifier scorings for later use. |
void |
setCorpusName(java.lang.String name)
sets the label of the corpus |
void |
setKNNk(int k)
sets the k value for the kNN classifier |
void |
setKNNThreshold(double t)
sets the threshold for the kNN classifier |
void |
setNaiveBayesThreshold(double t)
sets the threshold for the naive bayes classifier |
void |
setParentJFrame(javax.swing.JFrame frame)
sets the JFrame this class was created from (used for progress monitoring etc.) |
void |
setWeightFunctionIndex(int wfindex)
sets the weighting function which should be used for the dimensionality reduction step |
void |
setWeightFunctionSumMethodIndex(int wfsmi)
sets the method of summation for the weighting function currently in use |
void |
trainKNNClassifier()
calculates the kNN similarity scores for every test document loaded by the corpusLoader. |
void |
trainNaiveBayesClassifier()
trains the naive Bayes classifier (calculates the category probabilities for every document); the results are saved in the TCCorpus class for later processing |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
java.lang.String corpusName
int stemmerMethod
boolean useStopWordRemoval
int weightFunctionIndex
int weightFunctionSumIndex
int featureSpaceDimension
int kNN_k
double naiveBayesThreshold
double kNNThreshold
java.lang.String stopWordsString
TCCorpus testCorpus
TCCorpusLoader corpusLoader
TCCorpusDistributionsData distributionsDataForTermsOfInterest
TCCorpusDistributionsData distributionsDataCorpus
java.util.ArrayList[] precisionRecallResults
java.util.ArrayList batchEvaluationResults
javax.swing.JFrame parentFrame
| Constructor Detail |
public TCExperiment()
| Method Detail |
public void setCorpusName(java.lang.String name)
name - the label of the corpus
public void setParentJFrame(javax.swing.JFrame frame)
frame - the JFrame object
public void setKNNk(int k)
k - number of nearest neighbors considered
public void setNaiveBayesThreshold(double t)
t -
public void setKNNThreshold(double t)
t -
public void setWeightFunctionIndex(int wfindex)
wfindex - the index of the weighting function
public void setWeightFunctionSumMethodIndex(int wfsmi)
wfsmi - the index of the summation method
public int getWeightFunctionIndex()
public int getWeightFunctionSumMethodIndex()
public int getNumberOfDocuments()
public int getNumberOfLabeledDocuments()
public int getNumberOfCategories()
public int getNumberOfUniqueTerms()
public int getNumberOfProcessedWords()
public double getNumberOfWordsPerDocument()
public int getNumberOfTestDocuments()
public int getNumberOfTestDocumentsWithLabel()
public void loadStopWordsString()
public java.lang.String getStopWordsString()
public void loadCorpus(boolean useSWR,
int stemMethod)
useSWR - indicates if the stopword removal should be applied
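This page does not document the format of `stopWordsString` or how `useSWR` is applied during loading. As a rough sketch only, assuming the stop words are held as a whitespace-separated string and tokens are compared case-insensitively (both are assumptions, and `StopWordFilterSketch` and `removeStopWords` are hypothetical names that are not part of the tc package), stop-word removal could look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the tc package.
final class StopWordFilterSketch {

    // Splits the stop-word string on whitespace and drops every token of
    // the input text that appears in it (case-insensitive comparison).
    static List<String> removeStopWords(String text, String stopWordsString) {
        Set<String> stopWords = new HashSet<>(Arrays.asList(
                stopWordsString.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            // An empty input yields one empty token, which is skipped here.
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Calling `StopWordFilterSketch.removeStopWords(text, stopWords)` returns the remaining tokens in their original order; a real loader would probably also strip punctuation and apply the selected stemmer.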
public boolean computeWeightValues(int minimumDFTFThreshold,
javax.swing.ProgressMonitor progressMonitor)
progressMonitor - show progress with a progressMonitor
public double reduceFeatureSpaceDimension(int fSD,
int minimumDFTFFrequency)
fSD - the dimensionality the feature space should have after the
reduction step
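The exact tf-idf variant computed by `reduceFeatureSpaceDimension` is not stated on this page. The sketch below assumes the common form tf × ln(N / df), where N is the number of documents and df the number of documents containing the term; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the tc package.
final class TfIdfSketch {

    // Returns one term -> weight map per document, using tf * ln(N / df).
    static List<Map<String, Double>> tfIdf(List<List<String>> documents) {
        // Document frequency: in how many documents each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : documents) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = documents.size();
        List<Map<String, Double>> weights = new ArrayList<>();
        for (List<String> doc : documents) {
            // Term frequency: raw count of each term in this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
            Map<String, Double> w = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                w.put(e.getKey(), e.getValue() * idf);
            }
            weights.add(w);
        }
        return weights;
    }
}
```

A term that occurs in every document receives weight 0 under this variant, which is one reason such terms tend to be dropped when the feature space is reduced.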
public java.lang.String classifyDocumentAndProvideReport(java.lang.String path,
int classifierIndex)
path - the path to the document to classify
public void trainKNNClassifier()
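How the kNN similarity scores are computed is not described here. One common scheme in text categorization, shown below purely as an assumption, ranks the training documents by cosine similarity to the test document and sums the similarities of the k nearest neighbors per category. For brevity the sketch gives every training document a single label, although the corpus may assign several; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the tc package.
final class KnnScoreSketch {

    // Cosine similarity of two equal-length vectors; 0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Sums the similarities of the k most similar training documents per category.
    static Map<String, Double> categoryScores(double[] test, double[][] train,
                                              String[] labels, int k) {
        double[] sims = new double[train.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < train.length; i++) {
            sims[i] = cosine(test, train[i]);
            order.add(i);
        }
        // Sort training indices by descending similarity.
        order.sort(Comparator.comparingDouble((Integer i) -> -sims[i]));
        Map<String, Double> scores = new HashMap<>();
        for (int r = 0; r < Math.min(k, order.size()); r++) {
            int i = order.get(r);
            scores.merge(labels[i], sims[i], Double::sum);
        }
        return scores;
    }
}
```

Under this scheme a category would be assigned when its score reaches the kNN threshold, with `k` corresponding to the value set through `setKNNk`.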
public void trainNaiveBayesClassifier()
public TCEvaluationResult evaluateClassification(int classifierIndex)
public TCAccuracyResult computeClassifierAccuracyKNN(double stepSize)
public TCAccuracyResult computeClassifierAccuracyNBB(double stepSize)
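The method summary mentions an 11pt precision among the accuracy figures. Assuming this refers to the standard 11-point interpolated average precision (an assumption; the page does not define it), it can be derived from measured (recall, precision) pairs as sketched below with hypothetical names.

```java
// Hypothetical helper for illustration; not part of the tc package.
final class ElevenPointPrecisionSketch {

    // recall[i] and precision[i] describe one measured operating point,
    // for example the result of evaluating at one threshold.
    static double elevenPointAverage(double[] recall, double[] precision) {
        double sum = 0;
        for (int level = 0; level <= 10; level++) {
            double r = level / 10.0;
            // Interpolated precision: the best precision observed at any
            // recall greater than or equal to the current recall level.
            double best = 0;
            for (int i = 0; i < recall.length; i++) {
                if (recall[i] >= r - 1e-12 && precision[i] > best) {
                    best = precision[i];
                }
            }
            sum += best;
        }
        return sum / 11.0;
    }
}
```

The operating points would come from sweeping the classifier threshold; `stepSize` plausibly controls the spacing of that sweep, though this page does not say so.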
public TCEvaluationResult evaluateKNNClassificationForThreshold(double threshold,
boolean display)
public TCEvaluationResult evaluateNBBClassificationForThreshold(double threshold,
boolean display)
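The two threshold methods above report recall and precision for a single threshold. A minimal sketch of that calculation for one category follows, assuming a document is assigned to the category when its score is at least the threshold; the names and the convention for empty denominators are assumptions.

```java
// Hypothetical helper for illustration; not part of the tc package.
final class ThresholdEvaluationSketch {

    // Returns {precision, recall} for the decision rule "score >= threshold".
    // relevant[i] is true if document i truly belongs to the category.
    static double[] precisionRecall(double[] scores, boolean[] relevant, double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean assigned = scores[i] >= threshold;
            if (assigned && relevant[i]) {
                tp++;          // correctly assigned
            } else if (assigned) {
                fp++;          // assigned but not relevant
            } else if (relevant[i]) {
                fn++;          // relevant but missed
            }
        }
        // Assumed convention: an empty denominator yields 1.0.
        double precision = (tp + fp == 0) ? 1.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 1.0 : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }
}
```

Lowering the threshold generally raises recall and lowers precision, which is the trade-off a precision-recall graph makes visible.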
public void evaluateClassificationBatch()
public TCTableData fillBatchEvaluationDataTable()
public TCTableData fillCategoryDataTable()
public TCTableData fillDocumentDataTable(int property)
public TCTableData fillTermDataTable()
public Graph2D createGDGraph(int distributionIndex)
public Graph2D createPrecisionRecallGraph(int classifierIndex)
public Graph2D createEvaluationGraph(int evaluationNumber)
evaluationNumber - the index into the batch evaluation ArrayList;
currently only one batch evaluation can be saved and loaded (index = 0)
public void saveBatchEvaluationResult(java.lang.String path,
int resultIndex)
throws java.io.IOException
path - the path of the file to save the batch evaluation result in
resultIndex -
java.io.IOException
public void loadBatchEvaluationResult(java.lang.String path)
throws java.io.IOException,
java.lang.ClassNotFoundException
java.io.IOException
java.lang.ClassNotFoundException
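The combination of `IOException` and `ClassNotFoundException` suggests that results are stored with standard Java object serialization, although this page does not confirm it. Under that assumption, a save and load pair might be sketched as follows; `SerializationSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the tc package.
final class SerializationSketch {

    // Writes a serializable result object to the given path.
    static void save(String path, Serializable result) throws IOException {
        try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(Paths.get(path)))) {
            out.writeObject(result);
        }
    }

    // Reads the object back; the caller casts it to the expected type.
    static Object load(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new ObjectInputStream(Files.newInputStream(Paths.get(path)))) {
            return in.readObject();
        }
    }
}
```

With this approach, a file written by one version of a class may fail to load after the class changes unless a stable `serialVersionUID` is declared.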
public void saveClassificatorScoring(java.lang.String path,
int classifierIndex)
throws java.io.IOException
path - the path of the file to save the scoring in
classifierIndex - the index of the classifier the scoring is saved for
java.io.IOException
public void loadClassificatorScoring(java.lang.String path,
int classifierIndex)
throws java.io.IOException,
java.lang.ClassNotFoundException
path - the path of the file the scoring is loaded from
classifierIndex - the index of the classifier for which the scoring is loaded
java.io.IOException
java.lang.ClassNotFoundException
public void exportDocumentVectors(java.lang.String path,
int documentType)
throws java.io.IOException
java.io.IOException