TextExtractor
public class TextExtractor extends TestCase
Tests that the extractor correctly gets the text out of our sample file |
Fields Summary |
---|
private PowerPointExtractor | ppe
Extractor primed on the 2 page basic test data
| private PowerPointExtractor | ppe2
Extractor primed on the 1 page but text-box'd test data
| private String | dirname
Where to go looking for our test files |
Constructors Summary |
---|
public TextExtractor()
dirname = System.getProperty("HSLF.testdata.path");
String filename = dirname + "/basic_test_ppt_file.ppt";
ppe = new PowerPointExtractor(filename);
String filename2 = dirname + "/with_textbox.ppt";
ppe2 = new PowerPointExtractor(filename2);
|
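The constructor above builds each file path by concatenating the `HSLF.testdata.path` system property with a file name. If that property is unset, `dirname` is null and the path quietly becomes `null/basic_test_ppt_file.ppt`. A minimal sketch of a fail-fast alternative, using only the JDK; the class `TestDataPath` and its `resolve` method are hypothetical names introduced here and are not part of POI or of the original test:

```java
import java.io.File;

/** Hypothetical helper (not part of POI): builds test-file paths and fails fast when the directory is unset. */
public class TestDataPath {
    /** Joins dir and name; throws if dir is null or empty instead of producing "null/<name>". */
    public static String resolve(String dir, String name) {
        if (dir == null || dir.isEmpty()) {
            throw new IllegalStateException(
                "System property HSLF.testdata.path is not set; cannot locate " + name);
        }
        // File handles the platform-specific separator between directory and file name.
        return new File(dir, name).getPath();
    }

    public static void main(String[] args) {
        String dirname = System.getProperty("HSLF.testdata.path");
        try {
            System.out.println(resolve(dirname, "basic_test_ppt_file.ppt"));
        } catch (IllegalStateException e) {
            // Reached when the property was not passed with -DHSLF.testdata.path=...
            System.out.println(e.getMessage());
        }
    }
}
```

With a helper of this kind, a missing property would surface as a clear configuration error rather than as a file-not-found failure on a path that starts with `null`.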
Methods Summary |
---|
private void | ensureTwoStringsTheSame(java.lang.String exp, java.lang.String act)
assertEquals(exp.length(),act.length());
char[] expC = exp.toCharArray();
char[] actC = act.toCharArray();
for(int i=0; i<expC.length; i++) {
System.out.println(i + "\t" + expC[i] + " " + actC[i]);
assertEquals(expC[i],actC[i]);
}
assertEquals(exp,act);
| public void | testMissingCoreRecords()
Test that when presented with a PPT file missing the odd core record, we can still get the rest of the text out
String filename = dirname + "/missing_core_records.ppt";
ppe = new PowerPointExtractor(filename);
String text = ppe.getText(true, false);
String nText = ppe.getNotes();
assertNotNull(text);
assertNotNull(nText);
// Notes records were corrupt, so don't expect any
assertEquals(0, nText.length());
// Slide records were fine
assertTrue(text.startsWith("Using Disease Surveillance and Response"));
| public void | testReadNoteText()
// Basic 2 page example
String notesText = ppe.getNotes();
String expectText = "These are the notes for page 1\nThese are the notes on page two, again lacking formatting\n";
ensureTwoStringsTheSame(expectText, notesText);
// Other one doesn't have notes
notesText = ppe2.getNotes();
expectText = "";
ensureTwoStringsTheSame(expectText, notesText);
| public void | testReadSheetText()
// Basic 2 page example
String sheetText = ppe.getText();
String expectText = "This is a test title\nThis is a test subtitle\nThis is on page 1\nThis is the title on page 2\nThis is page two\nIt has several blocks of text\nNone of them have formatting\n";
ensureTwoStringsTheSame(expectText, sheetText);
// 1 page example with text boxes
sheetText = ppe2.getText();
expectText = "Hello, World!!!\nI am just a poor boy\nThis is Times New Roman\nPlain Text \n";
ensureTwoStringsTheSame(expectText, sheetText);
|
|
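`ensureTwoStringsTheSame` above prints every character pair to standard output before asserting, which is noisy for long strings. Because its length assertion runs first, the loop is never reached when the two strings differ in length. A minimal sketch of an alternative that reports only the first differing position; `StringDiff`, `firstMismatch`, and `describe` are hypothetical names introduced here, and the sketch uses only the JDK so that it runs without JUnit:

```java
/** Hypothetical helper (not part of the original test): locates the first difference between two strings. */
public class StringDiff {
    /** Returns the index of the first differing character, or -1 if the strings are equal. */
    public static int firstMismatch(String exp, String act) {
        int shared = Math.min(exp.length(), act.length());
        for (int i = 0; i < shared; i++) {
            if (exp.charAt(i) != act.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other: the mismatch is where the shorter one ends.
        return exp.length() == act.length() ? -1 : shared;
    }

    /** Builds a one-line description suitable for an assertion failure message. */
    public static String describe(String exp, String act) {
        int i = firstMismatch(exp, act);
        if (i < 0) {
            return "strings are identical";
        }
        String e = i < exp.length() ? "'" + exp.charAt(i) + "'" : "<end>";
        String a = i < act.length() ? "'" + act.charAt(i) + "'" : "<end>";
        return "first difference at index " + i + ": expected " + e + " but was " + a;
    }

    public static void main(String[] args) {
        System.out.println(describe("Hello, World!!!", "Hello, World!"));
    }
}
```

In a JUnit 3 test such as this one, the result of `describe` could be passed as the message argument of `assertEquals(String message, String expected, String actual)`, so a failure names the offending index without any console output on success.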