Roberto Rosario | 6f585a2836 | Update and re-enable ocr app | 2012-09-16 03:30:32 -04:00
Roberto Rosario | babdc4e93a | Initial changes to update the OCR app | 2012-09-10 23:30:13 -04:00
Roberto Rosario | 58019de21b | Don't pass mimetype to render_to_viewport method | 2012-08-15 01:34:29 -04:00
Roberto Rosario | 576a2cc643 | Support passing MIMETypes and actual document filenames to TextParser for better lexer guessing | 2012-08-06 03:00:09 -04:00
Roberto Rosario | f77c886e51 | Register TextParser for OCR based on the list of MIME types it supports | 2012-07-28 04:45:45 -04:00
Roberto Rosario | 0ec1cc3823 | Add text parser and render using Pygments | 2012-07-28 02:22:45 -04:00
Roberto Rosario | 58f027db60 | Clean up (unused imports, PEP8, etc) | 2012-06-08 16:43:54 -04:00
Roberto Rosario | 2849fd6e79 | Detect blank pages with the PopplerParser, raise ParserError to fallback to OCR if all parsers fail | 2012-06-03 21:08:22 -04:00
Roberto Rosario | d1ccca4d2e | Final updates for the PopplerParser | 2012-05-30 16:15:57 -04:00
Roberto Rosario | babd3ec2f3 | Refactor parser system to be class based, add poppler based PDF parser, allow multiple parsers for each mimetype with fallback | 2012-05-30 12:57:25 -04:00
Roberto Rosario | f9a3c4611b | PEP8 cleanups, remove OCR_CACHE_URI | 2012-01-18 13:53:02 -04:00
Roberto Rosario | 1e38369919 | Update parser to use the latest version of a document when extracting text | 2011-12-02 05:56:34 -04:00
Roberto Rosario | 922971274f | Add office document text extractor | 2011-12-01 04:54:14 -04:00
Roberto Rosario | 90e876ca93 | Code cleanup | 2011-07-21 11:46:15 -04:00
Roberto Rosario | d566dfbb1d | Added the first text parser backend (PDF) and updated the requirements files and README | 2011-07-18 04:06:59 -04:00