15 Commits

Author SHA1 Message Date
Roberto Rosario
6f585a2836 Update and re-enable ocr app 2012-09-16 03:30:32 -04:00
Roberto Rosario
babdc4e93a Initial changes to update the OCR app 2012-09-10 23:30:13 -04:00
Roberto Rosario
58019de21b Don't pass mimetype to render_to_viewport method 2012-08-15 01:34:29 -04:00
Roberto Rosario
576a2cc643 Support passing MIMETypes and actual document filenames to TextParser for better lexer guessing 2012-08-06 03:00:09 -04:00
Roberto Rosario
f77c886e51 Register TextParser for OCR based on the list of MIME types it supports 2012-07-28 04:45:45 -04:00
Roberto Rosario
0ec1cc3823 Add text parser and render using Pygments 2012-07-28 02:22:45 -04:00
Roberto Rosario
58f027db60 Clean up (unused imports, PEP8, etc) 2012-06-08 16:43:54 -04:00
Roberto Rosario
2849fd6e79 Detect blank pages with the PopplerParser, raise ParserError to fall back to OCR if all parsers fail 2012-06-03 21:08:22 -04:00
Roberto Rosario
d1ccca4d2e Final updates for the PopplerParser 2012-05-30 16:15:57 -04:00
Roberto Rosario
babd3ec2f3 Refactor parser system to be class based, add poppler based PDF parser, allow multiple parsers for each mimetype with fallback 2012-05-30 12:57:25 -04:00
Roberto Rosario
f9a3c4611b PEP8 cleanups, remove OCR_CACHE_URI 2012-01-18 13:53:02 -04:00
Roberto Rosario
1e38369919 Update parser to use the latest version of a document when extracting text 2011-12-02 05:56:34 -04:00
Roberto Rosario
922971274f Add office document text extractor 2011-12-01 04:54:14 -04:00
Roberto Rosario
90e876ca93 Code cleanup 2011-07-21 11:46:15 -04:00
Roberto Rosario
d566dfbb1d Added the first text parser backend (PDF) and updated the requirements files and README 2011-07-18 04:06:59 -04:00