16 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Roberto Rosario | d54fd98ec5 | Finished adding language specific ocr cleanup code | 2011-04-07 12:23:26 -04:00 |
| Roberto Rosario | d1ff305a3f | Initial commit for the ocr_cleanup branch | 2011-04-07 04:07:59 -04:00 |
| Roberto Rosario | f66c8ec6e2 | Fixed error and some warning returned by pylint | 2011-04-05 00:04:11 -04:00 |
| Roberto Rosario | e4912a8d4d | Close file descriptors to prevent memory leaks | 2011-03-07 23:22:53 -04:00 |
| Roberto Rosario | 6a9e114acb | Set all *.py files permissions to 644 | 2011-03-07 12:15:25 -04:00 |
| Roberto Rosario | d0bea8ffeb | Initial version of the GridFS storage driver | 2011-03-04 01:08:20 -04:00 |
| Roberto Rosario | c18cb099c6 | Improved tesseract execution handling | 2011-02-17 23:31:54 -04:00 |
| Roberto Rosario | 77b8a432a2 | Added distributed OCR queue support | 2011-02-17 04:37:35 -04:00 |
| Roberto Rosario | 478fb3502e | Changed from python's multiprocessing to celery to handle concurrency | 2011-02-17 03:45:30 -04:00 |
| Roberto Rosario | 409a52af95 | First commit to support ocr subprocess | 2011-02-17 01:57:14 -04:00 |
| Roberto Rosario | dfd101c33b | Cleanup file after ocr | 2011-02-16 20:54:11 -04:00 |
| Roberto Rosario | b1e2f64617 | Apply transformation before doing OCR, added unpaper to the OCR pre processing pipe | 2011-02-16 03:32:21 -04:00 |
| Roberto Rosario | fbc8bc960a | Decoupled page transformation interface, added default transformation support | 2011-02-14 02:11:39 -04:00 |
| Roberto Rosario | 06d7e5a46a | Added multipage document support and document page transformation | 2011-02-14 00:18:16 -04:00 |
| Roberto Rosario | d6afcc64bb | Changed file permissions | 2011-02-09 13:55:01 -04:00 |
| Roberto Rosario | 6569faad11 | Added OCR capabilites | 2011-02-09 02:12:14 -04:00 |