Roberto Rosario | d54fd98ec5 | Finished adding language specific ocr cleanup code | 2011-04-07 12:23:26 -04:00
Roberto Rosario | d1ff305a3f | Initial commit for the ocr_cleanup branch | 2011-04-07 04:07:59 -04:00
Roberto Rosario | f66c8ec6e2 | Fixed error and some warning returned by pylint | 2011-04-05 00:04:11 -04:00
Roberto Rosario | e4912a8d4d | Close file descriptors to prevent memory leaks | 2011-03-07 23:22:53 -04:00
Roberto Rosario | 6a9e114acb | Set all *.py files permissions to 644 | 2011-03-07 12:15:25 -04:00
Roberto Rosario | d0bea8ffeb | Initial version of the GridFS storage driver | 2011-03-04 01:08:20 -04:00
Roberto Rosario | c18cb099c6 | Improved tesseract execution handling | 2011-02-17 23:31:54 -04:00
Roberto Rosario | 77b8a432a2 | Added distributed OCR queue support | 2011-02-17 04:37:35 -04:00
Roberto Rosario | 478fb3502e | Changed from python's multiprocessing to celery to handle concurrency | 2011-02-17 03:45:30 -04:00
Roberto Rosario | 409a52af95 | First commit to support ocr subprocess | 2011-02-17 01:57:14 -04:00
Roberto Rosario | dfd101c33b | Cleanup file after ocr | 2011-02-16 20:54:11 -04:00
Roberto Rosario | b1e2f64617 | Apply transformation before doing OCR, added unpaper to the OCR pre processing pipe | 2011-02-16 03:32:21 -04:00
Roberto Rosario | fbc8bc960a | Decoupled page transformation interface, added default transformation support | 2011-02-14 02:11:39 -04:00
Roberto Rosario | 06d7e5a46a | Added multipage document support and document page transformation | 2011-02-14 00:18:16 -04:00
Roberto Rosario | d6afcc64bb | Changed file permissions | 2011-02-09 13:55:01 -04:00
Roberto Rosario | 6569faad11 | Added OCR capabilities | 2011-02-09 02:12:14 -04:00