38 Commits

Author SHA1 Message Date
Roberto Rosario
486f983d4b Refactor job processing app to do actual job queue and job subprocess launching, remove queue manager app, update ocr app to use new job processing app 2012-07-30 07:36:02 -04:00
Roberto Rosario
d2e6df4dde Initial changes for the new queue based OCR processing 2012-07-29 05:33:04 -04:00
Roberto Rosario
34311fb17e Cleanups, permissions separation into explicit module, absolute import update 2012-01-02 03:48:26 -04:00
Roberto Rosario
f91f5fd70f Fix get_image_cache_name regression in ocr app 2011-12-17 23:24:13 -04:00
Roberto Rosario
d83e8b5428 Initial set of model, form and API changes to support document versions 2011-12-02 02:51:59 -04:00
Roberto Rosario
deb09d3d83 Re-enabled tesseract language-specific OCR processing and added a one-time language-neutral retry for failed language-specific OCR 2011-11-22 17:46:18 -04:00
Roberto Rosario
f0c019f6fc Reduce severity of the messages displayed when no OCR backend is found for a language 2011-11-06 01:06:43 -04:00
Roberto Rosario
bcb61c3ca3 Enabled OCR queue transformation processing 2011-07-25 03:40:15 -04:00
Roberto Rosario
90e876ca93 Code cleanup 2011-07-21 11:46:15 -04:00
Roberto Rosario
89fc258a59 Adapted the OCR app to the new pre cache and preview generation methods 2011-07-21 03:49:27 -04:00
Roberto Rosario
8579c5081d Improved OCR file conversion 2011-07-19 20:56:21 -04:00
Roberto Rosario
648be556a6 Finished adapting the OCR app to the new transformations refactor 2011-07-19 04:21:36 -04:00
Roberto Rosario
5bfd607b31 Removed pdftotext from the requirements, move unpaper calling to the OCR app 2011-07-18 04:06:19 -04:00
Roberto Rosario
5829bbde4d Added per OCR queue transformation models and CRUD views to replace the CONVERTER_OCR_OPTIONS with the new refactored converter transformations systems 2011-07-17 01:32:46 -04:00
Roberto Rosario
0a5dfd6aa9 Plug file descriptor leak 2011-05-19 22:55:57 -04:00
Roberto Rosario
07e9b12e78 flake8 cleanups, unused imports and variables cleanup, changed register_diagnostics to use reverse_lazy instead of reverse 2011-05-06 10:39:54 -04:00
Roberto Rosario
ae35e89549 Unicode updates 2011-05-03 21:11:35 -04:00
Roberto Rosario
1e0d8d1f25 Added docstring description 2011-05-03 20:58:58 -04:00
Roberto Rosario
2a744cefea PEP8, pylint cleanups and removal of relative imports 2011-04-23 02:49:07 -04:00
Roberto Rosario
eaaaa5b645 Added support for the command line program pdftotext from the poppler-utils packages to extract text from PDF documents without doing OCR 2011-04-15 23:59:52 -04:00
Roberto Rosario
6b67cff5d7 Changed the way document page count is parsed from the graphics backend, fixing issue #7 2011-04-08 03:29:48 -04:00
Roberto Rosario
71a3c218f4 PEP8, pylint and django-lint cleanups 2011-04-08 02:09:39 -04:00
Roberto Rosario
d54fd98ec5 Finished adding language specific ocr cleanup code 2011-04-07 12:23:26 -04:00
Roberto Rosario
d1ff305a3f Initial commit for the ocr_cleanup branch 2011-04-07 04:07:59 -04:00
Roberto Rosario
f66c8ec6e2 Fixed error and some warning returned by pylint 2011-04-05 00:04:11 -04:00
Roberto Rosario
e4912a8d4d Close file descriptors to prevent memory leaks 2011-03-07 23:22:53 -04:00
Roberto Rosario
6a9e114acb Set all *.py files permissions to 644 2011-03-07 12:15:25 -04:00
Roberto Rosario
d0bea8ffeb Initial version of the GridFS storage driver 2011-03-04 01:08:20 -04:00
Roberto Rosario
c18cb099c6 Improved tesseract execution handling 2011-02-17 23:31:54 -04:00
Roberto Rosario
77b8a432a2 Added distributed OCR queue support 2011-02-17 04:37:35 -04:00
Roberto Rosario
478fb3502e Changed from python's multiprocessing to celery to handle concurrency 2011-02-17 03:45:30 -04:00
Roberto Rosario
409a52af95 First commit to support ocr subprocess 2011-02-17 01:57:14 -04:00
Roberto Rosario
dfd101c33b Cleanup file after ocr 2011-02-16 20:54:11 -04:00
Roberto Rosario
b1e2f64617 Apply transformation before doing OCR, added unpaper to the OCR pre processing pipe 2011-02-16 03:32:21 -04:00
Roberto Rosario
fbc8bc960a Decoupled page transformation interface, added default transformation support 2011-02-14 02:11:39 -04:00
Roberto Rosario
06d7e5a46a Added multipage document support and document page transformation 2011-02-14 00:18:16 -04:00
Roberto Rosario
d6afcc64bb Changed file permissions 2011-02-09 13:55:01 -04:00
Roberto Rosario
6569faad11 Added OCR capabilities 2011-02-09 02:12:14 -04:00