33 Commits

Author SHA1 Message Date
Roberto Rosario
317d07a355 Refactor OCR app. Removes document parsing. Moves OCR processing to model manager. Add submit and finish events. Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com> 2017-08-23 02:04:57 -04:00
Roberto Rosario
79b0763fe9 Cleanups 2016-12-22 03:15:32 -04:00
Roberto Rosario
44531bd92a Add pluggable locking backend support. Add threadsafe file lock backend. 2016-11-13 03:50:09 -04:00
Roberto Rosario
ad328b2c3b Silence lock manager model import warning. 2016-03-08 02:32:03 -04:00
Roberto Rosario
079c06c207 Don't reference document_version in error messages as it might not exist yet. 2015-09-19 22:07:52 -04:00
Roberto Rosario
bec85f38f4 Text parsers and OCR backends are now used in tandem for each document. 2015-08-08 04:49:08 -04:00
Roberto Rosario
3b728328ad PEP8 cleanups, E501. 2015-07-23 04:05:29 -04:00
Roberto Rosario
4527563d89 PEP8 cleanups, specially E501 line too long. 2015-07-22 18:21:37 -04:00
Roberto Rosario
8b608452a5 Lower severity of operational error during OCR. 2015-07-11 17:01:03 -04:00
Roberto Rosario
0238be7a18 Add support for retrying upload queue and ocr queue tasks in the event of Database locking errors. 2015-07-11 16:19:04 -04:00
Roberto Rosario
5275061f9f Refactor OCR backend class to be file object based and use images from document page not the actual file. Use pytesseract instead of calling the CLI directly. 2015-06-09 03:28:38 -04:00
Roberto Rosario
0bd6bd7930 Add missing task instance to task_do_ocr task 2015-06-02 19:36:19 -04:00
Roberto Rosario
8176326a16 Add new post_document_version_ocr signal 2015-06-02 00:25:46 -04:00
Roberto Rosario
e6754c9a6f Update the OCR app to work based on document versions not documents, document version are the module which hold the document pages instances. Remove old OCR document queue and replace with a single module for OCR processing error entries. Increase compatibility with Django 1.7 and Python 3. 2015-01-15 03:01:43 -04:00
Roberto Rosario
fafd84b8d2 Move magic number variable to the literals.py module 2015-01-14 18:47:31 -04:00
Roberto Rosario
ba1729106f Pass arguments to the logger the correct way 2014-11-02 20:55:21 -04:00
Roberto Rosario
83f4d90fa3 Don't store OCR error log if no document was created 2014-10-29 05:51:23 -04:00
Roberto Rosario
0d38a1239d PEP8 cleanups 2014-10-18 00:55:55 -04:00
Roberto Rosario
36694cc5c5 Set the ignore_result option for the tasks that don't return values 2014-10-11 02:05:40 -04:00
Roberto Rosario
c2e35694d8 Unify the method to submit document for OCR, fix OCR error document re-queue view 2014-10-09 14:08:48 -04:00
Roberto Rosario
8bac1525be PEP8 cleanups 2014-10-08 19:39:16 -04:00
Roberto Rosario
b73ad4ad0c Issue #57, Remove the scheduler and job_processor apps 2014-10-03 02:04:10 -04:00
Roberto Rosario
a613c65fde Update the OCR app to use Celery, remove OCR config options OCR_REPLICATION_DELAY, OCR_NODE_CONCURRENT_EXECUTION, OCR_QUEUE_PROCESSING_INTERVAL 2014-10-03 01:19:59 -04:00
Roberto Rosario
74cf4c413f Import cleanups, reorganization, PEP8 cleanups 2014-10-02 02:01:08 -04:00
Roberto Rosario
b761037d99 Move all settings files from <app>/conf/settings.py to <app>/settings.py 2014-09-11 05:02:40 -04:00
Roberto Rosario
6b169b4526 Modernize exception handling, and improves Python 3.x compatibility 2014-07-20 16:42:30 -04:00
Roberto Rosario
04f616ffaf Only store the full OCR error stack trace when in DEBUG mode 2014-07-02 17:44:22 -04:00
Roberto Rosario
da785c343c Lower the lock timeout period, store the entire stack trace when OCR fails 2014-07-01 00:23:18 -04:00
Roberto Rosario
6a659741af PEP8 cleanups 2014-06-30 00:57:53 -04:00
Roberto Rosario
d640eacec8 Update usage of datetime.now to Django timezone aware now() 2014-06-29 17:00:07 -04:00
Roberto Rosario
ac061f2203 PEP8 cleanups, simple syntax error fixes 2014-06-25 02:53:12 -04:00
Roberto Rosario
eda2592106 Cleanup import, quoting style 2014-06-20 19:59:33 -04:00
Roberto Rosario
ec1745b50b Initial changes to support the new Django 1.6 project structure 2014-06-15 13:13:21 +02:00