Commit Graph

17 Commits

Author SHA1 Message Date
Roberto Rosario
32cf0a0595 Add new default Tesseract OCR backend
This new backend uses a command call to avoid
Tesseract bug 1670
(https://github.com/tesseract-ocr/tesseract/issues/1670).

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-27 15:44:09 -04:00
Roberto Rosario
74c97314d7 Code style cleanups
Add keyword arguments. Sort arguments and models.
Move literals to their own module. Prepend handler_ to
signal handlers.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-26 03:32:35 -04:00
Roberto Rosario
36a51eeb73 Switch to full app paths
Instead of inserting the path of the apps into the Python app,
the apps are now referenced by their full import path.

This solves name clashes with external or native Python libraries.
Example: Mayan statistics app vs. Python new statistics library.

Every app reference is now prepended with 'mayan.apps'.

Existing config.yml files need to be updated manually.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-05 02:02:57 -04:00
Roberto Rosario
c312a2a304 Remove the duplicated setting pdftotext_path from the OCR path. This is now handled by the document parsing app.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-09-01 02:12:08 -04:00
Roberto Rosario
bce5411ea7 Fix typos.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-04-10 21:22:25 -04:00
Roberto Rosario
a0b7561ed7 Add support for passing arguments to the OCR backend.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-04-05 17:23:32 -04:00
Roberto Rosario
6bfdb053e3 Add new OCR backend using PyOCR. Remove current direct call Tesseract backend.
2016-12-30 00:36:45 -04:00
Roberto Rosario
f59b96ac5e Update the document type auto ocr value to be defined at runtime by turning it into a setting.
2015-09-06 04:00:37 -04:00
Roberto Rosario
8382df91a6 Update PDF text parser classes. Remove SlateParser and substitute with a PDFMiner based parser.
2015-07-31 02:09:48 -04:00
Roberto Rosario
3b728328ad PEP8 cleanups, E501.
2015-07-23 04:05:29 -04:00
Roberto Rosario
4527563d89 PEP8 cleanups, especially E501 line too long.
2015-07-22 18:21:37 -04:00
Roberto Rosario
78198f3398 Smart settings refactor
2015-06-22 21:04:06 -04:00
Roberto Rosario
e6754c9a6f Update the OCR app to work based on document versions, not documents; document versions are the module which holds the document page instances. Remove old OCR document queue and replace with a single module for OCR processing error entries. Increase compatibility with Django 1.7 and Python 3.
2015-01-15 03:01:43 -04:00
Roberto Rosario
e8762e4792 Issue #87, Per document language selection
2014-10-22 02:35:16 -04:00
Roberto Rosario
549f0fdc87 Issue #75, move OCR queueing from a setting to a DocumentType model field
2014-10-21 16:53:42 -04:00
Roberto Rosario
a613c65fde Update the OCR app to use Celery, remove OCR config options OCR_REPLICATION_DELAY, OCR_NODE_CONCURRENT_EXECUTION, OCR_QUEUE_PROCESSING_INTERVAL
2014-10-03 01:19:59 -04:00
Roberto Rosario
b761037d99 Move all settings files from <app>/conf/settings.py to <app>/settings.py
2014-09-11 05:02:40 -04:00