Roberto Rosario
32cf0a0595
Add new default Tesseract OCR backend
This new backend uses a command call to avoid
Tesseract bug 1670
(https://github.com/tesseract-ocr/tesseract/issues/1670).
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-27 15:44:09 -04:00
Roberto Rosario
74c97314d7
Code style cleanups
Add keyword arguments. Sort arguments and models.
Move literals to their own module. Prepend handler_ to
signal handlers.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-26 03:32:35 -04:00
Roberto Rosario
36a51eeb73
Switch to full app paths
Instead of inserting the path of the apps into the Python path,
the apps are now referenced by their full import path.
This solves name clashes with external or native Python libraries.
Example: the Mayan statistics app vs. Python's new statistics library.
Every app reference is now prepended with 'mayan.apps'.
Existing config.yml files need to be updated manually.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-05 02:02:57 -04:00
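The renaming described in commit 36a51eeb73 can be sketched as follows. This is a minimal illustration only, not code from the commit: the helper `to_full_app_path` and the `LOCAL_APPS` set are hypothetical names, and the listed app names are assumed for the example. Only the 'mayan.apps' prefix comes from the commit message.

```python
# Prefix stated in the commit message; everything else here is hypothetical.
APP_PREFIX = 'mayan.apps.'

# Assumed set of bare app names that live under the project's apps package.
LOCAL_APPS = {'ocr', 'statistics', 'documents'}


def to_full_app_path(app_name, local_apps):
    """Return the full import path for a local app; leave other names unchanged."""
    if app_name in local_apps:
        return APP_PREFIX + app_name
    return app_name


# Old-style references as they might appear in an existing configuration.
old_references = ['ocr', 'statistics', 'django.contrib.admin']
new_references = [to_full_app_path(name, LOCAL_APPS) for name in old_references]
print(new_references)
```

With the prefix applied, a reference to the project's statistics app no longer shadows or collides with the standard library module of the same name, which is the clash the commit message gives as its example.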
Roberto Rosario
c312a2a304
Remove the duplicated setting pdftotext_path from the OCR app. This is now handled by the document parsing app.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-09-01 02:12:08 -04:00
Roberto Rosario
bce5411ea7
Fix typos.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-04-10 21:22:25 -04:00
Roberto Rosario
a0b7561ed7
Add support for passing arguments to the OCR backend.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-04-05 17:23:32 -04:00
Roberto Rosario
6bfdb053e3
Add new OCR backend using PyOCR. Remove current direct call Tesseract backend.
2016-12-30 00:36:45 -04:00
Roberto Rosario
f59b96ac5e
Update the document type auto ocr value to be defined at runtime by turning it into a setting.
2015-09-06 04:00:37 -04:00
Roberto Rosario
8382df91a6
Update PDF text parser classes. Remove SlateParser and substitute with a PDFMiner based parser.
2015-07-31 02:09:48 -04:00
Roberto Rosario
3b728328ad
PEP8 cleanups, E501.
2015-07-23 04:05:29 -04:00
Roberto Rosario
4527563d89
PEP8 cleanups, especially E501 line too long.
2015-07-22 18:21:37 -04:00
Roberto Rosario
78198f3398
Smart settings refactor
2015-06-22 21:04:06 -04:00
Roberto Rosario
e6754c9a6f
Update the OCR app to work on document versions rather than documents; document versions are the module that holds the document page instances. Remove the old OCR document queue and replace it with a single module for OCR processing error entries. Increase compatibility with Django 1.7 and Python 3.
2015-01-15 03:01:43 -04:00
Roberto Rosario
e8762e4792
Issue #87, Per document language selection
2014-10-22 02:35:16 -04:00
Roberto Rosario
549f0fdc87
Issue #75, move OCR queueing from a setting to a DocumentType model field
2014-10-21 16:53:42 -04:00
Roberto Rosario
a613c65fde
Update the OCR app to use Celery, remove OCR config options OCR_REPLICATION_DELAY, OCR_NODE_CONCURRENT_EXECUTION, OCR_QUEUE_PROCESSING_INTERVAL
2014-10-03 01:19:59 -04:00
Roberto Rosario
b761037d99
Move all settings files from <app>/conf/settings.py to <app>/settings.py
2014-09-11 05:02:40 -04:00