27 Commits

Author SHA1 Message Date
Roberto Rosario
ea3b513ae3 Add new app to handle all dependencies
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-05-03 01:12:20 -04:00
Roberto Rosario
32cf0a0595 Add new default Tesseract OCR backend
This new backend uses a command call to avoid
Tesseract bug 1670
(https://github.com/tesseract-ocr/tesseract/issues/1670).

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-27 15:44:09 -04:00
Roberto Rosario
72311c73b5 Add workaround for Tesseract bug 1670
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2019-04-12 05:27:27 -04:00
Roberto Rosario
bcd2427ab6 Move the noop OCR backend to the right place.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-10-18 16:21:12 -04:00
Roberto Rosario
8e39016f12 Code cleanups.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-08-21 18:57:38 -04:00
Daniel Albert
8cea56aceb Fix string concatenation to fix error messages
Without using parentheses, the strings are not joined.
2018-07-02 20:57:45 +00:00
Roberto Rosario
20f7967241 Initialize PyOCR backend tool attribute to a sane default.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2017-03-22 03:14:21 -04:00
Roberto Rosario
86a602aa34 PEP8 cleanups. 2016-12-31 02:25:02 -04:00
Roberto Rosario
6bfdb053e3 Add new OCR backend using PyOCR. Remove current direct call Tesseract backend. 2016-12-30 00:36:45 -04:00
Roberto Rosario
8712c6ee37 PEP8 cleanups. 2016-05-17 05:08:21 -04:00
Roberto Rosario
27aae995f0 The Tesseract OCR backend now reports if the requested language file is missing. GitLab issue #289. 2016-05-17 03:49:23 -04:00
Roberto Rosario
3b728328ad PEP8 cleanups, E501. 2015-07-23 04:05:29 -04:00
Roberto Rosario
ba7cb433d4 Don't hide OCR errors by doing a fallback try without language option. gh-issue #211 2015-07-18 03:25:36 -04:00
Roberto Rosario
2033f85874 Log OCR subclass errors. 2015-07-08 04:16:35 -04:00
Roberto Rosario
48df3dcafa PEP8 cleanups 2015-06-24 17:11:24 -04:00
Roberto Rosario
e4623fadcd PEP8 cleanups 2015-06-23 02:23:23 -04:00
Roberto Rosario
78198f3398 Smart settings refactor 2015-06-22 21:04:06 -04:00
Roberto Rosario
5275061f9f Refactor OCR backend class to be file object based and use images from document page not the actual file. Use pytesseract instead of calling the CLI directly. 2015-06-09 03:28:38 -04:00
Roberto Rosario
e6754c9a6f Update the OCR app to work based on document versions, not documents; document versions are the module which holds the document page instances. Remove old OCR document queue and replace with a single module for OCR processing error entries. Increase compatibility with Django 1.7 and Python 3. 2015-01-15 03:01:43 -04:00
Roberto Rosario
1f0f3adcba Make sure base class method arguments match 2014-11-03 00:16:22 -04:00
Roberto Rosario
654022807f Raise the correct exception 2014-11-02 20:16:09 -04:00
Roberto Rosario
5f6438ac55 Merge branch 'feature/polish_PR_52' into development 2014-09-30 15:48:42 -04:00
Roberto Rosario
4968051b6d Don't silence OCR errors even if Tesseract is optional, otherwise the user won't know what happened.
Catch the OSError generic exception and return a friendlier "Tesseract not found" exception
2014-09-30 15:41:00 -04:00
Roberto Rosario
b761037d99 Move all settings files from <app>/conf/settings.py to <app>/settings.py 2014-09-11 05:02:40 -04:00
Roberto Rosario
a9390d55ba Unify the way backends are defined and loaded, unify the fs_cleanup function 2014-07-01 00:22:31 -04:00
Roberto Rosario
ac061f2203 PEP8 cleanups, simple syntax error fixes 2014-06-25 02:53:12 -04:00
Roberto Rosario
e0347785b7 Initial implementation of OCR pluggable backends 2014-06-24 22:48:16 -04:00