Commit Graph

7 Commits

Author SHA1 Message Date
Roberto Rosario
aaf9f7a8be OCR: Add 'ocr_content' attribute
Add the 'ocr_content' attribute to documents to allow access
to a document's OCR content for indexing and other purposes.

Fixes the OCR indexing failing test.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-11-27 05:20:31 -04:00
Roberto Rosario
85926ae8f8 The conditional_escape call caused downloaded OCR text to contain HTML entities like &quot;
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2018-06-28 02:04:49 -04:00
Roberto Rosario
26f6152356 Add "ocr_content" accessor to the DocumentVersion class to return
the OCR content.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2017-08-25 02:07:58 -04:00
Roberto Rosario
317d07a355 Refactor OCR app. Remove document parsing. Move OCR processing to
the model manager. Add submit and finish events.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2017-08-23 02:04:57 -04:00
Roberto Rosario
4096b8b882 PEP8 cleanups.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2017-07-24 20:30:46 -04:00
Roberto Rosario
6c6ca38374 Replace all instances of unicode-only handling with force_text.
Replace all __unicode__ methods with __str__ and the
@python_2_unicode_compatible decorator.
Replace all instances of smart_str, smart_unicode, force_unicode
with force_text.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2017-07-05 15:03:24 -04:00
Roberto Rosario
916c3497c4 Add support for downloading a document's OCR text.
Closes GitLab issue #215.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
2017-07-01 01:07:23 -04:00
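
Note on commit 85926ae8f8: the message says an escaping call left HTML entities such as &quot; in downloaded OCR text. The sketch below illustrates that general effect only. It is not the project's code: it uses the standard library's html.escape as a stand-in for Django's conditional_escape, and the function name build_download_body and its escape parameter are hypothetical.

```python
import html


def build_download_body(ocr_text, escape=False):
    """Return the text that would be written to a plain-text download.

    Hypothetical helper for illustration. With escape=True the text is
    HTML-escaped first, which is appropriate for rendering inside a web
    page but corrupts a plain-text file.
    """
    if escape:
        # html.escape stands in for Django's conditional_escape here
        # (an assumption for the sake of a self-contained example).
        return html.escape(ocr_text)
    return ocr_text


sample = 'Invoice "A-17" total < 100 & paid'

# Escaped: quotes, angle brackets and ampersands become entities.
print(build_download_body(sample, escape=True))

# Unescaped: the downloaded text matches the OCR output exactly.
print(build_download_body(sample))
```

Escaping belongs at the point where text is rendered into HTML. A file download is not HTML, so the raw text should most likely be passed through unchanged.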