Roberto Rosario
|
f0c019f6fc
|
Reduce severity of the messages displayed when no OCR backend is found for a language
|
2011-11-06 01:06:43 -04:00 |
|
Roberto Rosario
|
bcb61c3ca3
|
Enabled OCR queue transformation processing
|
2011-07-25 03:40:15 -04:00 |
|
Roberto Rosario
|
90e876ca93
|
Code cleanup
|
2011-07-21 11:46:15 -04:00 |
|
Roberto Rosario
|
89fc258a59
|
Adapted the OCR app to the new pre cache and preview generation methods
|
2011-07-21 03:49:27 -04:00 |
|
Roberto Rosario
|
8579c5081d
|
Improved OCR file conversion
|
2011-07-19 20:56:21 -04:00 |
|
Roberto Rosario
|
648be556a6
|
Finished adapting the OCR app to the new transformations refactor
|
2011-07-19 04:21:36 -04:00 |
|
Roberto Rosario
|
5bfd607b31
|
Removed pdftotext from the requirements, move unpaper calling to the OCR app
|
2011-07-18 04:06:19 -04:00 |
|
Roberto Rosario
|
5829bbde4d
|
Added per OCR queue transformation models and CRUD views to replace the CONVERTER_OCR_OPTIONS with the new refactored converter transformations systems
|
2011-07-17 01:32:46 -04:00 |
|
Roberto Rosario
|
0a5dfd6aa9
|
Plug file descriptor leak
|
2011-05-19 22:55:57 -04:00 |
|
Roberto Rosario
|
07e9b12e78
|
flake8 cleanups, unused imports and variables cleanup, changed register_diagnostics to use reverse_lazy instead of reverse
|
2011-05-06 10:39:54 -04:00 |
|
Roberto Rosario
|
ae35e89549
|
Unicode updates
|
2011-05-03 21:11:35 -04:00 |
|
Roberto Rosario
|
1e0d8d1f25
|
Added docstring description
|
2011-05-03 20:58:58 -04:00 |
|
Roberto Rosario
|
2a744cefea
|
PEP8, pylint cleanups and removal of relative imports
|
2011-04-23 02:49:07 -04:00 |
|
Roberto Rosario
|
eaaaa5b645
|
Added support for the command line program pdftotext from the poppler-utils packages to extract text from PDF documents without doing OCR
|
2011-04-15 23:59:52 -04:00 |
|
Roberto Rosario
|
6b67cff5d7
|
Changed the way document page count is parsed from the graphics backend, fixing issue #7
|
2011-04-08 03:29:48 -04:00 |
|
Roberto Rosario
|
71a3c218f4
|
PEP8, pylint and django-lint cleanups
|
2011-04-08 02:09:39 -04:00 |
|
Roberto Rosario
|
d54fd98ec5
|
Finished adding language specific ocr cleanup code
|
2011-04-07 12:23:26 -04:00 |
|
Roberto Rosario
|
d1ff305a3f
|
Initial commit for the ocr_cleanup branch
|
2011-04-07 04:07:59 -04:00 |
|
Roberto Rosario
|
f66c8ec6e2
|
Fixed error and some warning returned by pylint
|
2011-04-05 00:04:11 -04:00 |
|
Roberto Rosario
|
e4912a8d4d
|
Close file descriptors to prevent memory leaks
|
2011-03-07 23:22:53 -04:00 |
|
Roberto Rosario
|
6a9e114acb
|
Set all *.py files permissions to 644
|
2011-03-07 12:15:25 -04:00 |
|
Roberto Rosario
|
d0bea8ffeb
|
Initial version of the GridFS storage driver
|
2011-03-04 01:08:20 -04:00 |
|
Roberto Rosario
|
c18cb099c6
|
Improved tesseract execution handling
|
2011-02-17 23:31:54 -04:00 |
|
Roberto Rosario
|
77b8a432a2
|
Added distributed OCR queue support
|
2011-02-17 04:37:35 -04:00 |
|
Roberto Rosario
|
478fb3502e
|
Changed from python's multiprocessing to celery to handle concurrency
|
2011-02-17 03:45:30 -04:00 |
|
Roberto Rosario
|
409a52af95
|
First commit to support ocr subprocess
|
2011-02-17 01:57:14 -04:00 |
|
Roberto Rosario
|
dfd101c33b
|
Cleanup file after ocr
|
2011-02-16 20:54:11 -04:00 |
|
Roberto Rosario
|
b1e2f64617
|
Apply transformation before doing OCR, added unpaper to the OCR pre processing pipe
|
2011-02-16 03:32:21 -04:00 |
|
Roberto Rosario
|
fbc8bc960a
|
Decoupled page transformation interface, added default transformation support
|
2011-02-14 02:11:39 -04:00 |
|
Roberto Rosario
|
06d7e5a46a
|
Added multipage document support and document page transformation
|
2011-02-14 00:18:16 -04:00 |
|
Roberto Rosario
|
d6afcc64bb
|
Changed file permissions
|
2011-02-09 13:55:01 -04:00 |
|
Roberto Rosario
|
6569faad11
|
Added OCR capabilities
|
2011-02-09 02:12:14 -04:00 |
|