Roberto Rosario
|
ebdcede59f
|
Made the queue processing interval configurable by means of a new setting: OCR_QUEUE_PROCESSING_INTERVAL
|
2011-04-23 05:38:59 -04:00 |
|
Roberto Rosario
|
eaaaa5b645
|
Added support for the command-line program pdftotext from the poppler-utils package to extract text from PDF documents without doing OCR
|
2011-04-15 23:59:52 -04:00 |
|
Roberto Rosario
|
6b5a17af39
|
Made English the default language for Tesseract if none is specified
|
2011-04-13 03:25:45 -04:00 |
|
Roberto Rosario
|
71a3c218f4
|
PEP8, pylint and django-lint cleanups
|
2011-04-08 02:09:39 -04:00 |
|
Roberto Rosario
|
283df926d1
|
Made automatic OCR a function of the OCR app and not of the Documents app (via signals)
Renamed setup option DOCUMENT_AUTOMATIC_OCR to OCR_AUTOMATIC_OCR
|
2011-04-04 15:36:00 -04:00 |
|
Roberto Rosario
|
3cb0f37b5b
|
Made the concurrent OCR code more granular: each node can now handle a different number of concurrent OCR tasks
|
2011-03-22 04:17:48 -04:00 |
|
Roberto Rosario
|
f9ab61647e
|
Reduced default delay time
|
2011-03-22 03:43:18 -04:00 |
|
Roberto Rosario
|
bbcc0ead65
|
Added a new option OCR_REPLICATION_DELAY to allow the storage some time for replication before attempting to do OCR on a document
|
2011-03-21 12:24:42 -04:00 |
|
Roberto Rosario
|
6a9e114acb
|
Set the permissions of all *.py files to 644
|
2011-03-07 12:15:25 -04:00 |
|
Roberto Rosario
|
595d7227a2
|
Added navigation link from document page view and document page transformation back to document view
|
2011-02-17 23:27:25 -04:00 |
|
Roberto Rosario
|
478fb3502e
|
Changed from Python's multiprocessing to Celery to handle concurrency
|
2011-02-17 03:45:30 -04:00 |
|
Roberto Rosario
|
d6afcc64bb
|
Changed file permissions
|
2011-02-09 13:55:01 -04:00 |
|
Roberto Rosario
|
6569faad11
|
Added OCR capabilities
|
2011-02-09 02:12:14 -04:00 |
|