eaaaa5b645
Added support for the command line program pdftotext from the poppler-utils package to extract text from PDF documents without doing OCR
Roberto Rosario
2011-04-15 23:59:52 -04:00
73a52293e8
Order DocumentPage model by page_number field
Roberto Rosario
2011-04-15 04:24:23 -04:00
c3470f2d8b
Missed two changes related to the last commit
Roberto Rosario
2011-04-11 10:25:48 -04:00
d855280a18
Made AVAILABLE_INDEXING_FUNCTION setting a setting of the documents app instead of the filesystem_serving app
Roberto Rosario
2011-04-11 10:19:10 -04:00
f66c8ec6e2
Fixed an error and some warnings reported by pylint
Roberto Rosario
2011-04-05 00:04:11 -04:00
283df926d1
Made automatic OCR a function of the OCR app instead of the Documents app (via signals). Renamed setup option DOCUMENT_AUTOMATIC_OCR to OCR_AUTOMATIC_OCR
Roberto Rosario
2011-04-04 15:36:00 -04:00
664ece7a60
Added a new setup option: FILESYSTEM_INDEXING_AVAILABLE_FUNCTIONS - a dictionary to allow users to add custom functions
Roberto Rosario
2011-04-04 14:58:36 -04:00
fcc8b0cfe4
Updated TODO
Roberto Rosario
2011-04-04 14:56:28 -04:00
1d48325a92
Clear node name when requeueing a document for OCR
Roberto Rosario
2011-04-04 09:24:25 -04:00
8f82c82825
Added small optimizations
Roberto Rosario
2011-03-24 14:19:17 -04:00
69f2cb1fa4
Removed the 'exists' column from the document list view; the diagnostics supersede it
Roberto Rosario
2011-03-24 13:19:16 -04:00
f417344758
Introduce a random delay to each node to further reduce the chance of a race condition, until row locking can be implemented or is implemented by Django
Roberto Rosario
2011-03-23 17:03:00 -04:00
f97c43b243
Updated TODO
Roberto Rosario
2011-03-23 16:46:38 -04:00
9765a7f607
Added an additional check to lower the chance of OCR race conditions between nodes
Roberto Rosario
2011-03-23 16:45:49 -04:00
1e66c77cf6
Invalid page numbers now raise Http404, not found
Roberto Rosario
2011-03-23 16:23:00 -04:00
4589cd4506
Updated TODO
Roberto Rosario
2011-03-22 04:27:22 -04:00
3cb0f37b5b
Made the concurrent OCR code more granular: each node can now handle a different number of concurrent OCR tasks
Roberto Rosario
2011-03-22 04:17:48 -04:00
d0942a203b
Reimplemented OCR delay code; only new documents are delayed
Roberto Rosario
2011-03-22 03:46:34 -04:00
f9ab61647e
Reduced default delay time
Roberto Rosario
2011-03-22 03:43:18 -04:00
800dc28938
Small optimization
Roberto Rosario
2011-03-22 03:42:21 -04:00
8f6fac0bae
Removed old disabled code
Roberto Rosario
2011-03-22 03:38:58 -04:00
75324ce581
Disabled single OCR document action as multiple actions are now enabled by default
Roberto Rosario
2011-03-21 16:32:01 -04:00
5d9302e583
Added OCR multi queued document delete support
Roberto Rosario
2011-03-21 16:29:04 -04:00
bef40d958e
Added OCR multi document re-queue support
Roberto Rosario
2011-03-21 16:19:19 -04:00
bbcc0ead65
Added a new option OCR_REPLICATION_DELAY to allow the storage some time for replication before attempting to do OCR on a document
Roberto Rosario
2011-03-21 12:24:42 -04:00