
79 Commits

Author SHA1 Message Date
Roberto Rosario
eaaaa5b645 Added support for the pdftotext command line program from the poppler-utils package to extract text from PDF documents without doing OCR 2011-04-15 23:59:52 -04:00
Roberto Rosario
f87beff00e Fixed Non-ASCII character error in the English OCR cleanup backend 2011-04-13 03:26:55 -04:00
Roberto Rosario
6b5a17af39 Made English the default language for Tesseract if none is specified 2011-04-13 03:25:45 -04:00
Roberto Rosario
6b67cff5d7 Changed the way document page count is parsed from the graphics backend, fixing issue #7 2011-04-08 03:29:48 -04:00
Roberto Rosario
71a3c218f4 PEP8, pylint and django-lint cleanups 2011-04-08 02:09:39 -04:00
Roberto Rosario
d54fd98ec5 Finished adding language specific ocr cleanup code 2011-04-07 12:23:26 -04:00
Roberto Rosario
d1ff305a3f Initial commit for the ocr_cleanup branch 2011-04-07 04:07:59 -04:00
Roberto Rosario
f66c8ec6e2 Fixed an error and some warnings returned by pylint 2011-04-05 00:04:11 -04:00
Roberto Rosario
283df926d1 Made automatic OCR a function of the OCR app rather than the Documents app (via signals); renamed setup option DOCUMENT_AUTOMATIC_OCR to OCR_AUTOMATIC_OCR 2011-04-04 15:36:00 -04:00
Roberto Rosario
1d48325a92 Clear node name when requeueing a document for OCR 2011-04-04 09:24:25 -04:00
Roberto Rosario
c2ba7eaf1d Spanish translation updates 2011-04-01 02:45:27 -04:00
Roberto Rosario
604cd60255 Clear the last OCR results when requeueing a document 2011-03-25 16:37:30 -04:00
Roberto Rosario
f417344758 Introduce a random delay to each node to further reduce the chance of a race condition, until row locking can be implemented or is implemented by Django 2011-03-23 17:03:00 -04:00
Roberto Rosario
9765a7f607 Added an additional check to lower the chance of OCR race conditions between nodes 2011-03-23 16:45:49 -04:00
Roberto Rosario
a3fbe7f896 Allow OCR requeue of pending documents 2011-03-23 15:45:50 -04:00
Roberto Rosario
0f1526f3d8 Allow deletion of nonexistent documents from the OCR queue 2011-03-23 09:51:54 -04:00
Roberto Rosario
3cb0f37b5b Made the concurrent OCR code more granular (per node): every node can handle a different number of concurrent OCR tasks 2011-03-22 04:17:48 -04:00
Roberto Rosario
d0942a203b Reimplemented OCR delay code to only delay new documents 2011-03-22 03:46:34 -04:00
Roberto Rosario
f9ab61647e Reduced default delay time 2011-03-22 03:43:18 -04:00
Roberto Rosario
70e5e4c470 Moved navigation code to its own app 2011-03-22 00:54:43 -04:00
Roberto Rosario
75dc4c84b3 Removed old code 2011-03-21 18:49:34 -04:00
Roberto Rosario
75324ce581 Disabled the single-document OCR action, as multiple-document actions are now enabled by default 2011-03-21 16:32:01 -04:00
Roberto Rosario
5d9302e583 Added multi ocr queued document delete support 2011-03-21 16:29:04 -04:00
Roberto Rosario
bef40d958e Added OCR multi document re-queue support 2011-03-21 16:19:19 -04:00
Roberto Rosario
bbcc0ead65 * Added a new option, OCR_REPLICATION_DELAY, to give the storage time to replicate before attempting OCR on a document 2011-03-21 12:24:42 -04:00
Roberto Rosario
31d1641fa4 Added simple statistics page (total used storage, total docs, etc) 2011-03-20 04:35:21 -04:00
Roberto Rosario
fe2c031dfb Added missing alt attribute 2011-03-16 17:04:18 -04:00
Roberto Rosario
33089ccd08 Don't display an error for the thumbnail of nonexistent documents 2011-03-16 16:37:30 -04:00
Roberto Rosario
c9d82da28a Added indexing flags to ocr model 2011-03-16 04:57:59 -04:00
Roberto Rosario
9569992caf Removed debug code 2011-03-12 04:03:11 -04:00
Roberto Rosario
242c39690f Spanish translation updates 2011-03-11 14:36:14 -04:00
Roberto Rosario
0a91b7ff7d Don't allow duplicate documents in queues 2011-03-11 01:01:56 -04:00
Roberto Rosario
67c8f26d7f Renamed document queue state links 2011-03-10 00:02:04 -04:00
Roberto Rosario
cc6e8220c0 Changed the OCR status display sidebar from form based to text based 2011-03-10 00:01:30 -04:00
Roberto Rosario
9bd22f65d1 Do not reinitialize document queue and/or queued document on reentry 2011-03-09 22:50:20 -04:00
Roberto Rosario
9bcd2d33ed Added debugging logging 2011-03-09 22:50:03 -04:00
Roberto Rosario
f1771158d6 Fixed OCR queue list showing wrong thumbnail 2011-03-09 12:59:16 -04:00
Roberto Rosario
739c2ee299 Converted modules to use the new simpler permission checking 2011-03-09 01:20:07 -04:00
Roberto Rosario
2eafc75d29 Revert ocr issue test 2011-03-08 01:05:32 -04:00
Roberto Rosario
b0700c5729 Try to fix issue #2 2011-03-07 23:40:49 -04:00
Roberto Rosario
bc4c3b6c75 Remove unused function 2011-03-07 23:40:35 -04:00
Roberto Rosario
e4912a8d4d Close file descriptors to prevent memory leaks 2011-03-07 23:22:53 -04:00
Roberto Rosario
efdd180483 Show document thumbnail in document ocr queue list 2011-03-07 19:24:27 -04:00
Roberto Rosario
86ed128dbe Make ocr document date submitted column non breakable 2011-03-07 19:22:00 -04:00
Roberto Rosario
118e3d2e4a Merge remote branch 'origin/master' 2011-03-07 18:20:37 -04:00
Roberto Rosario
5563e74e77 Fix permissions once more, directories to 755 and files to 644 2011-03-07 12:27:58 -04:00
Roberto Rosario
6a9e114acb Set all *.py files permissions to 644 2011-03-07 12:15:25 -04:00
Roberto Rosario
7eee9c44f4 * Added document queue property side bar window to the document queue list view 2011-03-06 02:35:42 -04:00
Roberto Rosario
d05295bf54 Added links, views and permissions to disable or enable an OCR queue 2011-03-06 00:47:16 -04:00
Roberto Rosario
661d38aa41 Spanish translation updates 2011-03-05 19:52:50 -04:00