Changed from Python's multiprocessing to Celery to handle concurrency

2011-02-17 03:45:30 -04:00
parent 409a52af95
commit 478fb3502e
13 changed files with 102 additions and 87 deletions
--- a/docs/CREDITS
+++ b/docs/CREDITS
@@ -75,3 +75,17 @@ Fancybox - FancyBox is a tool for displaying images, html content and
 unpaper - post-processing scanned and photocopied book pages
    Jens Gulden 2005-2007 - unpaper@jensgulden.de.
    http://unpaper.berlios.de/
+    
+celery - Celery is an open source asynchronous task queue/job queue
+         based on distributed message passing. It is focused on real-time
+         operation, but supports scheduling as well.
+    Copyright 2009-2011, Ask Solem & contributors
+    http://ask.github.com/celery/getting-started/introduction.html
+
+django-celery - django-celery provides Celery integration for Django;
+                using the Django ORM and cache backend for storing
+                results, autodiscovery of task modules for applications
+                listed in INSTALLED_APPS, and more.
+    Copyright Ask Solem & contributors
+    http://github.com/ask/django-celery/
+    
--- a/docs/Changelog.txt
+++ b/docs/Changelog.txt
@@ -12,3 +12,4 @@
 * Added views to create, edit and grant/revoke permissions to roles
 * Apply default transformations to document before OCR
 * Added unpaper to the OCR convertion pipe
+* Added support for concurrent, queued OCR processing using celery
--- a/docs/TODO
+++ b/docs/TODO
@@ -32,7 +32,11 @@
 * DB stored transformations                                            - DONE
 * Recognize multi-page documents                                       - DONE
 * Add unpaper to pre OCR document cleanup                              - DONE
+* Count pages in a PDF file http://pybrary.net/pyPdf/                  - NOT NEEDED
+* Support distributed OCR queues (RabbitMQ & Celery?)                  - DONE
+* Multithreading deferred OCR                                          - DONE
 * Role editing view under setup                                        - STARTED
+* Scheduled maintenance (cleanup, deferred OCR's)                      - DONE
 * Document list filtering by metadata
 * Filterform date filtering widget
 * Validate GET data before saving file
@@ -46,20 +50,15 @@
    from a queryset
 * Allow metadata entry form to mix required and non required metadata
 * Link to delete and recreate all document links
-* MuliThreading deferred OCR
 * Versioning support
 * Generic document anotations using layer overlays
 * Workflows
-* Scheduled maintenance (cleanup, deferred OCR's)
 * Add tags to documents
 * Field for document language or autodetect
-* Count pages in a PDF file http://pybrary.net/pyPdf/
-* Download a document in diffent formats: (jpg, png, pdf)
 * Download a document in diffent formats: (jpg, png, pdf)
 * Cache.cleanup function to delete cached images when document hash changes
 * Divide navigation links search by object and by view
 * Add show_summary method to model to display as results of a search
-* Support distributed OCR queues (RabbitMQ & Celery?)
 * DXF viewer - http://code.google.com/p/dxf-reader/source/browse/#svn%2Ftrunk
 * Support spreadsheets, wordprocessing docs using openoffice in server mode
 * WebDAV support