TODO, WISHLIST =========== * Fix repeated search results - DONE * File renaming dropdown - DONE * Create indexing filesystem folders from document type metadata type - DONE * Document detail to view document metadata - DONE * Add file checksums (hashlib) - DONE * Delete symlinks when document is deleted - DONE * Handle NULL mimetypes during model save - DONE * Raise exception instead of returning error msg - DONE * Option to delete source staging file after upload - DONE * Jquery upload document upload form with ajax widget - NOT NEEDED (commit: b0f31f2a8f82ff0daca081005f2fcae3f5573df5) * Rename dropbox from document edit view - DONE * Ability to rename staging file during upload - DONE * Implement single sign on or LDAP for intranets - DEFERRED, provided by Django AuthBackends * Database storage backend (sql, nosql: [mongodb]) - DEFERRED, provided by https://bitbucket.org/david/django-storages/wiki/Home * Staging file previews - DONE * Display file size in list and details - DONE * Document previews - DONE * Document previews on demand w/ imagemagick - DONE * Add document description - DONE * Integrate with http://code.google.com/p/pytesser/ - DEFERRED, done using Popen * Show abbreviated uuid in document list - DEFERRED, Impractical * Update symlinks when document or metadata changed - DONE * Cache thumbnails and preview by document hash not by uuid - DONE * Show document metadata in document list - DONE * Add css grids - DONE * If theres only one document type on db skip step 1 of wizard - DONE * Be able to delete staging file - DONE * Group documents by metadata - DONE * Permissions - DONE * Roles - DONE * Assign default role to new users - DONE * DB stored transformations - DONE * Recognize multi-page documents - DONE * Add unpaper to pre OCR document cleanup - DONE * Count pages in a PDF file http://pybrary.net/pyPdf/ - NOT NEEDED * Support distributed OCR queues (RabbitMQ & Celery?) - DONE * MuliThreading deferred OCR - DONE * Handle ziped or rar archives - DONE (zip only) * Scheduled maintenance (cleanup, deferred OCR's) - DONE * Tesserat default option ocr setup - DONE * Check duplicated files using checksum - DONE * Link to delete and recreate all document links - DONE * Indicate in generic list which don't exist in storage backend - DONE * Change to model signals - NOT NEEDED, found way to prevent save method recursion * Show current page in generic list template - DONE * Enable/disable ocr queue view & links - DONE * Android App - NOT NEEDED (Use CamScanner, (C) IntSig Information Co, Ltd.) * Extract nagivation code into new navigation app - DONE * TODO: Solve css:last column problem * Change task manager from celery/rabbit to gearman * Receive documents via email * Mobile version * Exif to metadata convertion * External portal using sites contrib app * Scanned document embedded barcode reading * Encode metadata in barcode or qrcode on hardcopy * Support custom applets (coordinates) http://facstaff.unca.edu/mcmcclur/GoogleMaps/Projections/Lambert.html http://code.google.com/p/django-gmapi/ http://pypi.python.org/pypi/django-easy-maps Main ==== * Diagnostics (document file <-> document db mismatches, orphan files) - DONE * Sidebar search - DONE * Convert home and about views to use generic_template - DEFFERED, cleanup html Common ====== * Filterform date filtering widget * Divide navigation links search by object and by view * Merge generic templates into template widget object based rendering - DONE * Multiple document select in generic list template - DONE * Keyboard navigation * Default button linking to 'Enter' and ESC key for cancel * Dismiss all messages - DONE * Sidebar multi item action box - DONE (inline w/ list) * Show big icons in confirmation prompts (ie: document delete) - DONE Permissions =========== * Add permissions support to menus - DONE * Role editing view under setup - DONE * Implement permissions decorators * Add user editing under roles menus - DONE * Workflows app * Handle anonymous users Documents ========= * Skip step 2 of wizard (metadata) if no document type metadata types - DONE have been defined * Tile based image server * Do separate default transformations for staging and for local uploads * Download a document in different formats: (jpg, png, pdf) * Download metadata group documents as a single zip file * Download original document or transformed document * Display preferences 'document transformations' (Rotation, default zoom) * Document view temp transformations * Gallery view for document groups - DONE * Versioning support * Generic document anotations using layer overlays * Field for document language or autodetect * Validate GET data before saving file * Multiple document actions (clear transformations, delete) - DONE * Publish document option * Document list filtering by metadata * Show last 5 recent metadata setups for easy switch * Allow document type to be changed in document edit view * Document model's delete method might not get called when deleting in bulk from a queryset * Allow metadata entry form to mix required and non required metadata - DEFFERED * Block Setup menu item to non staff and non superuser users - DONE by means of evaluation or permissions * Include annotations in transformed documents downloads * Toggable option to include default transformation on document upload * Add document tagging - DONE * Separate free form document rename and require new permission * Test zip file upload with multi directories zip file * Don't append an extension separator if extension is non existant * Statistics page (total used storage, total docs, per metadata group, per type, per metadata) - DONE * Improve doc page template/view - DONE * Document page edit view - DONE * Show all document's pages content combined - DONE * Create 'simple view' document view for non technical users - DONE * Unify document form classes - DONE * Use document preview code for staging file also - DONE * Delete physical file on delete method - DEFFERED (Not needed until Django 1.3) * Download individual document pages * Document printing support - STARTED * Document group actions - DONE Filesystem serving ================== * Avoid metadata indexing folders name clash * WebDAV support Search ====== * Advanced search by metadata fields * Save advanced search by metadata setup as a virtual folder * Add show_summary method to model to display as results of a search * Cross model inclusion search - DONE * Separate view code from search code - DONE * New results only view - DONE * Date search support Convert ======= * Create mimetype convert map * Migrate ocr app tesseract handling to convert app - DONE * Add timeout support convert tasks * DXF viewer - http://code.google.com/p/dxf-reader/source/browse/#svn%2Ftrunk * Support spreadsheets, wordprocessing docs using openoffice in server mode - DONE * Cache.cleanup function to delete cached images when document hash changes * Support ExactImage * Optimize zoom resizing # if size: # extra_options += u' -resize %d' % (int(size) * zoom / 100) # else: # if zoom != 100: # extra_options += u' -resize %d%% ' % zoom Storage ======= * Storage backend to storage backend copy support, to move/migrate document to new storage backend * Encrypting storage backend Middleware storage that calls final storage backend, http://www.freenet.org.nz/ezPyCrypto/ GridFSStorage ============= * Implement user settings - DONE * Implement delete-open soft locking - DEFERRED * Implement master_slave_connection * if exists adding _ plus a counter - avoid file versioning * GridFS FUSE to filesystem serving bridge OCR === * Don't do OCR on wordproccessing or spreadsheet document, strip tags and store text * Add timeout support to ocr tasks * Allow for OCR document requeue on error and requeue limit * Multiple ocr queue support - STARTED * Add per node max ocr concurrent execution - DONE * Don't allow duplicate documents in queues - DONE * OCR queue schedule support * Make automatic OCR a function of OCR app and not of Documents app (via signals) - DONE * Two types of OCR nodes: thin, fat (thin = document file is passed serialized to node, fat = has direct access to document storage read document file) * Move document in queue (up, down, top, bottom) * Support ocropus * Support cuneiform * Implement StringIO * Add storage replication delay setting - DONE * Advanced statistics (OCR average completion time, error rate)