Open source, Django-based document manager with custom metadata indexing, file serving integration and OCR capabilities.
Bulk upload documents directly or use a staging folder to receive scanned documents. Organize them with document classes and custom metadata, as well as automatic document grouping. Find documents with full-text search over metadata, document properties, and content extracted from PDFs or transcribed by OCR.
Features
- User defined metadata fields
- Dynamic default values for metadata
- Lookup support for metadata
- Filesystem integration by means of metadata indexing directories
- User defined document UUID generation
- Local file or server side staging file uploads
- Batch upload many documents with the same metadata
- User defined document checksum algorithm
- Previews for many image formats, as well as PDF
- Search documents by any field value
- Group documents by metadata automatically
- Permissions and roles support
- Multi page document support
- Page transformations
- Distributed OCR processing
- Multilingual user interface (English, Spanish, and easily expanded to others)
- Multilingual OCR support: English, French, Italian, German, Spanish and others (as supported by Tesseract)
- Duplicated document search
- Upload multiple documents inside a ZIP file
- Pluggable storage backends (file based and GridFS included)
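As an illustration of how a configurable checksum algorithm and duplicate search can fit together, here is a minimal Python sketch. It is not taken from the project's code: the function names `checksum` and `find_duplicates` are hypothetical, and it assumes duplicates are detected by comparing content digests, which may differ from what the application does internally.

```python
import hashlib
from collections import defaultdict


def checksum(data, algorithm="sha256"):
    """Return the hex digest of `data` (bytes) for a hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()


def find_duplicates(documents, algorithm="sha256"):
    """Group document names whose contents produce the same checksum.

    `documents` maps a document name to its raw bytes (hypothetical input
    shape, chosen only to keep the sketch self-contained).
    """
    by_digest = defaultdict(list)
    for name, data in documents.items():
        by_digest[checksum(data, algorithm)].append(name)
    # Only digests shared by more than one document count as duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

Swapping the `algorithm` argument (for example `"md5"` or `"sha1"`) changes the digest without touching the grouping logic, which is the kind of flexibility a user defined checksum setting would provide.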
Screenshots
Document page previews
Many configuration options with sensible defaults
Automatic document grouping
Dependencies
- Django - A high-level Python Web framework that encourages rapid development and clean, pragmatic design.
- django-pagination
- django-filetransfers - File upload/download abstraction
- celery - Asynchronous task queue/job queue based on distributed message passing
- django-celery - celery Django integration
- libmagic - MIME detection library
- tesseract-ocr - An OCR Engine that was developed at HP Labs between 1985 and 1995... and now at Google.
- unpaper - Post-processing tool for scanned and photocopied book pages
- ImageMagick - Convert, Edit, Or Compose Bitmap Images
- GraphicsMagick - Robust collection of tools and libraries to read, write, and manipulate an image.
- poppler-utils' pdftotext
Installation
virtualenv --no-site-packages mayan
cd mayan
git clone git://github.com/rosarior/mayan.git
cd mayan
source ../bin/activate
pip install -r requirements/production.txt
License
Licensed under the GPL Version 3
Authors
Roberto Rosario
Contact
Roberto Rosario (roberto.rosario.gonzalez@gmail.com)
http://twitter.com/#siloraptor
Download
You can download this project in either zip or tar formats.
You can also clone the project with Git by running:
$ git clone git://github.com/rosarior/mayan