<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<title>rosarior/mayan @ GitHub</title>
<style type="text/css">
body {
  margin-top: 1.0em;
  background-color: #002134;
  font-family: Helvetica, Arial, FreeSans, sans-serif;
  color: #ffffff;
}
#container {
  margin: 0 auto;
  width: 700px;
}
h1 { font-size: 3.8em; color: #ffdecb; margin-bottom: 3px; }
h1 .small { font-size: 0.4em; }
h1 a { text-decoration: none }
h2 { font-size: 1.5em; color: #ffdecb; }
h3 { text-align: center; color: #ffdecb; }
a { color: #ffdecb; }
.description { font-size: 1.2em; margin-bottom: 30px; margin-top: 30px; font-style: italic;}
.download { float: right; }
pre { background: #000; color: #fff; padding: 15px;}
hr { border: 0; width: 80%; border-bottom: 1px solid #aaa}
.footer { text-align:center; padding-top:30px; font-style: italic; }
</style>
</head>

<body>
<a href="http://github.com/rosarior/mayan"><img style="position: absolute; top: 0; right: 0; border: 0;" src="http://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub" /></a>

<div id="container">

<div class="download">
<a href="http://github.com/rosarior/mayan/zipball/master">
<img border="0" width="90" src="http://github.com/images/modules/download/zip.png" alt="Download as zip"></a>
<a href="http://github.com/rosarior/mayan/tarball/master">
<img border="0" width="90" src="http://github.com/images/modules/download/tar.png" alt="Download as tar"></a>
</div>

<h1><a href="http://github.com/rosarior/mayan">Mayan</a>
<span class="small">by <a href="http://github.com/rosarior">Roberto Rosario</a></span></h1>

<div class="description">
Open source, Django-based document manager with custom metadata indexing, file serving integration, and OCR capabilities.
</div>

<p>Bulk upload documents directly, or use a staging folder to receive scanned documents. Organize documents using document classes, custom metadata, and automatic document grouping. Find documents with full-text search over metadata, document properties, and content extracted from PDFs or transcribed by OCR.</p>
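<p>Automatic grouping by metadata can be pictured as deriving a path from each document's metadata and collecting the documents that share the same path. The Python sketch below is an illustration only, not Mayan's implementation: it assumes each document's metadata is a plain dictionary, and both the function name group_by_metadata and the use of Python's str.format syntax for path templates are hypothetical choices made for this example.</p>

<pre>
from collections import defaultdict


def group_by_metadata(documents, template):
    """Group document names under index paths built from their metadata.

    documents: mapping of document name to a dict of metadata values.
    template: a str.format-style path template such as "{client}/{year}".
    Documents lacking a field the template needs are left out.
    """
    index = defaultdict(list)
    for name, metadata in documents.items():
        try:
            path = template.format(**metadata)
        except KeyError:
            # A metadata field required by the template is missing.
            continue
        index[path].append(name)
    # Sort paths and names so the result is predictable.
    return {path: sorted(names) for path, names in sorted(index.items())}


documents = {
    "invoice-001.pdf": {"client": "ACME", "year": 2011},
    "invoice-002.pdf": {"client": "ACME", "year": 2011},
    "contract-17.pdf": {"client": "Globex", "year": 2010},
    "scan-0042.png": {"year": 2011},
}
print(group_by_metadata(documents, "{client}/{year}"))
# {'ACME/2011': ['invoice-001.pdf', 'invoice-002.pdf'], 'Globex/2010': ['contract-17.pdf']}
</pre>

<p>In this sketch, scan-0042.png has no client value, so it does not appear under any path; how a real index treats incomplete metadata is a separate design decision.</p>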

<h2>Features</h2>
<ul>
<li>User-defined metadata fields</li>
<li>Dynamic default values for metadata</li>
<li>Lookup support for metadata</li>
<li>Filesystem integration by means of metadata indexing directories</li>
<li>User-defined document UUID generation</li>
<li>Local file or server-side staging file uploads</li>
<li>Batch upload many documents with the same metadata</li>
<li>User-defined document checksum algorithm</li>
<li>Previews for many image formats, including PDF</li>
<li>Search documents by any field value</li>
<li>Group documents by metadata automatically</li>
<li>Permissions and roles support</li>
<li>Multi-page document support</li>
<li>Page transformations</li>
<li>Distributed OCR processing</li>
<li>Multilingual user interface (English and Spanish, easily extended to other languages)</li>
<li>Multilingual OCR support: English, French, Italian, German, Spanish, and other languages supported by Tesseract</li>
<li>Duplicate document search</li>
<li>Upload multiple documents inside a ZIP file</li>
<li>Pluggable storage backends (file-based and GridFS included)</li>
</ul>
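<p>Two of the features above, the user-defined checksum algorithm and duplicate document search, fit together naturally: documents whose contents produce the same checksum are duplicates. The following Python sketch shows that idea under stated assumptions. It is not Mayan's own code or configuration; the function names file_checksum and find_duplicates are hypothetical, documents are assumed to be available as open binary streams, and the algorithm parameter simply accepts any name known to Python's hashlib.new().</p>

<pre>
import hashlib
import io
from collections import defaultdict


def file_checksum(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary stream, reading it in chunks.

    algorithm can be any name accepted by hashlib.new(), which stands in
    here for a user-selectable checksum algorithm.
    """
    digest = hashlib.new(algorithm)
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(files, algorithm="sha256"):
    """Return sorted groups of names whose contents have the same checksum.

    files: mapping of document name to an open binary stream.
    """
    by_checksum = defaultdict(list)
    for name, stream in files.items():
        by_checksum[file_checksum(stream, algorithm)].append(name)
    # Every group has at least one member, so != 1 means a duplicate set.
    return sorted(sorted(names) for names in by_checksum.values() if len(names) != 1)


files = {
    "a.pdf": io.BytesIO(b"same bytes"),
    "b.pdf": io.BytesIO(b"same bytes"),
    "c.pdf": io.BytesIO(b"other bytes"),
}
print(find_duplicates(files, algorithm="sha1"))  # [['a.pdf', 'b.pdf']]
</pre>

<p>Reading in fixed-size chunks keeps memory use flat for large scans, and making the algorithm a parameter is one simple way to let the checksum be chosen by configuration.</p>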

<h2>Screenshots</h2>
<p>
<img src="images/pages-carousel.png" width="800" alt="Document page previews"/>
Document page previews
</p>
<p>
<img src="images/settings.png" width="800" alt="Configuration settings"/>
Many configuration options with sensible defaults
</p>
<p>
<img src="images/grouping.png" width="800" alt="Automatic document grouping"/>
Automatic document grouping
</p>

<h2>Dependencies</h2>
<ul>
<li>Django - A high-level Python Web framework that encourages rapid development and clean, pragmatic design.</li>
<li>django-pagination</li>
<li>django-filetransfers - File upload/download abstraction</li>
<li>celery - Asynchronous task queue/job queue based on distributed message passing</li>
<li>django-celery - celery Django integration</li>
<li>libmagic - MIME detection library</li>
<li>tesseract-ocr - An OCR engine developed at HP Labs between 1985 and 1995, and now at Google.</li>
<li>unpaper - Post-processing of scanned and photocopied book pages</li>
<li>ImageMagick - Convert, edit, or compose bitmap images</li>
<li>GraphicsMagick - Robust collection of tools and libraries to read, write, and manipulate images</li>
<li>poppler-utils' pdftotext</li>
</ul>
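<p>Several of these dependencies are command-line programs rather than Python packages. As a rough pre-installation check, and not something shipped with Mayan, a few lines of Python can report which of them are missing from the PATH. The executable names used here (tesseract, unpaper, convert for ImageMagick, gm for GraphicsMagick, pdftotext for poppler-utils) are assumptions about a typical installation and may differ on your system; libmagic is a shared library, so it is not checked.</p>

<pre>
import shutil

# Assumed executable names for the command-line dependencies listed above.
REQUIRED_COMMANDS = {
    "tesseract-ocr": "tesseract",
    "unpaper": "unpaper",
    "ImageMagick": "convert",
    "GraphicsMagick": "gm",
    "poppler-utils": "pdftotext",
}


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the sorted package names whose executable is not on PATH."""
    return sorted(
        package
        for package, command in commands.items()
        if shutil.which(command) is None
    )


missing = missing_commands()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All command-line dependencies found.")
</pre>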
<h2>Installation</h2>
<pre>
virtualenv --no-site-packages mayan
cd mayan
git clone git://github.com/rosarior/mayan.git
cd mayan
source ../bin/activate
pip install -r requirements/production.txt</pre>

<h2>License</h2>
<p>Licensed under the GPL version 3.</p>

<h2>Authors</h2>
<p>Roberto Rosario</p>

<h2>Contact</h2>
<p>Roberto Rosario (roberto.rosario.gonzalez@gmail.com)
<br/>http://twitter.com/#siloraptor</p>

<h2>Download</h2>
<p>
You can download this project in either
<a href="http://github.com/rosarior/mayan/zipball/master">zip</a> or
<a href="http://github.com/rosarior/mayan/tarball/master">tar</a> format.
</p>
<p>You can also clone the project with <a href="http://git-scm.com">Git</a>
by running:</p>
<pre>$ git clone git://github.com/rosarior/mayan</pre>

<div class="footer">
Get the source code on GitHub: <a href="http://github.com/rosarior/mayan">rosarior/mayan</a>
</div>

</div>

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-22801354-1");
pageTracker._trackPageview();
} catch(err) {}</script>
</body>
</html>