mayan-edms

Author	SHA1	Message	Date
Roberto Rosario	deb09d3d83	Re-enabled tesseract language-specific OCR processing and added a one-time language-neutral retry for failed language-specific OCR	2011-11-22 17:46:18 -04:00
Roberto Rosario	f0c019f6fc	Reduce severity of the messages displayed when no OCR backend is found for a language	2011-11-06 01:06:43 -04:00
Roberto Rosario	bcb61c3ca3	Enabled OCR queue transformation processing	2011-07-25 03:40:15 -04:00
Roberto Rosario	90e876ca93	Code cleanup	2011-07-21 11:46:15 -04:00
Roberto Rosario	89fc258a59	Adapted the OCR app to the new pre cache and preview generation methods	2011-07-21 03:49:27 -04:00
Roberto Rosario	8579c5081d	Improved OCR file conversion	2011-07-19 20:56:21 -04:00
Roberto Rosario	648be556a6	Finished adapting the OCR app to the new transformations refactor	2011-07-19 04:21:36 -04:00
Roberto Rosario	5bfd607b31	Removed pdftotext from the requirements, move unpaper calling to the OCR app	2011-07-18 04:06:19 -04:00
Roberto Rosario	5829bbde4d	Added per OCR queue transformation models and CRUD views to replace the CONVERTER_OCR_OPTIONS with the new refactored converter transformations systems	2011-07-17 01:32:46 -04:00
Roberto Rosario	0a5dfd6aa9	Plug file descriptor leak	2011-05-19 22:55:57 -04:00
Roberto Rosario	07e9b12e78	flake8 cleanups, unused imports and variables cleanup, changed register_diagnostics to use reverse_lazy instead of reverse	2011-05-06 10:39:54 -04:00
Roberto Rosario	ae35e89549	Unicode updates	2011-05-03 21:11:35 -04:00
Roberto Rosario	1e0d8d1f25	Added docstring description	2011-05-03 20:58:58 -04:00
Roberto Rosario	2a744cefea	PEP8, pylint cleanups and removal of relative imports	2011-04-23 02:49:07 -04:00
Roberto Rosario	eaaaa5b645	Added support for the command line program pdftotext from the poppler-utils packages to extract text from PDF documents without doing OCR	2011-04-15 23:59:52 -04:00
Roberto Rosario	6b67cff5d7	Changed the way document page count is parsed from the graphics backend, fixing issue #7	2011-04-08 03:29:48 -04:00
Roberto Rosario	71a3c218f4	PEP8, pylint and django-lint cleanups	2011-04-08 02:09:39 -04:00
Roberto Rosario	d54fd98ec5	Finished adding language specific ocr cleanup code	2011-04-07 12:23:26 -04:00
Roberto Rosario	d1ff305a3f	Initial commit for the ocr_cleanup branch	2011-04-07 04:07:59 -04:00
Roberto Rosario	f66c8ec6e2	Fixed error and some warning returned by pylint	2011-04-05 00:04:11 -04:00
Roberto Rosario	e4912a8d4d	Close file descriptors to prevent memory leaks	2011-03-07 23:22:53 -04:00
Roberto Rosario	6a9e114acb	Set all *.py files permissions to 644	2011-03-07 12:15:25 -04:00
Roberto Rosario	d0bea8ffeb	Initial version of the GridFS storage driver	2011-03-04 01:08:20 -04:00
Roberto Rosario	c18cb099c6	Improved tesseract execution handling	2011-02-17 23:31:54 -04:00
Roberto Rosario	77b8a432a2	Added distributed OCR queue support	2011-02-17 04:37:35 -04:00
Roberto Rosario	478fb3502e	Changed from python's multiprocessing to celery to handle concurrency	2011-02-17 03:45:30 -04:00
Roberto Rosario	409a52af95	First commit to support ocr subprocess	2011-02-17 01:57:14 -04:00
Roberto Rosario	dfd101c33b	Cleanup file after ocr	2011-02-16 20:54:11 -04:00
Roberto Rosario	b1e2f64617	Apply transformation before doing OCR, added unpaper to the OCR pre processing pipe	2011-02-16 03:32:21 -04:00
Roberto Rosario	fbc8bc960a	Decoupled page transformation interface, added default transformation support	2011-02-14 02:11:39 -04:00
Roberto Rosario	06d7e5a46a	Added multipage document support and document page transformation	2011-02-14 00:18:16 -04:00
Roberto Rosario	d6afcc64bb	Changed file permissions	2011-02-09 13:55:01 -04:00
Roberto Rosario	6569faad11	Added OCR capabilities	2011-02-09 02:12:14 -04:00

33 Commits