User:DrTrigonBot/doc

This page still has to be translated from de:User:DrTrigonBot/Doku. Further information on technical details is documented here.

Categorization
The goal is a bot that works automatically, processing page by page (for more detail, see also the bot flag request), and uses clever algorithms (as mentioned on commons:User:DrTrigonBot/ToDo) to categorize media by machine (confer e.g. the journal "Computer Vision and Image Understanding").

First the bot applies various detection and recognition algorithms to the image content to retrieve as much data as possible. In a second step the bot assesses the reliability of those data, and in a final step it uses them to categorize the image. If successful, the category, along with all data relevant for the categorization, is written to the image description page. The data is added using a template.

So the procedure reads: image download/retrieval → feature detection/extraction → classification → categorization → report
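
The chain of steps above can be sketched as a sequence of small functions. This is only an illustration under stated assumptions: the function names, the `Report` class and the category `Category:Example` are hypothetical and are not taken from the bot's code.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Everything that would be written to the image description page."""
    filename: str
    features: dict = field(default_factory=dict)
    categories: list = field(default_factory=list)


def extract_features(data: bytes) -> dict:
    # Hypothetical detector: only records the size of the downloaded data.
    return {"size": len(data)}


def classify(features: dict) -> dict:
    # Hypothetical reliability check: drop features with empty/zero values.
    return {name: value for name, value in features.items() if value}


def categorize(reliable: dict) -> list:
    # Hypothetical rule that maps reliable features to category names.
    return ["Category:Example"] if reliable.get("size", 0) > 0 else []


def process(filename: str, data: bytes) -> Report:
    # retrieval -> feature extraction -> classification -> categorization -> report
    features = extract_features(data)
    reliable = classify(features)
    return Report(filename, reliable, categorize(reliable))
```

Calling `process("File:Example.png", data)` with the downloaded bytes returns one `Report` per file, so all detectors can run on a single download.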

In order to do its job the bot has to download every single image; we therefore follow the principle of extracting as much information as possible once the file is downloaded.

User:Multichill is also working on classification based on OpenCV face detection (see User:DrTrigonBot/doc).

Logging/debug results at: commons:User:DrTrigon/User:DrTrigonBot/logging - hist.

Properties
Very basic checks like: file size (os), pixel size (PIL, rsvg), palette, SVG validity (py_w3c), ...

Categories: Category:Animated GIF, Category:Animated PNG

Conditional Categories: Category:PDF files, Category:TIFF images (only these ones)

Examples: ...
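
A minimal sketch of two of these basic checks, file size and pixel size. The bot uses PIL and rsvg for the pixel size; to stay self-contained, this example instead reads the size of a PNG directly from its IHDR chunk with the standard library only. The function name `basic_properties` is hypothetical.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def basic_properties(path):
    """Return the file size and, for PNG files, the pixel size."""
    props = {"filesize": os.path.getsize(path)}
    with open(path, "rb") as handle:
        header = handle.read(24)
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type, then width and height as big-endian uint32.
    if len(header) == 24 and header[:8] == PNG_SIGNATURE and header[12:16] == b"IHDR":
        props["width"], props["height"] = struct.unpack(">II", header[16:24])
    return props
```

For files that are not PNG the result contains only `filesize`; other formats would need their own parser or a library such as PIL.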

Faces
Pre-trained Haar cascade detection (OpenCV) for frontal and profile faces, combined with eye detection within the face region in order to reach sufficient quality.

The metadata (ExifTool, maybe pyexiv2) of images taken with modern digital cameras may also contain face detection data (the detection is done by the camera software). These data are extracted and processed too (as commons:User:DschwenBot does for GPS data).

For further info on how face detection works, confer e.g. this. More about extraction of camera face detection info.

Categories: Category:Unidentified people, Category:Groups, Category:Faces, Category:Portraits

Examples: ...
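
The idea of confirming a face by the eyes found inside it can be sketched without OpenCV. The example assumes that faces and eyes are already available as (x, y, width, height) rectangles in full-image coordinates, which is the rectangle layout OpenCV's cascade detectors return. The function names and the `min_eyes` parameter are hypothetical.

```python
def contains(outer, inner):
    """True if rectangle `inner` lies completely inside rectangle `outer`.

    Rectangles are (x, y, width, height) tuples.
    """
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def confirmed_faces(faces, eyes, min_eyes=1):
    """Keep only the faces that contain at least `min_eyes` detected eyes."""
    return [face for face in faces
            if sum(contains(face, eye) for eye in eyes) >= min_eyes]
```

A face candidate without any eye inside it is dropped, which trades some recall for fewer false positives.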

ColorAverage
The color histogram (PIL) is used to calculate the image's average RGB color value. This is compared to a predefined color palette (Pantone color matching system) by calculating the color difference (python-colormath) and finding the closest match.

Further info on color palettes can be seen at RGB Chart & Multi Tool. Maybe NCS would be more suitable? In general, a palette with constant distances between all colors should be preferred over Pantone.

Categories: Category:Graphics

Examples: ...
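
A simplified sketch of the average-color matching. It makes two assumptions that differ from the bot: the average is taken over a plain list of pixels instead of a PIL histogram, and the color difference is the squared Euclidean distance in RGB rather than the one computed by python-colormath. The three-entry palette is hypothetical and merely stands in for the Pantone palette.

```python
def average_color(pixels):
    """Mean RGB value of an iterable of (r, g, b) tuples."""
    pixels = list(pixels)
    count = len(pixels)
    return tuple(sum(p[i] for p in pixels) / count for i in range(3))


def closest_color(rgb, palette):
    """Name of the palette entry with the smallest squared RGB distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, palette[name]))
    return min(palette, key=distance)


# Hypothetical palette used for illustration only.
PALETTE = {"red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255)}
```

Plain RGB distance does not match perceived color difference well, which is one reason to prefer a dedicated color-difference computation and a palette with evenly spaced entries.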

ColorRegions
First an image segmentation algorithm (JSEG project, maybe SLIC) is applied, possibly incrementally. Then the same procedure as in the ColorAverage section is carried out for every region/segment. Afterwards the position and size of all regions are calculated to complete the data.

This procedure is based on "Automatic Categorization of Image Regions using Dominant Color based Vector Quantization", e.g. in its use of JSEG and GLA. Read "Uni-Modal Versus Joint Segmentation for Region-Based Image Fusion" for more info.

Categories: (does not work very well, thus switched off at the moment)

Examples: ...
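
The last step, computing position and size of each region, can be sketched as follows. The example assumes that the segmentation result is available as a 2D label map (a list of rows with one segment id per pixel); this data layout and the function name `region_geometry` are assumptions, not the bot's actual interface.

```python
def region_geometry(labels):
    """Bounding box, area and centroid for every label in a 2D label map."""
    stats = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            s = stats.setdefault(label, {"xmin": x, "xmax": x, "ymin": y,
                                         "ymax": y, "area": 0, "sx": 0, "sy": 0})
            s["xmin"], s["xmax"] = min(s["xmin"], x), max(s["xmax"], x)
            s["ymin"], s["ymax"] = min(s["ymin"], y), max(s["ymax"], y)
            s["area"] += 1   # number of pixels in the region
            s["sx"] += x     # coordinate sums for the centroid
            s["sy"] += y
    return {
        label: {
            # bbox as (x, y, width, height)
            "bbox": (s["xmin"], s["ymin"],
                     s["xmax"] - s["xmin"] + 1, s["ymax"] - s["ymin"] + 1),
            "area": s["area"],
            "centroid": (s["sx"] / s["area"], s["sy"] / s["area"]),
        }
        for label, s in stats.items()
    }
```

The per-region average color can then be computed from the pixels that share a label, in the same way as for the whole image.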

People
To implement people/pedestrian detection, we use the pre-trained HOG descriptors (OpenCV) and complement them with Haar cascade full-body detection (similar to the Faces section).

Categories: Category:Unidentified people, Category:Groups

Examples: ...
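
One possible way to combine the results of two detectors is to add a detection from the second one only if it does not overlap a detection that is already kept. The sketch below uses intersection over union (IoU) for this. Whether the bot merges its detections this way is an assumption; the function names and the threshold of 0.5 are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0  # rectangles do not overlap
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def merge_detections(primary, secondary, threshold=0.5):
    """Add secondary detections that do not overlap an already kept one."""
    merged = list(primary)
    for rect in secondary:
        if all(iou(rect, kept) < threshold for kept in merged):
            merged.append(rect)
    return merged
```

Here `primary` would hold the HOG detections and `secondary` the Haar cascade full-body detections.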

Chessboard
Detecting a chessboard pattern in arbitrary scenes is a fundamental and crucial task for camera calibration. For this reason a separate algorithm dedicated to this purpose alone was implemented (OpenCV), and it can be used here as well.

Categories: Category:Chessboards

OpticalCodes
Automated detection of 1D and 2D optical codes (such as barcodes, data matrices, ...) is essential for many applications, and those algorithms (zbar, pydmtx) are used here as well.

Categories: Category:Barcode

Examples: ...

Text
(PDF only at current state)

Categories: Category:Books (literature) in PDF

Examples: ...

Streams
(...)

Categories: Category:Videos, Category:Ogg sound files

Examples: ...

(conditional)
All kinds of categories (e.g. file formats) that are not worth adding on their own and therefore require other categories to be present already (i.e. they are only added if one of the categories above was found).

Categories: Category:JPEG, Category:PNG, ...

Examples: ...

(switched off for the unspecific ones; only the more special ones are handled now, which is more or less nothing ;)
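
The rule for conditional categories can be sketched in a few lines. The function name `final_categories` is hypothetical; the category names in the usage note are taken from the lists in this document.

```python
def final_categories(primary, conditional):
    """Attach conditional categories only when a primary category was found."""
    if not primary:
        # Nothing else was detected, so the conditional ones are not worth adding.
        return []
    return list(primary) + [cat for cat in conditional if cat not in primary]
```

For example, `final_categories(["Category:Faces"], ["Category:JPEG"])` keeps both categories, while an empty `primary` list yields no categories at all.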

(generic)
This is a section for new, experimental or other kinds of methods that have not been set up with a specialized template yet. This template can be set up by anybody. Its absence indicates that something went wrong and the bot fell back to this "emergency" mode in order to produce at least some output. It is also used on the logging/debug page commons:User:DrTrigon/User:DrTrigonBot/logging.

Examples: ...

Also belonging here: parts that are in development, experimental, or not or only partly implemented yet:
 * Error tolerance and recovery: make the bot more error tolerant and try to recover the uncorrupted image data in cases like e.g. File:Box Hill.png (maybe try to repair the images?)
 * OpenCV: Bag of words model in computer vision
 * more categorization based on the OpenCV BoW (Bag of Words) algorithm is planned for the future (maybe see also other papers)
 * what is bag of features? http://opencv.willowgarage.com/wiki/bagoffeatures
 * general feature extraction in scikit.learn, also BOW
 * example howto
 * PyWavelets: Fast Wavelet-Based Visual Classification, in which wavelets are used as features in an algorithm very similar to BoW (machine learning: classifiers have to be trained) but with broader applications such as categorization of image, audio, text, (video), peak detection/finding (scipy, scipy.signal.find_peaks_cwt), ...
 * text categorization
 * text analysis (e.g. Natural Language & Text Processing)
 * Ellogon
 * Natural Language Toolkit
 * Gensim
 * Experiments on the English Wikipedia
 * Training document collections
 * textmining (Examples), MontyLingua, Whoosh (Python search library with Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc. )
 * OCR - text recognition (confer also http://www.archivista.ch/de/media/ocr2.pdf)
 * Tesseract/OCROpus
 * Open-source barcode recognition with ExactImage (partly funded by freearchives.ch)
 * Linux port of Cuneiform released
 * hocr2pdf can create searchable PDF files (funded by Archivista GmbH)
 * layout analysis: e.g. ocropus, pdfminer, ...
 * PythonInMusic and video manipulation together with PyWavelets as well as one of the classifiers from Decision Tree, Support Vector Machines (SVM) in Python, scikit-learn: machine learning in Python, PyML - machine learning in Python and OpenCV may be even marsyas
 * audiotools and yaafe audio feature extractor [not implemented yet]
 * music21 (confer audio feature extraction) with MIDI support [implemented (partly) but beta and not documented yet]
 * midi e.g. by using music21 or mingus together with LilyPond in order to create sheet music in PDF, PNG and postscript (also offers ASCII tablature and MusicXML exporting and a sound analysis module which can recognize notes and melody in raw audio data)
 * Video Genre Categorization Using Audio Wavelet Coefficients [not implemented yet]
 * Audio Classification and Categorization Based on Wavelets and Support Vector Machine [not implemented yet]
 * perhaps use another algorithm later too (description, title, globalusage & image captions, usage in the whole web, ...)
 * FFT and others for Physics-based Photograph and Computer Graphics Classification (contact Ng Tian Tsong; I2R and Francois Bremond; INRIA)
 * as number of gradients and colors is used for Graphics detection we could also use the number of frequencies (width of distribution) from FFT [implemented but beta and not documented yet]
 * Wavelet feature selection for image classification and wp_scalogram.py, swt2.py, image_blender.py
 * MPEG7: MPEG-7 Resources and MPEG-7 Feature Extraction Library
 * TREC Video Retrieval Evaluation: TRECVID
 * hashing of image, audio and video: pHash with (py-phash or pHash bindings) (and maybe others too) to do recognition as in Looks Like It and Proof of Concept MTG Image Recognition. It may be simpler to use ctypes with pHash docs.
 * for audio files there are already databases available (e.g. the open source MusicBrainz), which can be used by generating AcoustID Fingerprints from pyacoustid and comparing.
 * "SURF: Speeded Up Robust Features" (ETHZ!) is a performant scale- and rotation-invariant interest point detector and descriptor. Use it e.g. to mark the position of an image crop within the original one, e.g. File:Baseball (crop).jpg within File:Baseball.jpg.
 * support for missing file formats: done
 * camera pose estimation: code in C++ is available; now port it to Python and use it e.g. on faces, chessboards and other detected objects... [implemented but beta and not documented yet]
 * pose estimation can be extended to faces (a lot more interesting) by using flandmark (xbob.flandmark) points from faces. See also The Face Detection Homepage. [implemented but beta and not documented yet]
 * maybe use POSIT in addition to solvePnP. The advantage is that it does not need a camera calibration; the drawback is that it has no Python bindings yet. [implemented but beta and not documented yet]
 * for very high resolution images we could try to go a step further and apply OpenCV knows where you’re looking with eye tracking (hack-a-day) (pupil form - rotation) in order to get an eye/gaze direction estimation/tracking; confer also how-to-perform-stable-eye-corner-detection (pupil position relative to eye - translation). Maybe both methods can be combined by using solvePnP or POSIT?! Another example is opengazer.
 * plate detection and recognition (ANPR), e.g. plategatewayqt or licenseplate as starting points
 * (maybe) replace the JSEG algorithm with scikit.learn Spectral clustering according to the example of how to segment the picture of Lena into regions (or with circles), confer e.g. scikit-learn Clustering, scikit-image segmentation (more info), ward-segmentation, ...
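
To illustrate the hashing idea from the list above, here is a sketch of a simple average hash together with the Hamming distance used to compare two hashes. This is not pHash, whose image hash is DCT-based; it is a much simpler stand-in. It assumes the image has already been reduced to a small grayscale matrix (a list of rows), and the function names are hypothetical.

```python
def average_hash(gray):
    """Average hash of a small grayscale image given as a list of rows.

    Each bit is 1 where the pixel is brighter than the mean of all pixels.
    """
    flat = [value for row in gray for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hashes."""
    return bin(hash_a ^ hash_b).count("1")
```

Two images whose hashes differ in only a few bits are likely to show the same content, which is the basis for finding duplicates or near-duplicates.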

Libraries and external code (credits)
Before categorizing the bot tries to gather as much information about an image file and its content as possible by means of the following libraries and methods:


 * python default packages (e.g. PIL)
 * pywikipedia framework packages
 * additional python packages (more exotic ones)
 * NumPy
 * SciPy (ndimage)
 * OpenCV (v1 and v2 bindings/wrapper)
 * pyexiv2
 * RSVG with GTK+ and Cairo
 * libmagic
 * music21
 * modules needing compilation (C/C++ code)
 * JSEG algorithm from University of California (with kind thanks for the permission to use it) refined into a python wrapper/bindings
 * pydmtx libdmtx Python Wrapper (need to compile because of missing debian/TS package)
 * zbar Python Wrapper (need to compile because of missing fedora/devel environment package)
 * OpenCV Object Categorization by BoW refined into a python wrapper/bindings because not included in official ones
 * SLIC Superpixels for Python Wrapper (need to compile because of missing package - is in early development)
 * DrTrigonBot framework packages
 * pycolorname
 * simple third-party modules without package
 * python-colormath
 * py_w3c on the recommendation of the W3C
 * ( PDFMiner )
 * external programs (binaries)
 * ExifTool by Phil Harvey (since it is the only one capable of handling face recognition meta data)
 * pdftotext from poppler library
 * ffprobe from FFmpeg library
 * ImageMagick

Machine learning

 * classification
 * OpenCV, python machine learning packages
 * orange; Getting Started With Orange
 * cascade classification
 * use: opencv_traincascade to train classifier (available in fedora and ubuntu)
 * Object Categorization
 * Normal Bayes classifier: Using the Normal Bayes classifier for image categorization in OpenCV
 * Bag of Words model: The Bag of Words model in OpenCV 2.2 (result can be visualized)
 * bagofwords_classification.cpp can be imported as python module with help of Boost.Python
 * Sample dataset for training
 * Caltech-256 Object Category Dataset: http://www.vision.caltech.edu/Image_Datasets/Caltech256/
 * PASCAL Visual Object Classes: http://pascallin.ecs.soton.ac.uk/challenges/VOC/
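
The core of the Bag of Words model mentioned above is to assign every local feature descriptor of an image to its nearest "visual word" and to count the occurrences. A minimal sketch in plain Python follows; it assumes that descriptors and vocabulary are given as sequences of equally long number tuples, and it does not use or reproduce the OpenCV implementation. The function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared L2)."""
    def distance(index):
        return sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[index]))
    return min(range(len(vocabulary)), key=distance)


def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else counts
```

The resulting histogram is the fixed-length feature vector that a trained classifier (e.g. a Normal Bayes classifier or an SVM) would receive as input.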

I installed OpenCV from linux distro repos:
 * ubuntu or fedora have OpenCV python bindings
 * in the samples directory are some folders with example python, C and C++ programs (fun and useful to play around with!)

Do face detection in combination with Pywikipedia to fill Category:Unidentified people (maybe Category:Unidentified people (bot tagged)?). The next step is probably to start training some filters based on Commons images. For more details on tests done e.g. on Fedora 15 with face detection and the 'bag of words' method, confer the code for the pywikipedia bot framework available at https://jira.toolserver.org/browse/DRTRIGON-120. The most recent code is available at:
 * https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/catimages.py?hb=true
 * https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/opencv

From User:Multichill/Using OpenCV to categorize files
At the time of writing, Commons contains about 150,000 uncategorized files. This is only about 1.25% of all files, but it's always nice to be able to lower the number even further. A lot of categorization work has already been done by the CategorizationBot, but this work is all based on the usage of a file. No categorization has been done based on the contents of the file itself.

OpenCV (Open Source Computer Vision) is a library of programming functions for real time computer vision. It can be used to "recognize" images. OpenCV could be used to move uncategorized files to one of the unidentified topics categories based on the image characteristics. OpenCV contains several approaches we could use to "recognize" images.

Some frequently occurring subjects in uncategorized files:
 * ? Maps, could go to Category:Unidentified maps
 * ? Flags, could go to Category:Unidentified flags
 * ? Plants, could go to Category:Unidentified plants
 * ? Coats of arms, could go to Category:Unidentified coats of arms
 * ? Buildings, could go to Category:Unidentified buildings
 * ? Trains, could go to Category:Unidentified trains
 * ? Automobiles, could go to Category:Unidentified automobiles (Vehicle Detection using Haar Cascades: car3_xml.zip)
 * ? Buses, could go to Category:Unidentified buses
 * ? Category:Diagrams
 * ? (Category:Colors by name?)
 * Animations to Category:Animated SVG missing

MailerBot
de:User:DrTrigonBot/Doku