
=Creating the data set=

This article demonstrates how to create a new data set of images and use them to train, test, and tune an HTM network using NuPic. To create the architecture data set, numerous elevation drawings were downloaded from the Library of Congress' Historic American Building Survey (http://memory.loc.gov/ammem/collections/habs_haer/). If you want to create your own data set, there are many types of images that can easily be used instead, such as cartographic symbols, road signs, handwritten digits/letters, and blueprints.

After downloading the images and opening them in an image editing program, select and copy each architectural element (e.g. a window or door) and paste it into a new image. Resize the image to 128x128 and "touch up" any lines that may have been lost due to the significant resizing. Some of the training images from the object category 'door' are shown here:
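If you are building your own data set, the resizing step is easy to script rather than doing it by hand for every image. The following is a minimal sketch using the Python Imaging Library (PIL); the directory names are placeholders and the grayscale conversion is optional:

 import os
 from PIL import Image
 SRC_DIR = 'raw_crops'      # hypothetical folder of hand-cropped elements
 DST_DIR = 'resized_crops'  # where the 128x128 versions will be written
 if not os.path.isdir(DST_DIR):
     os.makedirs(DST_DIR)
 for name in os.listdir(SRC_DIR):
     if not name.lower().endswith('.png'):
         continue
     img = Image.open(os.path.join(SRC_DIR, name))
     img = img.convert('L')                      # grayscale
     img = img.resize((128, 128), Image.BICUBIC) # smooth downscaling
     img.save(os.path.join(DST_DIR, name))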

Once you have about 20 or so images for each category, organize the images into directories so that all images of each type of element are in their own directory. Next, divide the data set into two subsets, training and testing, such that the testing set contains approximately 10-15% of the total number of images. Both the training and testing sets should be organized identically, with each directory having subdirectories identifying the class of each architectural element. Finally, place the training and testing directories inside $HOME/share/vision/data/architecture.
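If you would rather script the split than copy files by hand, a rough sketch along these lines moves a random 10-15% of each category into the testing set. The source and destination paths are assumptions; adjust them to your layout:

 import os
 import random
 import shutil
 SRC = 'architecture/all'         # hypothetical: one subdirectory per category
 TRAIN = 'architecture/training'
 TEST = 'architecture/testing'
 TEST_FRACTION = 0.15             # roughly 10-15% of each category
 random.seed(42)
 for category in os.listdir(SRC):
     srcDir = os.path.join(SRC, category)
     if not os.path.isdir(srcDir):
         continue
     images = os.listdir(srcDir)
     random.shuffle(images)
     numTest = max(1, int(len(images) * TEST_FRACTION))
     for i, name in enumerate(images):
         destDir = os.path.join(TEST if i < numTest else TRAIN, category)
         if not os.path.isdir(destDir):
             os.makedirs(destDir)
         shutil.copy(os.path.join(srcDir, name), destDir)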

The data set used for this demonstration can be downloaded here (LINK!) under the Creative Commons Attribution-ShareAlike 3.0 license.

The file structure looks like this:
 * training/
 ** chimney
 ** column
 ** door
 ** dormer
 ** roundWindow
 ** window
 * testing/
 ** chimney
 ** column
 ** door
 ** dormer
 ** roundWindow
 ** window

The following table shows the number of training and testing images for the architecture demo.

Next, we'll also generate distorted versions of the testing data by adding lines, occlusions, scaling, translation, and noise to the images. To generate the distorted data, open the file makeDistortedSet.py in a text editor and go to line 394. Uncomment the lines for each type of distortion that you want to test the system against:

 distortions = [
     scale,
     #blur,
     #brightness,
     noise,
     #lineOrOcclusion,
     occlusion,
     lines,
     #rotation2D,
     translation,
 ]

Once the makeDistortedSet.py file has been modified and saved, the distorted sets can be created by entering the following command into the Terminal/Command Prompt:

 python makeDistortedSet.py data/architectureDemo/testing -d1

The "data/architectureDemo/testing" parameter tells the script where to find the image data to distort, and the "-d1" parameter specifies the level of difficulty, where 0 is the easiest and 2 is the hardest.

The output of this operation should look like this:

 Creating an ImageSensor... done.
 Loading images... done.
 Processing distortion "scale"...
 Processing distortion "noise"...
 Processing distortion "occlusion"...
 Processing distortion "lines"...
 Processing distortion "translation"...

The distorted testing set will be created and saved in the '$HOME/share/vision/data/architecture/testing_distorted' directory.

If you want to distort each testing image multiple times, use the '-n' parameter to specify the number of distorted images to create for each original image (e.g. adding -n25 will create 25 distorted images for each original image). This is generally a good idea if you intend to report the results: if the network performs very well on, for example, an occlusion test, it is possible that the objects were not very occluded and the test was simply easy. By averaging the performance across 25 or 50 versions of each image, the results are more likely to accurately represent the network's performance.
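The arithmetic behind the averaging is simple; as an illustrative sketch (the per-image results below are made up), the overall accuracy is just the total number of correct classifications divided by the total number of presentations:

 # Hypothetical results: 1 = correct, 0 = incorrect, for each of the 25
 # distorted copies generated from two of the original test images.
 results = {
     'door_001.png':        [1] * 23 + [0] * 2,
     'roundWindow_004.png': [1] * 20 + [0] * 5,
 }
 totalCorrect = sum(sum(r) for r in results.values())
 totalShown = sum(len(r) for r in results.values())
 print('Average accuracy: %.1f%%' % (100.0 * totalCorrect / totalShown))  # 86.0%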

Now that the data sets have been created, we will create a blank network.

=Creating a blank network=

To create a blank network, enter the command:

 python NewExperiment.py architectureDemo/full pictures/full

This command will create a folder 'share/vision/experiments/architectureDemo/full' with a file, params.py. The params.py file specifies how the network will be created, including all of the topology and node parameters.

The output should look like this:

 Your new Images experiment has been created in:
   experiments/architectureDemo/full
 It has a copy of params.py from:
   experiments/pictures/full
 Edit params.py in the new directory, and then run:
   python RunExperiment.py architectureDemo/full

=Setting the initial network parameters=

Network parameters
After opening the file 'params.py', you should see the network parameters at the top of the file, which are used to specify the network topology and the basic parameter values for each node in the hierarchy. The architecture images used for this experiment were created as 128x128 images, although for this example we are going to resize them to 64x64 so that training and testing proceed more quickly. To do this, edit the first block of code for the ImageSensor (labeled '#Sensor') so that the width and height parameters are set to 64, and add a filter to resize the images.

We'll later test the network's robustness at recognizing images under rotation and scaling, and when these transformations are applied to black and white (binary) images, the output often has lines that 'zig-zag' or bear little resemblance to the original image. To mitigate this, we set the 'mode' parameter to 'gray' so that the images are recreated as grayscale images, which reduces the aliasing. Also, since the images are on a white background, we set the 'background' parameter to 255 and the 'invertOutput' parameter to False.

The code block should look like this:

 'width': 64,            # Image width in pixels
 'height': 64,           # Image height in pixels
 'mode': 'gray',         # 'gray' for grayscale or 'bw' for black and white
 'background': 255,      # Value of 'background' pixels
 'invertOutput': False,  # Inverts the output of the node (used by Pictures)
 'filters': [
     ['Resize', {'size': [64, 64]}],
 ],

Level 1 Parameters
The next block defines the Level 1 spatial pooler nodes, and is initially set so that each node sees a 4x4 image patch. Recall that the Pictures experiment was designed to operate on 32x32 images, such that the parameter 'size': [8, 8] means that the 8x8 array of nodes is spread evenly across the 32x32 image, with each node seeing a 4x4 patch (one eighth of the image along each dimension). Since the architecture images have been resized to 64x64, and we still want each node to see a 4x4 patch of the image, change the 'size' parameter to [16,16].
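The relationship between image size, node grid size, and patch size is worth sanity-checking before editing params.py; a tiny illustrative helper (plain Python, not part of NuPic):

 def nodesPerSide(imageSize, patchSize):
     """Nodes per side so that each node sees a patchSize x patchSize pixel area."""
     assert imageSize % patchSize == 0, "patch size must divide the image size"
     return imageSize // patchSize
 print(nodesPerSide(32, 4))   # 8  -> 'size': [8, 8]   (Pictures experiment)
 print(nodesPerSide(64, 4))   # 16 -> 'size': [16, 16] (architecture demo)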

The parameter, 'overlap': [0, 0], which specifies the number of pixels that each node overlaps with its neighbour, will be left unchanged, and 'clonedNodes' will also keep its default value of True, as it allows a single node to be trained and cloned throughout the rest of the level. The parameter 'outputElementCount' will be increased slightly to 250, since the network will be working with larger images than in the Pictures experiment.

The next parameter to change is 'maxDistance', which specifies the maximum amount by which two coincidences can differ and still be classified as the same. This parameter can be difficult to optimize, but with a few simple calculations we can find a good starting point and tune the network later. As an example, imagine that a node is presented with a 4x4 pattern containing a single vertical line that fills the leftmost column. If that node were next presented a similar pattern in which the line has been shifted up by one pixel (so only three line pixels remain, and the bottom-left corner now matches the background), then the two patterns differ in exactly one pixel, and the difference is the number of differing pixels times the intensity difference: 1 * 256 = 256. As a starting value, we'll set maxDistance slightly below a difference of two pixels (i.e. 512) between coincidences. Similarly, 'sigma' can be a challenging parameter to fine-tune, but we'll start off by setting it to the same value as maxDistance.

 { # Level 1S / 1
     'nodeType': 'SpatialPoolerNode',
     'size': [16, 16],
     'overlap': [0, 0],
     'outputElementCount': 250,
     'clonedNodes': True,
     'maxDistance': 500,
     'sigma': 500,
 },
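To make the maxDistance arithmetic above concrete, here is a small stand-alone check (plain Python, not NuPic code) that computes the sum-of-absolute-differences distance between the two example patches; the node's actual distance metric may differ, so treat this only as a way to pick a starting value:

 # Two 4x4 patches: 255 marks a line pixel, 0 the background (values chosen
 # only to illustrate the calculation in the text above).
 patchA = [[255, 0, 0, 0]] * 4                     # line fills the leftmost column
 patchB = [[255, 0, 0, 0]] * 3 + [[0, 0, 0, 0]]    # same line shifted up one pixel
 distance = sum(abs(a - b)
                for rowA, rowB in zip(patchA, patchB)
                for a, b in zip(rowA, rowB))
 print(distance)   # 255 -- about one pixel's worth of difference
 # A maxDistance of ~500 therefore tolerates slightly less than two such
 # differing pixels before a new coincidence is formed.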

The next block specifies the Level 1 temporal pooler, and we'll start by setting its size, overlap, and outputElementCount to the same values as the Level 1 spatial pooler ([16,16], [0,0], and 250 respectively). We'll also set requestedGroupCount to the same value as outputElementCount, and leave 'equalizeGroupSize' as False. The nodes will again be cloned throughout the level. The block should now look like this:

 { # Level 1T / 2
     'nodeType': 'TemporalPoolerNode',
     'size': [16, 16],
     'overlap': [0, 0],
     'outputElementCount': 250,
     'clonedNodes': True,
     'requestedGroupCount': 250,
     'equalizeGroupSize': False,
     'transitionMemory': 4
 },

Level 2 Parameters
The next block specifies the Level 2 spatial pooler. At Level 1, each spatial pooler node sees a 4x4 patch of the image, and we would like a similar ratio at Level 2, such that the input to each Level 2 node is a 4x4 patch of outputs from the Level 1 temporal pooler. The initial settings in params.py specify the size of the Level 2 spatial pooler as 4x4, so we'll keep this value, along with no overlap between nodes. Since each Level 2 spatial pooler node covers a much larger portion of the image than a Level 1 node (a 4x4 block of Level 1 outputs), we'll initially set the outputElementCount to four times the value in Level 1 (250*4 = 1000).

The inputs to the Level 2 spatial pooler (i.e. the outputs of the Level 1 temporal pooler) are in the range of [0,1], whereas the inputs to the Level 1 spatial pooler were in the range [0,255], which will require the Level 2 spatial pooler’s maxDistance parameter to be much smaller. We’ll start by using a value of 0.01 and keep sigma at 1.5.

 { # Level 2S / 3
     'nodeType': 'SpatialPoolerNode',
     'size': [4, 4],
     'overlap': [0, 0],
     'outputElementCount': 1000,
     'clonedNodes': True,
     'maxDistance': 0.01,
     'sigma': 1.5,
     'sparsify': True
 },

For the Level 2 temporal pooler, most of the parameters will initially be kept at their default values, although we'll set 'equalizeGroupSize' to True so that the temporal pooler node will attempt to form temporal groups that are roughly equal in size.

 { # Level 2T / 4
     'nodeType': 'TemporalPoolerNode',
     'size': [4, 4],
     'overlap': [0, 0],
     'outputElementCount': 300,
     'clonedNodes': True,
     'requestedGroupCount': 300,
     'equalizeGroupSize': True,
     'transitionMemory': 12
 },

Classifier Parameters
The next block specifies the topmost node in the network: the classifier. NuPic 1.6 includes several classifiers: the Zeta1TopNode (very fast, but less accurate), a Support Vector Machine (SVM; the slowest, but it often performs well), and a k-nearest neighbours classifier (knn; often between the Zeta1TopNode and the SVM in terms of speed and accuracy). With some data sets the SVM classifier can provide excellent recognition accuracy, although we need to be certain that the network, and not just the classifier, is providing most of that accuracy. Because of this, we'll initially use the default classifier (Zeta1TopNode) to ensure that the network itself is performing well, and later swap in the other classifiers and test their performance. The default classifier only requires that we specify the number of classes in the data (there are 6 types of architectural elements), so the classifier block should look like this:

 { # Classifier / 5
     'outputElementCount': 6, # Maximum number of categories
 },

Training Parameters
The next section of the params.py file specifies the parameters used to train the network (starting from untrained.xml.gz). The parameters are stored as a dictionary for each level of nodes in the network, except for the sensor and effector. The Level 1 spatial pooler's training is specified first. For the lowest level in the HTM, we need to specify where to find the data, which for this example is stored in the '/data/architecture/training' directory.

Since the spatial pooler does not learn any temporal data, the images do not need to be presented to the network with any sort of temporal coherence. For this level, we will use the "RandomFlash" explorer, which flashes images to the network in random order, switching position and image frequently. By using this explorer instead of a more sophisticated algorithm such as MultiSweep or ExhaustiveSweep, we can save a lot of time during training while still exposing the network to a wide variety of possible coincidences. The third line ('target') tells the explorer to continue training at this level until the number of coincidences it has found equals the 'outputElementCount' (i.e. 250); it also includes a print statement that displays how many coincidences have been found as training progresses. Note, however, that when this line is used, the GUI cannot be used. If you want to use the GUI, you can find the approximate number of iterations required to reach 250 coincidences by observing the output during training, and then substitute this value into params.py as 'numIterations': 600.

 { # Level 1S / 1
     'data': ['loadMultipleImages', {'imagePath': 'architecture/training'}],
     'explorer': "RandomFlash",
     'target': "finished = master.coincidenceCount.values[0]"
               + " == master.maxCoincidenceCount.values[0];"
               + " print master.coincidenceCount.values[0]"
 },

For the Level 1 temporal pooler, we need to use an explorer that incorporates the temporal continuity that is found when sweeping across an image. Here, we will use the “MultiSweep” explorer instead of “exhaustiveSweep,” since it allows the number of iterations to be specified and thus can be much faster. The MultiSweep parameter ‘shift’ specifies that the ImageSensor will translate/shift the image by one pixel while training this level. The ‘minSweepLength’ parameter specifies the minimum distance that the ImageSensor will be moved, and since we want to expose each node to a variety of possible coincidences, we’ll set it to a value of 16 so that it can move around a fair bit at the bottom level. Also, we’ll initially set ‘numIterations’ to 5000, and this value can be increased or decreased later if needed.

 { # Level 1T / 2
     'explorer': ["MultiSweep", {
         'dimensions': [{
             'name': 'translation',
             'shift': 1,
         }],
         'sweepOffObject': False,
         'minSweepLength': 16
     }],
     'numIterations': 5000,
 },

As with the Level 1 spatial pooler, the Level 2 spatial pooler does not need to see the data in any specific temporal ordering, so we will again use the “randomFlash” explorer, as well as the target parameter:

 { # Level 2S / 3
     'explorer': "RandomFlash",
     'target': "finished = master.coincidenceCount.values[0]"
               + " == master.maxCoincidenceCount.values[0];"
               + " print master.coincidenceCount.values[0]"
 },

The parameters of the Level 2 temporal pooler are similar to those at Level 1, although since the Level 2 nodes operate on a larger portion of the image, we will increase the 'minSweepLength' and 'shift' parameters so that the nodes are still exposed to a good portion of the image:

 { # Level 2T / 4
     'explorer': ["MultiSweep", {
         'dimensions': [{
             'name': 'translation',
             'shift': 4,
         }],
         'sweepOffObject': False,
         'minSweepLength': 24
     }],
     'numIterations': 5000,
 },

Finally, we'll set the classifier node to use a "Flash" explorer:

 { # Classifier / 5
     'explorer': 'Flash',
 }

Testing Parameters
This section of params.py is straightforward, in that it specifies the test name and the location of the testing data for each test. To ensure the network is working well, several tests will need to be performed.

First, we will test the trained network for its ability to correctly classify the training images. If it cannot accurately recognize the same images as it was trained on (i.e. recognition accuracy is not close to 100%), then we know that something is wrong with the parameter configuration.

 {
     'data': ['loadMultipleImages', {'imagePath': 'architectureDemo/training'}],
     'name': 'trainingDataset'
 },

Second, we will test the network’s generalization ability by having it recognize images that it has never seen before (i.e. the testing set). The goal of this article is to maximize the network’s recognition accuracy on the testing set.

 {
     'data': ['loadMultipleImages', {'imagePath': 'architectureDemo/testing'}],
     'name': 'testingDataset'
 },

Third, we will test the network's ability to recognize the testing images when they have been distorted in various ways (i.e. the testing_distorted data set).

 {
     'data': ['loadMultipleImages', {'imagePath': 'architectureDemo/testing_distorted/lines'}],
     'name': 'linesTest'
 },

Initially, we will only use the images that have had lines added to them, although other tests can be conducted by changing the image path to point to the /architectureDemo/testing_distorted/scale, /noise, /occlusion, and /translation directories.
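If you plan to run every distortion test, you can generate the extra test entries instead of editing the image path by hand. A rough sketch that prints blocks in the same format as the entries above (whether you paste the output into params.py or build the list programmatically depends on how the rest of your file is organized):

 import pprint
 distortions = ['lines', 'scale', 'noise', 'occlusion', 'translation']
 tests = [{'data': ['loadMultipleImages',
                    {'imagePath': 'architectureDemo/testing_distorted/%s' % d}],
           'name': '%sTest' % d}
          for d in distortions]
 pprint.pprint(tests)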

=Tuning the parameters=

Before starting to think about how to change parameter values, we first need to ensure that there are no syntax errors and that the network is correctly processing the images. To verify this, run the experiment with the -d flag to show debugging information:

 python RunExperiment.py architectureDemo/full -d

When using the -d debug flag, several additional directories and files will be written to the 'experiments/architectureDemo/full/sessions' directory. Open the most recent folder in the sessions directory (e.g. train.1.local_bundle), and then open the '/imagesensor_log' directory. When the -d flag is used, NuPic creates a subdirectory here called '/output_from_filters', which contains the resized images, and '/output_to_network', which contains the images that the ImageSensor submitted to the network. The images in the latter directory should be slightly translated copies of the resized images. There will also be a file in the directory named 'imagesensor_log.txt', which describes the order in which the images and image categories were presented to the network, as well as the amounts by which they were translated. To view this information, open 'imagesensor_log.txt' and search for "compute." For example:

 ('compute', {}, {'iteration': 2, 'position': {'reset': False, 'image': 108, 'filters': [0], 'offset': [0, 0]}, 'filename': '018.png', 'categoryIndex': 5, 'categoryName': 'window', 'blank': False})

This shows that on iteration 2, image 108 from category 5 (window) was shown to the network without any translation (i.e. 'offset': [0, 0]).
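Searching the log by hand works, but it can be handy to tally how many times each category was presented. A small sketch that parses the 'compute' entries with Python's ast module; it assumes each entry sits on its own line in the format shown above, so check a few lines of your log first:

 import ast
 from collections import defaultdict
 LOG = 'imagesensor_log.txt'   # run from inside the imagesensor_log directory
 counts = defaultdict(int)
 with open(LOG) as f:
     for line in f:
         line = line.strip()
         if not line.startswith("('compute'"):
             continue
         entry = ast.literal_eval(line)   # ('compute', {}, {...})
         counts[entry[2]['categoryName']] += 1
 for category, n in sorted(counts.items()):
     print('%-12s %d' % (category, n))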

Since we included a print statement in the 'target' parameter of each spatial pooler in params.py, we can see how quickly the network is learning new coincidences. Notice that as the training of Level 1 progresses, the network takes a long time to learn any new coincidences; after 10,000 iterations it has only learned 15. This means that most of the input patterns presented at Level 1 are being grouped together as the same few coincidences. Recall that 'maxDistance' and 'sigma' in the Level 1 spatial pooler were initially set to 500, which, based on the growth rate of the coincidences, is too high. Stop the training process by pressing Ctrl-C and edit params.py to lower both of these values to 250. As a rule of thumb, the number of coincidences should be similar to the number of iterations required to find them. The Level 1 spatial pooler block should now look like:

 { # Level 1S / 1
     'nodeType': 'SpatialPoolerNode',
     'size': [16, 16],
     'overlap': [0, 0],
     'outputElementCount': 250,
     'clonedNodes': True,
     'maxDistance': 250,
     'sigma': 250,
 },

Train the network again by typing:

 python RunExperiment.py architectureDemo/full

With this new configuration, the network should produce recognition accuracies of 77.8% on the testing and linesTest data sets, and 100.0% on the training data set. Since the network correctly classified all of the images in the training set, we know that it is performing fairly well, although we should be able to get better results by tuning the parameters further.

When tuning the parameters, it is often a good approach to begin with the lower levels of the hierarchy and work up from there. First, let's check how the groups formed by the temporal pooler look. Open the file 'experiments/architectureDemo/full/visualization/index.html' and click on "level2[0,0]" to see the visualization of groups at Level 2 (the temporal pooler). Looking at the first chart, "Group Sizes Histogram", we see that all of the groups contain only one coincidence. This means that either too few coincidences are being detected by the spatial pooler, or there are too many possible groups and output elements in the spatial pooler, or both. We'll try to mitigate this by doubling the outputElementCount in Level 1 to 500 (to find more coincidences), and reducing the outputElementCount in Level 2 to 200 and the requestedGroupCount to 100 (to reduce the number of groups). Train the network again.

With this configuration, the recognition accuracy on the training set should still be 100%, and the accuracy on the testing and linesTest data sets should have improved to 85.2%. Looking at the Level 2 nodes in the visualization file again, notice that the stability at Level 2 has also increased from 65.5% to 81.5%, meaning that many transitions between coincidences occur within the same group. By setting the value of 'transitionMemory' in the Level 2 temporal pooler to 6, we should be able to smooth the temporal transitions and thus increase the stability at that level. Train the network again.

Now the network's recognition accuracy is an excellent 92.6% for both the testing and linesTest data sets, so we'll try replacing the Zeta1TopNode classifier with an SVM classifier to see if any further improvement can be gained. For this, we'll use the classifier block from the params.py file in '/experiments/fruits/full', modified to output 6 categories:

 { # Classifier / 5
     'nodeType': 'py.SVMClassifierNode',
     'outputElementCount': 6, # Maximum number of categories
     'minC': 0.0,
     'maxC': 2.0,
     'minGamma': -4.0,
     'maxGamma': -2.0,
     'kernelType': 'rbf',
     'numSamplesPerRecursion': 21,
     'numRecursions': 2,
     'contractionFactor': 0.3,
     'numCrossValidations': 5,
     'convEpsilon': 0.01,
     'useSparseSvm': False,
     'inputThresh': 0.500,
     'useProbabilisticSvm': True,
     'doSphering': True
 }

Adding the SVM classifier to the network will increase the training time, although this classifier often outperforms the Zeta1TopNode in terms of recognition accuracy and should be tested. With these changes made, train the network again.

For this data set, the SVM classifier produced a recognition accuracy of 85.2%, which is surprisingly lower. The main parameters that affect the SVM classifier's performance are minC, maxC, minGamma, and maxGamma; they can be thought of as specifying the 2D range of the search space for classification. To inspect the classifier and verify that these parameters are set to a reasonable range, launch IPython (by typing 'ipython' in the Terminal/Command Prompt) and enter the following commands:

 from nupic.network import *
 net = CreateRuntimeNetwork('experiments/architectureDemo/full/networks/trained.xml.gz')
 svm = net.level5
 svm.interpret('str(self._svmParams)')

This will output the values of C and Gamma that the classifier selected as being optimal:

 Out[4]: (1.7000000000000002, -3.7999999999999998)

Since C (the first value) is between minC (0.0) and maxC (2.0), and Gamma (the second value) is between minGamma (-4.0) and maxGamma (-2.0), we can conclude that the classifier parameters are set to a reasonable range. If, however, the value of C were equal to minC, we would need to change minC to a smaller number so that the classifier could search further in that direction to find an optimal value for C. The same technique applies when C equals maxC, or when Gamma equals minGamma or maxGamma.
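The boundary check described above can be written as a small helper. It only encodes the reasoning in this paragraph (widen the range if the selected value sits on an edge); the default ranges are the ones from the classifier block, and the values passed in are the ones reported by svm.interpret() above:

 def checkSvmSearchRange(c, gamma,
                         minC=0.0, maxC=2.0, minGamma=-4.0, maxGamma=-2.0):
     """Flag C or Gamma values that landed on the edge of the SVM search range."""
     ok = True
     if not (minC < c < maxC):
         print('C = %s is at the edge of [%s, %s]; widen minC/maxC.' % (c, minC, maxC))
         ok = False
     if not (minGamma < gamma < maxGamma):
         print('Gamma = %s is at the edge of [%s, %s]; widen minGamma/maxGamma.'
               % (gamma, minGamma, maxGamma))
         ok = False
     if ok:
         print('C and Gamma are strictly inside the search range.')
 checkSvmSearchRange(1.7, -3.8)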

To quit IPython, type quit at the prompt.

Since the SVM classifier didn’t provide better performance than the Zeta1TopNode, we’ll try swapping in the k-nearest neighbour classifier and test its performance. To do this, modify the classifier block in params.py and retrain the network:

 { # Classifier / 5
     'nodeType': 'py.KNNClassifierNode',
     'outputElementCount': 6, # Maximum number of categories
 }

With the knn classifier included in the network, the performance drops to 81.5%! From this, we can conclude that the best performance (92.6%) can be obtained using the Zeta1TopNode. The next step is to revert to the Zeta1TopNode, and determine which images were misclassified. To remove the knn classifier, simply comment out the ‘nodeType’: ‘py.KNNClassifierNode’ line in the classifier block of params.py and retrain the network.

=Determining which images the network misclassified=

Now that the recognition accuracy is acceptable, it is valuable to report which images the network classified correctly and which it got wrong. This information is stored in '/experiments/architectureDemo/full/inference/testingDataset_effector.txt'. The file contains one line for each image in the testing set, and each line is split into columns that correspond to the image categories in alphabetical order. Paste the file's contents into a spreadsheet editor and add the six category labels above the columns. You'll also need to check how many images are in each category of the testing data and add these as labels to the rows. For example, in the architecture data set there are 7 images in the chimney category, so the first seven rows correspond to these images. The spreadsheet should look like this:

Here we see that the first and third images from the chimney category were misclassified (as a round window and a window respectively), while the rest were correct.
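If you prefer to script this step instead of using a spreadsheet, a rough sketch along these lines can print the predicted category for each test image. The column interpretation is an assumption (the largest value in each row is taken as the prediction), so verify it against the file before trusting the output:

 # Categories in alphabetical order, matching the column order in the effector file.
 categories = ['chimney', 'column', 'door', 'dormer', 'roundWindow', 'window']
 EFFECTOR = 'experiments/architectureDemo/full/inference/testingDataset_effector.txt'
 with open(EFFECTOR) as f:
     for i, line in enumerate(f):
         values = [float(v) for v in line.split()]
         if not values:
             continue
         predicted = categories[values.index(max(values))]
         print('image %3d -> %s' % (i + 1, predicted))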

=Comparison with NuPic 1.5=

This experiment was also conducted using NuPic 1.5, and version 1.6 produced a considerable speedup in training as well as significantly better recognition results. When the network was constructed using NuPic 1.5, the 'exhaustiveSweep' explorer was used for the spatial pooler nodes, which resulted in training times of between 8 and 12 hours! By switching to NuPic 1.6 with the 'MultiSweep' explorer, training time was reduced to around five minutes. Similarly, by tuning the parameters in 1.6, the network's recognition accuracy increased by between 4 and 39 percent on all data sets except for the occlusion test. The recognition accuracies of NuPic 1.5 and 1.6, and the difference between the two, are shown in the following tables. Each cell is labeled with the testing set used, where "linesTest0" refers to the data set distorted with lines at difficulty level 0, "occlusionTest2" refers to the data set distorted with occlusions at difficulty level 2, and so on. The recognition accuracy (percent) was calculated by creating 50 distorted examples of each image and dividing the number of correctly recognized images by the total number of testing images.