User:MikeTernoey/sandbox

Computing the Instructions for Building A Vehicle with OCR

Imagine the set of images in your Facebook account that contains all of your friends but the fewest strangers (noise), or people you don’t know, with the constraints that only three friends may appear per picture and none may appear more than once.

Mike Ternoey is the copyright owner of the first software application used at Navistar to write the instructions for assembling an entire vehicle program, using SolidWorks Inspection, OCR, and extensive programming. Coincidentally, he was a classmate of both of Roger Penske’s sons at the Lawrenceville School in New Jersey. Penske orders nearly 50,000 vehicles per year for its international fleet of more than 350,000 vehicles.

His assignment as Build Analyst at Navistar required him to write the instructions for more than 10 vehicles, with more than 25,000 parts, in a one-month period. The position required writing thousands of pages of assembly instructions per month, executed in a C# / XAML view of a SQL database. The software consisted of three main components, plus miscellaneous items:

1.) A file-downloading macro

2.) A SolidWorks Inspection library written in Java

3.) An algorithm that selects images from a CAD database and assigns each part in the vehicle to an image in a sequence (U.S. copyright-registered source code)

4.) Miscellaneous items: the source code also parses data and notes from engineers.

PROBLEM SCOPE

Automation of Large Assembly Instruction Manuals (2,000+ Parts per Vehicle)

A logistics company may order up to 50,000 vehicles per year, most with a lifespan of 5 years.

Every vehicle is designed on a critical path, with design changes that happen late in the project as it matures. A single part may not be cleared for prototype production because of a design flaw until a few weeks (or days) before the prototype build date.

A donor vehicle is selected as the blueprint for a given project, and design modifications are made to that blueprint, whereby some parts and features are added to the vehicle and some are removed.

If a part is added to the vehicle build, it can change the instruction manual for building the truck drastically, much like shuffling an entire deck of cards.

The engineer who designed a particular assembly group in the truck, possibly with a patent or trade secret implemented in it, may be deceased or unavailable, or the company may have been acquired in a merger.

Each instruction manual is nearly a thousand pages long. The number of possible combinations of images in the instructions for a given vehicle with 2,000 parts is more than the number of molecules in the universe.
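One way to sanity-check the size claim above is sketched here. It assumes that "combinations" means the number of possible orderings of 2,000 parts, which is an interpretation and not something the text states. It counts the decimal digits of 2000! and compares that with 10^80, a commonly quoted rough estimate of the number of atoms in the observable universe. The function name is illustrative.

```python
import math


def digits_in_factorial(n: int) -> int:
    """Number of decimal digits in n!, computed via the log-gamma function."""
    # lgamma(n + 1) equals ln(n!), so dividing by ln(10) gives log10(n!).
    return math.floor(math.lgamma(n + 1) / math.log(10)) + 1


# Assumption: one "combination" is one ordering of the 2,000 parts.
ordering_digits = digits_in_factorial(2000)

# A number with more than 81 digits is larger than 10**80.
print(ordering_digits > 81)
```

Under that assumption the count of orderings has thousands of digits, so the comparison holds by a very wide margin.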

Classifying 2000 Parts in a Bill Of Material (for a vehicle)

Vehicle - Each part in a vehicle can be classified by these four sets:

1.) Assembly Group – There are many assembly groups in a vehicle, such as chassis, electrical, wiring, cab, steering, brakes, and engine.

2.) Installation – An installation is a part of an assembly group, and there are usually fewer than ten in an assembly group.

3.) Item – A small assembly of screws or washers and a few other parts. In other words, it is just a subassembly of an installation.

4.) Part – For example, a screw, a washer, or a metal component such as a rib. However, 90% of a vehicle is specialized screws, bolts, or washers.

CAD database design:

Vehicle > Assembly Group > Installation > Item > Part

Same As

Assembly > Subassembly > Subassembly > Subassembly > Part
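The hierarchy above can be sketched as nested records. This is a minimal illustration only. The class and field names are hypothetical, and the real CAD database schema is not described in this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    number: str
    description: str


@dataclass
class Item:
    label: str  # the balloon label shown in a CAD image, e.g. "001"
    parts: List[Part] = field(default_factory=list)


@dataclass
class Installation:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class AssemblyGroup:
    name: str
    installations: List[Installation] = field(default_factory=list)


@dataclass
class Vehicle:
    name: str
    groups: List[AssemblyGroup] = field(default_factory=list)


def item_labels(vehicle: Vehicle) -> List[str]:
    """Walk Vehicle > Assembly Group > Installation > Item and collect labels."""
    return [
        item.label
        for group in vehicle.groups
        for installation in group.installations
        for item in installation.items
    ]


# Hypothetical sample data for illustration only.
item_001 = Item("001", [Part("7896544", "Old Bolt")])
item_077 = Item("077", [Part("686990", "Unique Screw")])
hydraulics = AssemblyGroup("Hydraulic", [Installation("Installation 1", [item_001, item_077])])
truck = Vehicle("Example vehicle", [hydraulics])
print(item_labels(truck))
```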

Search Algorithms for CAD Databases:

To write the instructions for an entire vehicle, one must be able to locate thousands of parts in the CAD database that are very hard to find because they are not labeled.

Because of clutter, only 30 labels can be placed in a CAD image, so the individual parts in a CAD picture can’t be labeled. This means only “items” can be labeled, because there are too many parts.

The item is the lowest level of resolution available for the CAD image. Only items can be labeled in CAD images but not their detailed parts.

Example of Search Algorithm Issues in Large CAD Databases:

Here we imagine that we have two items from the Hydraulic Assembly Group, from the first of three possible installations in that assembly group. The vehicle we are building at the moment shares the database with other vehicles that have similar CAD images, and it inherits some design aspects from previous generations of vehicles.

[Item 001] – THE UBIQUITOUS ITEM

Old Bolt Part #7896544, Old Screw #686990, Old Strap #79700-7, Old Tube #47504-6-7, Old Panel Component #457986-6

[Item 077] – THE RARE ITEM

Unique Part #7896544, Unique Screw #686990, New Strap #79700-7, Special Tube #47504-6-7, Production Part #457986-6

UBIQUITY

The label itself, “001”, is found in ALL CAD images in the databases because it’s the first number that can be stuck into a CAD image as a descriptor. So, it is useless to search for it in a query.

Because assembly groups and subassembly groups are cloned for new projects, items or parts labeled "001" appear in almost every image, and it is nearly impossible to identify which image in a CAD database should show item "001" from a particular vehicle project.

RARENESS

Item 077, on the other hand, is something new on this custom vehicle, and it has a production part. Item 077 can be found easily because there aren’t many items with numbers that high.

For example, if "item 1" is in 20 pictures, but "item 77" is in only 2, item 77 is rare.
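The counting in that example can be sketched as follows. The image names and the mapping from images to item labels are hypothetical stand-ins for what an inspection scan of the CAD images might yield.

```python
from collections import Counter


def image_counts(images):
    """Count, for each item label, how many images it appears in."""
    counts = Counter()
    for labels in images.values():
        counts.update(set(labels))  # count each label once per image
    return counts


# Hypothetical data: item "001" is in all 20 images, item "077" in only 2.
images = {f"IMG{n:02d}": {"001"} for n in range(1, 21)}
images["IMG01"].add("077")
images["IMG02"].add("077")

counts = image_counts(images)
print(counts["001"], counts["077"])
```

The lower the count, the rarer the item, and the more useful it is for narrowing down which images belong to the current vehicle.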

Question:

There may be up to 20 old CAD images from donor vehicles that contain an item 001 in this section of the database, for this item. Which is correct?

One can imagine that when 1,000 items in a Bill of Materials need to be mapped to the correct images in CAD databases in just a few hours, specific algorithms are needed.

PROBLEM 1: SOLUTION TO FIND ITEM 001

So the solution is not to query the database for item 001. The solution is to throw out images that don’t contain item 077, or the numerous other items unique to the particular vehicle BOM.

IN QUERY FORM:

Select ALL of the images in the CAD database that contain item 001 AND item 077.
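As a minimal sketch of that query, assume the CAD images have already been reduced to a mapping from image name to the set of item labels found in each. The mapping, image names, and function name are hypothetical.

```python
def images_containing_all(images, required):
    """Return the names of images that contain every required item label."""
    required = set(required)
    return sorted(
        name for name, labels in images.items() if required <= set(labels)
    )


# Hypothetical scan results: image name -> item labels found in it.
cad_images = {
    "I1": {"001", "002"},
    "I2": {"001", "077"},
    "I3": {"001", "020", "101"},
    "I4": {"001", "077", "078"},
}

matches = images_containing_all(cad_images, ["001", "077"])
print(matches)
```

Requiring the rare item 077 alongside the ubiquitous item 001 discards the cloned images that only carry the "001" label.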

PROBLEM 2: SOLUTION TO WRITE INSTRUCTIONS FOR ENTIRE VEHICLE

1. Count the number of times the item appears in the B.O.M. and in the CAD database in a given installation. This is the RARENESS SCORE.

2. Score each image in the CAD database.

3. Exclude images in your search that have items (or SolidWorks Inspection balloons) that are not in the particular vehicle B.O.M.
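The three steps above can be sketched as follows. The text does not give the exact scoring formula, so this sketch assumes an image's score is the sum of 1/count over its items, where count is the number of images an item appears in. That formula, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def rareness_counts(images):
    """Step 1: count how many images each item label appears in."""
    counts = Counter()
    for labels in images.values():
        counts.update(labels)
    return counts


def score_images(images, bom):
    """Steps 2 and 3: score each image, skipping images with non-B.O.M. items."""
    counts = rareness_counts(images)
    scores = {}
    for name, labels in images.items():
        if not labels <= bom:  # step 3: image shows an item outside the B.O.M.
            continue
        # Assumed formula: rarer items (low counts) contribute more.
        scores[name] = sum(1 / counts[label] for label in labels)
    return scores


# Hypothetical data for illustration only.
bom = {"001", "077", "078"}
images = {
    "I1": {"001"},
    "I2": {"001", "077"},
    "I3": {"001", "101"},  # item 101 is not in the B.O.M.
    "I4": {"077", "078"},
}

scores = score_images(images, bom)
best = max(scores, key=scores.get)
print(best, sorted(scores))
```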

Deselection: Which images should be used in the instructions?

An image should not be used if a large percentage of items in the image are not really in the assembly that is being built for the current project.

Example: TABLE 1 – B.O.M. TABLE 2 - SolidWorks Inspection Scan

Note that item 001 appears in 11 images in the CAD Database, and we only have 7 of those images appearing here.

TRANSFORMING THE INSPECTION SCAN

Scoring the Set of Images (Part A)

“Zero Set” Score

The zero set is the set of items in an image that are not from this particular vehicle program. For example, if we have an item 101, that is not on our list of parts in the Bill of Material, for this vehicle program, then it should go into the zero set for that image.

This indicates the image is corrupted, or it is part of some other project that looks similar, and we don’t want it in our instructions.

CRITICAL POINT: If these images are selected for this portion of the vehicle, then the zero set becomes [20,21,77,101]. When images are selected, the smallest zero set is best, because these items don’t appear to be a part of this vehicle build, and are really a part of some other project, or are related to error or changes in design.
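A minimal sketch of the zero set as a set difference, with the zero set of a selection taken as the union over the selected images. The item numbers, image names, and function names here are hypothetical and do not reproduce the tables referenced above.

```python
def zero_set(image_items, bom):
    """Items shown in one image that are not in this vehicle's B.O.M."""
    return set(image_items) - set(bom)


def combined_zero_set(selected, images, bom):
    """Union of the zero sets of every selected image."""
    result = set()
    for name in selected:
        result |= zero_set(images[name], bom)
    return result


# Hypothetical data for illustration only.
bom = {1, 2, 3, 78, 79}
images = {
    "I1": {1, 2, 20},
    "I2": {2, 3, 21, 101},
    "I4": {78, 79},
}

zs = combined_zero_set(["I1", "I2", "I4"], images, bom)
print(sorted(zs))
```

When comparing candidate selections, the one with the smaller combined zero set is preferred.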

Redundancy (AND Length)

If there are too many pictures, the mechanics tend to get confused because items and parts appear to be added twice. Conversely, if there are too few images, too many parts are added in each image.

Ranking AND the Sequence of Images (Part B)

After a set of optimal images is selected for the instructions for this particular part of the vehicle, the items should be placed into the first image with the highest rareness score.

Rareness

An image is rare if it contains items that occur only a few times in the CAD database. I4 above is rare because items 78 and 79 appear in only one or two images in this section of the database.

Clutter Score (Items)

An image is cluttered if there are too many items in it. An image that contains more than 6 items should perhaps be passed over in favor of one that contains 4 or 5 items.
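A minimal sketch of a clutter check, using the threshold of 6 items mentioned above. The function names are illustrative, and treating the threshold as a hard cutoff is an assumption.

```python
MAX_ITEMS_PER_IMAGE = 6  # threshold taken from the text above


def clutter_score(image_items):
    """Number of distinct item labels shown in an image."""
    return len(set(image_items))


def is_cluttered(image_items, max_items=MAX_ITEMS_PER_IMAGE):
    """True when an image shows more items than the allowed maximum."""
    return clutter_score(image_items) > max_items


# Hypothetical images for illustration only.
busy_image = ["001", "002", "003", "004", "005", "006", "007"]
clean_image = ["001", "077", "078", "079"]
print(is_cluttered(busy_image), is_cluttered(clean_image))
```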

Relevance Score

If the percentage of items in the image that are not in the B.O.M. is high, then the image should not be used, but the optimization of the Zero Set takes care of this.
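The percentage described above can be sketched as the fraction of an image's items that fall outside the B.O.M. The function name and sample data are hypothetical, and no cutoff is given in the text, so none is assumed here.

```python
def irrelevant_fraction(image_items, bom):
    """Fraction of an image's distinct items that are not in the B.O.M."""
    items = set(image_items)
    if not items:
        return 0.0  # an empty image has nothing irrelevant in it
    return len(items - set(bom)) / len(items)


# Hypothetical data for illustration only.
bom = {"001", "002", "003"}
fraction = irrelevant_fraction({"001", "002", "101", "102"}, bom)
print(fraction)
```

The numerator is the size of that image's zero set, which is why minimizing the zero set already accounts for relevance.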

Optimal Image Sets for a Sequence of Instructions

1. The set of images that contains every item in the B.O.M. for this vehicle program and has the smallest zero set.

OR…

“The set of images has the least number of items not a part of this vehicle program.”

WITH:

The Items assigned IN SEQUENCE to the “rarest” image they can appear in.

CONSTRAINT:

The set of images respects a maximum number of items/parts per image when proper sequencing is applied to the above selected set.

If the solver doesn’t converge on a particular sequence of images, then these constraints are applied:

2. The set of images with the least “clutter”.

3. The set of images with the least “redundancy”.

UNLESS the installation is intended to be redundant because it contains straps, tubes, or clips that extend across the vehicle. This would be found in the brakes and hydraulics of a truck.
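The selection criteria above can be sketched as a brute-force search that is practical only for a handful of images. It is not the registered source code. It assumes fewer images is an acceptable stand-in for less redundancy, it reuses an assumed rareness formula (sum of 1/count over an image's items), and it leaves out the per-image item cap and the exception for intentionally redundant installations. All names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def select_images(images, bom):
    """Pick images covering the whole B.O.M. with the smallest zero set."""
    bom = set(bom)
    names = sorted(images)
    best, best_key = None, None
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(images[n] for n in subset))
            if not bom <= covered:
                continue  # this subset misses at least one B.O.M. item
            # Rank by zero-set size first, then by number of images (assumed).
            key = (len(covered - bom), size)
            if best_key is None or key < best_key:
                best, best_key = subset, key
    return best


def assign_items(selected, images, bom):
    """Assign each B.O.M. item to the rarest selected image that shows it."""
    counts = Counter()
    for labels in images.values():
        counts.update(labels)

    def rareness(name):
        # Assumed formula: items seen in few images contribute more.
        return sum(1 / counts[label] for label in images[name])

    ordered = sorted(selected, key=rareness, reverse=True)
    assignment = {}
    for item in sorted(bom):
        for name in ordered:
            if item in images[name]:
                assignment[item] = name
                break
    return assignment


# Hypothetical data for illustration only.
bom = {"001", "002", "003", "077", "078"}
images = {
    "I1": {"001", "002", "020"},
    "I2": {"001", "002", "003"},
    "I3": {"003", "077", "101"},
    "I4": {"077", "078"},
    "I5": {"001", "002"},
}

chosen = select_images(images, bom)
plan = assign_items(chosen, images, bom)
print(chosen, plan)
```

For real installations with many candidate images, an exhaustive search like this would be too slow, and a greedy or integer-programming formulation of the same criteria would be a more realistic choice.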

Specialized Parts:

Clips, straps, and hoses in electrical sections must be assigned to their proper images in the CAD database, by the engineer who designed that assembly recently.