Wikipedia:Reference desk/Archives/Computing/2021 October 12

= October 12 =

Removing a destructive watermark
The picture to the right has an obtrusive watermark from the cell phone. Unfortunately, the watermark seems to have been added in a way that destroys all of the image's information in its place. So the tips from the related RD query “SOLVED: Removing a watermark from a PDF” are no help here. Obviously, information that has been destroyed can't be retrieved. (That's what I'm trying to express with the word “destructive” in the title.)

So, is there at least a way to make the watermark less obtrusive? Ideally, it should be possible to take the information from the edges of each letter and propagate it towards the middle. Is there maybe some program that does that? Alternatively, it should be possible to set its white as transparent and underlay the picture with a blurred version of itself. Compared to the previous method, this would have the disadvantage that it will have a sharp transition at the edges, but at least it would be better than nothing. ◅ Sebastian 14:46, 12 October 2021 (UTC)
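A crude version of the "propagate from the edges towards the middle" idea can be sketched in a few lines of Python. This is only a minimal illustration, assuming a grayscale image stored as a list of rows and a boolean mask marking the watermark pixels; the `inpaint` name is hypothetical, and dedicated tools do this far better with patch matching, but the principle is the same:

```python
# Naive diffusion inpainting: masked (watermark) pixels are repeatedly
# replaced by the average of their four neighbours, so information from
# the edges of each letter propagates towards the middle.

def inpaint(img, mask, iterations=200):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    # Average over the in-bounds 4-neighbourhood.
                    neigh = [out[ny][nx]
                             for ny, nx in ((y - 1, x), (y + 1, x),
                                            (y, x - 1), (y, x + 1))
                             if 0 <= ny < h and 0 <= nx < w]
                    nxt[y][x] = sum(neigh) / len(neigh)
        out = nxt
    return out
```

Unlike the blurred-underlay trick, this produces no sharp transition at the letter edges, since the boundary pixels themselves anchor the diffusion.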


 * If the watermark states which camera and phone took the picture, there should be an option in the phone's settings (possibly the camera settings) to disable it. If you didn't take the image yourself, then afaik it's not possible to remove the watermark. ― Blaze The WolfTalkBlaze Wolf#6545 14:48, 12 October 2021 (UTC)
 * There must be many guides for removing hardcoded watermarks, e.g. [//clippingpixel.com/how-to-remove-watermark-in-photoshop/] [//www.photopea.com/learn/bt-smart]. Many guides are likely targeted at Photoshop, but I'm sure there are similar features in GIMP, e.g. [//docs.gimp.org/2.10/en/gimp-tool-heal.html] www.javatpoint.com/how-to-remove-watermark-using-gimp and other tools. There is also a guide on Commons, Commons:Help:Removing watermarks, although the section on GIMP seems useless (it focuses entirely on how to minimise generation loss, with zero mention of the actual watermark removal). Such tools generally try to guess what should be in the area based on similar regions, so they can do a decent job when such regions exist, as there seem to be here; still, if you look carefully it will be obvious something is wrong, due to the vein details and other things. Nil Einne (talk) 16:30, 12 October 2021 (UTC)


 * Cool – “Content-aware Fill” sounds just like what I need. It does sound like something that other, more affordable apps might have, too. I'll look into that. Thanks a lot! ◅ Sebastian 16:46, 12 October 2021 (UTC)

Connect-the-dots methods
Is anyone aware of any methods for solving connect-the-dots problems where the dots are unnumbered (or, more precisely, where their numbering is usually not, but occasionally is, representative of the sequence)? I'm working on a project that involves tracing points shot by GPS, and the numbering of the points on long straightaways is generally sequential, but not always. I've got algorithms for recognizing multiple-path intersections, distinguishing between branching and intersections, and the like, but I'm struggling to find a reliable method of locating the endpoints of the various lines.

For clarity, I have asked much the same question on Stack Exchange. I've found countless intro-level tutorials for graphics APIs that use connect-the-dots problems as a learning tool, but they always presuppose that the dots are already ordered. Any help would be appreciated: external links, links to articles about similar problems, etc. 74.175.117.2 (talk) 17:48, 12 October 2021 (UTC)


 * Is the Travelling salesman problem any help? It might give you different solutions, though, since it really uses no numbering information. ◅ Sebastian 17:55, 12 October 2021 (UTC)
 * Hmmm, it may very well be. My current tack is a bit of a brute-force method: I'm running the collection through multiple methods of assigning weight (as an endpoint, intersection or part of a linear route) and then weighting those results based on some basic statements about the collection (basically, the proportions and angle of a rectangle that encloses all of the points in a minimal area, the actual numbering of the points and some other factors) to get a list of candidates for the endpoints and the second-in points, along with a confidence rating.
 * It's working, but it's not working that well. So far, the best we've gotten is about a 70% confidence on the actual end-points, and in the most problematic datasets, that drops down to about 50%.
 * I greatly appreciate the suggestion! It's just me and one of my programmers working on this (I've got the others on a more pressing issue), so we hadn't considered whether a TS-heuristic might help. I'll be looking into that straight away. Thanks again. 74.175.117.2 (talk) 18:16, 12 October 2021 (UTC)
 * How many dots do you typically have? For moderate numbers, finding an exact solution to the TSP while pruning the search using branch and bound may be doable. --Lambiam 19:59, 12 October 2021 (UTC)
 * It varies a lot. The simplest datasets consist of 1-6 points, the most complex can contain a few thousand (the largest has about 10k, but that one is a major outlier, the typical "very large" dataset will contain 2-3k at most). The simpler ones are more common, such that the average of all the datasets I'm using for testing (which I picked to be a representative sample) is about 120 points. Thanks for that link, by the way. Looks very useful! 74.175.117.2 (talk) 13:46, 13 October 2021 (UTC)
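As a rough illustration of the branch-and-bound idea for these small-to-moderate instances, here is a minimal exact search for the shortest open path (a connect-the-dots ordering has free endpoints, unlike a TSP cycle). The `shortest_path_order` name is hypothetical, and without stronger lower bounds this is only practical for small point sets; real solvers prune far more aggressively:

```python
import math

def shortest_path_order(points):
    """Exact shortest Hamiltonian (open) path by branch and bound.
    Only practical for small n without better lower bounds."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    best_len = math.inf
    best_order = list(range(n))

    def extend(order, used, length):
        nonlocal best_len, best_order
        if length >= best_len:      # prune: partial path already too long
            return
        if len(order) == n:
            best_len, best_order = length, order[:]
            return
        last = order[-1]
        for j in range(n):
            if not used & (1 << j):
                extend(order + [j], used | (1 << j), length + d[last][j])

    for start in range(n):          # try every point as a free endpoint
        extend([start], 1 << start, 0.0)
    return best_order, best_len
```

The `used` bitmask keeps the visited-set check cheap; the single `length >= best_len` test is the "bound" part, cutting off any partial path that is already longer than the best complete one.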
 * Are these points timed? This looks very much like the problem regularly solved by Radar trackers, with the two sub-problems of track initiation and plot-track association. The way this is normally handled is that plots are associated with an existing track if they are "close enough" to the expected position (where prediction to the expected position is usually done with a Kalman filter, or a modern variant of it). Among unassociated plots, any reasonable 3-plot sequence becomes a potential track, to be supported (or not) by future plots. There are some caveats, of course. First, this is for live updates - if you have a static set, there are probably better algorithms. And secondly, the Kalman filter encodes certain constraints about the development of the track - if your points don't behave with a certain regularity, this will fail. --Stephan Schulz (talk) 21:00, 12 October 2021 (UTC)
 * “Are these points timed?” Yes, but the timestamps of the points would correlate with the existing numbering, which I mentioned above is unreliable. In any case, the timestamp is not included in the dataset per se (though I can do extra work to get it). Our datasets normally (but not always, because life hates me!) conform to the FDOT ITSFM standard which, as you can see in the link, doesn't include the timestamp.
 * The radar tracking method is very similar to one of the weighting methods used (this one for tracing through the existing points after the endpoint has been located), which works very well: it takes a point and "predicts" that the next nearest point that hasn't already been traced over will be within ±π/8 radians of the angle from the previous point to this one. If not, it then gets the angle of the nearest point and looks forward, to ensure that it has actually reached a corner, and not just a spot where an unrelated point (part of a branch or different route) was nearer to it than the next point on this route. This method itself is weighted very heavily, because it works so well for ordering the points once the endpoint has been established. 74.175.117.2 (talk) 13:46, 13 October 2021 (UTC)
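This kind of angle-constrained greedy tracing can be sketched as follows. The `trace_route` name is hypothetical, and the sketch deliberately omits the corner-checking lookahead described above, stopping instead as soon as no candidate falls inside the turn limit:

```python
import math

def trace_route(points, start, second, max_turn=math.pi / 8):
    """Greedy trace: from the last segment's heading, pick the nearest
    unvisited point whose bearing deviates by at most max_turn radians."""
    route = [start, second]
    remaining = set(points) - {start, second}
    while remaining:
        prev, cur = route[-2], route[-1]
        heading = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        candidates = []
        for p in remaining:
            bearing = math.atan2(p[1] - cur[1], p[0] - cur[0])
            # Normalised angular difference in [0, pi].
            turn = abs((bearing - heading + math.pi) % (2 * math.pi) - math.pi)
            if turn <= max_turn:
                candidates.append((math.dist(cur, p), p))
        if not candidates:
            break                   # corner or end of route reached
        route.append(min(candidates)[1])
        remaining.discard(route[-1])
    return route
```

Points on an unrelated branch fail the ±π/8 test and are simply skipped, which is why the method needs the endpoint (and a second-in point to define the initial heading) before it can run.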
 * Can you reveal how these datasets are generated? A method that works fairly well for a homing pigeon may fail completely for a rat. Can the track self-intersect? Can it have cusps? Is the agent on the move trying to get from A to B? Are there constraints on its possible movements? --Lambiam 13:01, 14 October 2021 (UTC)
 * Sure. They're ESRI Shapefiles of GPS points shot by a surveyor of underground utilities (electrical, telephone, cable and fiber-optic data), whose attribute tables are formatted per the link in my last response. There are some constraints, but they're not very useful:
 * -Sweeps (turns) will always have either a 2', 4' or 12.5' radius. The type of utility (electrical cable, coaxial cable, fiber optic cable, etc) is known and will determine the radius.
 * -There will be points marked as conduit access points which are part of the route, but are generally offset from the straight-line path between two basic route points by at least 1', and those conduit access points will be spaced no more than 800' apart, and no less than 2' apart.
 * -The routes are generally structured like other utilities: a main (trunk) route from one site to another, with branches (drops) along the way, though branching and splitting of the trunk is also somewhat common.
 * -Any intersection that does not occur at a conduit access point indicates two unconnected routes (they usually will connect, but somewhere else).
 * -Any intersection which occurs at a conduit access point indicates either a drop or a trunk branch splitting off.
 * -About 60% of the routes will be laid out on a framework defined by the roadways, with any trunk branches separating at major intersections. This isn't writ-in-stone, and we regularly see datasets where there are no roadways along the route, or only limited ones.
 * -About 80% of drops will occur outside of a roadway intersection, but within about 150' of one. These are usually the drops that provide power & data to traffic facilities like traffic lights and vehicle detection systems. About 15% will occur nowhere near an intersection, and these will be drops to buildings and other permanent facilities. The last 5% occur right at an intersection.
 * -Overall, the terminations of trunks are almost always at buildings, but the vast majority of jobs are network expansions and utility servicing, meaning the endpoints we get are usually at a conduit access point, but are sometimes just directly buried under a utility flag.
 * I've used some of these constraints to get pretty-darn-good results in laying out the job once the endpoints are known. It's really finding those endpoints that's the trouble. For a human, finding them is usually quite simple. Even in the worst cases, it only takes 2-3 guesses to figure it out. We've spent a lot of time discussing what it is we're perceiving about the overall pattern that makes it so easy for us to figure out, but neither I nor my programmer has a psych degree, so we've yet to have a eureka moment on that. 74.175.117.2 (talk) 13:37, 14 October 2021 (UTC)
 * I assumed before that you were trying to construct a polyline, which has two endpoints, but now I get the impression the result aimed at is a tree whose vertices are embedded in the plane, which can have many endpoints (external vertices). If so, could you use an algorithm for constructing a minimum spanning tree? --Lambiam 20:15, 14 October 2021 (UTC)
 * That's probably not a perfect solution, because the idea is not to make a branching tree with the minimum number/length of segments (in fact, in many cases that would be counterproductive), but it's an interesting approach with some potential that I'll be exploring, along with TSP methods, so thank you! 74.175.117.2 (talk) 20:38, 14 October 2021 (UTC)
 * Edges can have different weights, where you minimize the total weight. For example, you can take the length of an edge (the Euclidean distance between its nodes) as its weight. Or you can take the squares of these lengths, which may work better in your case. If you have some plausible way of assigning an independent likelihood of forming an edge to each pair of nodes (a number between 0 and 1), then you'll get the maximum-likelihood spanning tree by using the negated logarithms of these likelihoods as the weights, since minimizing their sum maximizes the product of the likelihoods. (In reality, of course, these likelihoods are not independent.)  --Lambiam 08:07, 15 October 2021 (UTC)
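To show how such pluggable weights fit into a standard MST algorithm, here is a minimal Prim's implementation. The `spanning_tree` name and the default squared-distance weight are illustrative assumptions; any non-negative `weight(p, q)` function, including a negated log-likelihood, can be passed in its place:

```python
import math

def spanning_tree(points, weight=lambda p, q: math.dist(p, q) ** 2):
    """Prim's algorithm for a minimum spanning tree over 2D points.
    `weight` is pluggable: Euclidean distance, its square, or
    -log(likelihood) for a maximum-likelihood spanning tree."""
    n = len(points)
    in_tree = {0}                   # start the tree at an arbitrary node
    edges = []
    while len(in_tree) < n:
        # Cheapest edge crossing the cut between tree and non-tree nodes.
        w, i, j = min((weight(points[i], points[j]), i, j)
                      for i in in_tree
                      for j in range(n) if j not in in_tree)
        in_tree.add(j)
        edges.append((i, j))
    return edges
```

This naive O(n^3) form is fine for datasets of a few thousand points; a heap-based Prim's or Kruskal with a union-find brings it down if the largest jobs become a bottleneck.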