Wikipedia:Reference desk/Archives/Computing/2020 February 24

Computing desk


February 24

Download all images on webpage

I'm using a long, long webpage containing around 300 images, each one on a new line. They're the pages of a book, so the order is important. Is there a tool I can use to download them all at once but keep them in order (e.g. by naming them 'image1', 'image2', etc.) so I can then form them into a PDF that reads like the book? Most of the 'download all images' add-ons I've found just give arbitrary names, which isn't helpful. Amisom (talk) 07:40, 24 February 2020 (UTC)

If it's someone else's site, this is called web scraping and there's no one-size-fits-all method or tool. You often end up developing a custom script, though that is pretty quick once you've done a few. It is harder for some sites than others. Can you say what site it is? 2602:24A:DE47:B270:A096:24F4:F986:C62A (talk) 09:09, 24 February 2020 (UTC)
It seems odd that a scraper would rename the files, though. It's very likely that the pages are already named in the correct order, even if the names themselves might seem a little weird, so it seems the easiest solution would be to find the option that's causing the renaming and uncheck it. Matt Deres (talk) 14:33, 24 February 2020 (UTC)
As the IP said, it's certainly doable: if the web browser can display them in order, then they can be retrieved in order. It might be worth taking a look at the website's source code or just at the URLs of some of the images. Often one can then guess the pattern and just use wget or cURL to grab everything. Of course, with modern web pages importing about a billion frameworks to display "Hellp World", it may be more difficult... --Stephan Schulz (talk) 14:53, 24 February 2020 (UTC)
"Hellp World" indeed! (:  --Lambiam 16:57, 24 February 2020 (UTC)
Is there such a thing as a Freudian typo? ;-). --Stephan Schulz (talk) 17:20, 24 February 2020 (UTC)
  • This is a programming task, and not a difficult one (It's the sort of thing I'd teach as an example in a 'Learning Python' course for older schoolkids). There are three broad approaches:
  1. Find a gadget that already does it, as a command-line utility, such as wget.
  2. Find a browser extension that does it. Often more convenient. There are many of these and they're hard to keep up with. "DownThemAll" [1] is one I've used for Firefox.
  3. Write some code. In a modern language (like Python) there are many pre-existing modules to do the hard work of this, even things like Beautiful Soup, which will handle awful markup. There are also many examples of how to do this, pre-written for almost exactly what you need. This will be the most flexible way, especially for a repeated task, and may not require as much effort as you think. Andy Dingley (talk) 17:29, 24 February 2020 (UTC)
When you say they give "arbitrary names", I suspect those are the actual file names the Web page uses, meaning the downloaders aren't renaming them. Many Web sites, especially ones that are dynamically generated, can use things like Base16 encoding for filenames; if the file names are some alphanumeric gibberish, that's why. To find out, try right-clicking on one of the images in your browser and looking for something like "View Image Info" (that's what it's called in Firefox). Or, you can look at the HTML for the page with the "View Source" option. If this is the case, then you actually want the files renamed. Some download managers can do things like name each file in sequential order ("1.png, 2.png, 3.png"). You have to look at the documentation for whatever tool you use to figure out how. Also, not important, but making a PDF will just give a PDF with the image files embedded. This seems kind of pointless to me, although I suppose it could be useful in some cases, like if you have reader software you like using and it only supports formats like PDF. --47.146.63.87 (talk) 20:29, 24 February 2020 (UTC)
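
For anyone taking the "write some code" route, here is a minimal sketch of the idea discussed above: read the page, collect the images in the order they appear, and save them under sequential names. It uses only Python's standard library rather than Beautiful Soup, and it assumes the images are ordinary <img src="..."> tags present in the page's HTML (a page that loads images through JavaScript would need a different approach). The names ImageCollector, image_urls, numbered_name and download_all, and the book_pages output folder, are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(html, base_url):
    """Return absolute image URLs in the order they appear in the HTML."""
    collector = ImageCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


def numbered_name(index, url, width=3):
    """Build a zero-padded name such as image007.png from a position and URL."""
    # Assumption: fall back to .jpg when the URL path has no file extension.
    suffix = Path(urlparse(url).path).suffix or ".jpg"
    return f"image{index:0{width}d}{suffix}"


def download_all(page_url, out_dir="book_pages"):
    """Fetch the page, then save each image under its sequential name."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    Path(out_dir).mkdir(exist_ok=True)
    for index, url in enumerate(image_urls(html, page_url), start=1):
        urlretrieve(url, Path(out_dir) / numbered_name(index, url))
```

Calling download_all with the page's address would fill the folder with image001, image002 and so on. The zero padding matters: without it, 'image10' sorts before 'image2' in most file listings, which would scramble the page order when the files are later combined into a PDF.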

Registry entry

I consider Kasperskiy a Russian spy malware. At one point I helped a friend of mine to get rid of it by expunging the registry, but he installed it previously since it was offered for free. Spyware is always free as we know. Today, much time later I decided to check if my registry might have a trace of it. My OS is Win 10 Pro. This is what I found:

HKEY_CURRENT_USER\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains\Kasperskiy-antivirus
HKEY_CURRENT_USER\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\EscDomains\Kasperskiy-antivirus

Is there any harm in having it there? That Domains section has hundreds of domain names, including variations of my favorite SpyBot Search and Destroy, like spybotcom.com, etc.

Thanks, - AboutFace 22 (talk) 18:29, 24 February 2020 (UTC)

For the allegations against Kaspersky Lab, see Kaspersky bans and allegations of Russian government ties. Unless more specific information is supplied, I am far from convinced by the spyware allegations. The registry entry looks suspect to me insofar as it uses the spelling "Kasperskiy" while the company itself uses the spelling "Kaspersky".  --Lambiam 19:50, 24 February 2020 (UTC)
"including variations of my favorite SpyBot Search and Destroy, like spybotcom.com…" Looks like you have or had malware on your computer. Those entries are Internet Explorer/Microsoft Edge's list of "Trusted Sites", which are allowed to bypass the browser's security controls. Those domains are "fake" domain names meant to look like legitimate ones (hence also the "Kasperskiy" misspelling). Malware would have put them in that list as part of hijacking the browser. If you're confident they're from a past infection that was removed, just delete the entries. It's always good practice to make a backup before editing the Registry; you can do this from within Registry Editor. --47.146.63.87 (talk) 22:52, 24 February 2020 (UTC)

Then why remove only Kaspersky? Look at this small part of my registry:

Registry Entries

What shall I do? There are hundreds of them. AboutFace 22 (talk) 23:11, 24 February 2020 (UTC)

Yes, I also have the kasperskiy-antivir.com entry. It's not Kaspersky's real domain; it's a fake download page to trick people who don't know which spelling is used by the company. The registry key containing all these malware domains is a list of stuff that goes in IE's trusted and untrusted zones. These are standard untrusted zone entries and contain a 0x04 value, meaning that the domain in the subkey belongs to the untrusted zone. To put it in clearer terms, it's something between an ad blocker and a hosts file for IE's internal use, and it's been in Windows since at least XP. Removing these entries might leave you open to malware, but if you don't use IE then it doesn't really matter if you remove them or not. Lastly, a disclaimer: if you consider Kaspersky Russian government spyware, it shouldn't come to you as a huge surprise that many consider Windows 10 commercial spyware and adware (which I think is, in terms of dollars at least, a lot worse than getting spied on by a country you'll probably never even visit). 93.136.117.148 (talk) 00:01, 25 February 2020 (UTC)
Ugh, okay, I jumped to a conclusion. I assumed IE/Edge did something similar to other browsers for malware protection, downloading an internal list. Thanks for correcting. --47.146.63.87 (talk) 06:27, 25 February 2020 (UTC)
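
To check which zone each of those hundreds of entries belongs to without clicking through them one by one, something like the following sketch could help. It rests on two assumptions: that the list lives at the usual location under HKEY_CURRENT_USER\Software\Microsoft\... (the paths quoted in the question omit "Software"), and that each domain subkey holds a numeric value where 2 means Trusted Sites and 4 means Restricted Sites, as described above. Only top-level domain subkeys are read, not nested subdomain keys. The function names are made up for illustration, and read_domain_zones runs only on Windows.

```python
from collections import Counter

# Standard Internet Explorer security zone numbers.
ZONE_NAMES = {
    0: "My Computer",
    1: "Local Intranet",
    2: "Trusted Sites",
    3: "Internet",
    4: "Restricted Sites",
}

# Assumed location of the per-user domain list (note the leading "Software").
ZONEMAP_DOMAINS = (
    r"Software\Microsoft\Windows\CurrentVersion"
    r"\Internet Settings\ZoneMap\Domains"
)


def zone_name(value):
    """Translate a zone number such as 4 into a readable zone name."""
    return ZONE_NAMES.get(value, f"Unknown zone {value}")


def read_domain_zones(path=ZONEMAP_DOMAINS):
    """Return {domain: zone number} for each top-level domain subkey."""
    import winreg  # Windows-only module, so it is imported here on demand

    zones = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, path) as domains:
        subkey_count = winreg.QueryInfoKey(domains)[0]
        for i in range(subkey_count):
            domain = winreg.EnumKey(domains, i)
            with winreg.OpenKey(domains, domain) as key:
                value_count = winreg.QueryInfoKey(key)[1]
                for j in range(value_count):
                    _protocol, data, _kind = winreg.EnumValue(key, j)
                    if isinstance(data, int):
                        # If several protocols are listed, the last one wins.
                        zones[domain] = data
    return zones


def summarize(zones):
    """Count how many domains fall into each zone."""
    return Counter(zone_name(number) for number in zones.values())
```

On a Windows machine, summarize(read_domain_zones()) would give a count per zone. If everything lands in Restricted Sites, that fits the explanation above that these are block-list entries rather than signs of a hijacked browser.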

Registering and maintaining a domain name costs money, sometimes a lot of money. Who is paying for all those untrusted domains? AboutFace 22 (talk) 16:15, 25 February 2020 (UTC)

Not for a .com. Anyone doing this sort of stuff is probably paying less than $1 a year. So depending on what they're being used for, one successful scam victim can probably pay for tens to thousands of domains or maybe even more. Nil Einne (talk) 07:35, 26 February 2020 (UTC)