Talk:Internet Archive/Archive 3

Moving image collection
Computer Chronicles https://archive.org/details/MainFram1984 Mainframes to Minis to Micros (2/12/1984) HiRes MPEG4 https://archive.org/download/MainFram1984/MainFram1984_edit.mp4

What a horrible encode you have made of this, I guess rest of the whole moving image archives will be the same. You need to look again at the encoder settings and redo every transcode you have done so far.

The picture quality needs to match the source video, which here is the MPEG-2 file; the MP4 transcode is 30-50% worse than the MPEG-2 video. Consider using an intermediate AVI file between the decode and the encode.

mpeg2 > lossless avi > mp4 loses less picture quality, provided the settings for the final H.264 encode are good.

The audio is very poor: scratchy, or perhaps slurred, and poor in audio quality (AQ) overall. AAC is part of the MPEG audio family and is from the same stable as AC3, so the original MP2 should not suffer from being encoded to AAC; it should remain the same. You need to increase the bitrate by a large amount to equal the source audio quality.

mp2 > wav > aac or mp2 > aac

Remember that you need to set and use higher video and audio bitrates to maintain the original MP2 audio quality. The same goes for the video, where many other obvious tweaks to the encoder settings are needed. Seek help; you obviously need it. Better to fix all the MP4 recodes now than later.

At the moment the MP4's video quality is about 40% of the original MPEG-2 video; it should be at least 90-95%. The MP4's audio quality is about 10% of the original; it should be at least 90-99%.

You need better encoder settings than you use now to help preserve the video. Preserving is what the Internet Archive is about, right? So why are you making such bad mistakes in doing so? If this obvious, easily spotted mistake got through, how many others are you making?

Suggestion: if you really want to preserve the recoded video, forget FFmpeg's x264; it is poorly maintained or hopeless, and may be the reason your video conversion is horrible. Look at x264vfw or the x264 codec to encode with.

I suggest that, for converting the original MPEG-2 to an H.264/AAC .mp4, you deinterlace by keeping the bottom field and deleting the top field, then resize the video to the original resolution of the MPEG-2 source (720x480). That gets rid of the interlacing and makes the picture cleaner and sharper. This works for these videos from the original MPEG-2 file; I don't know whether the same holds for all Computer Chronicles episodes. Use something like x264vfw with preset Slow, tuning None, profile High, level 4.1 and single-pass ratefactor-based (CRF) 18 (at minimum). Also preserve the audio quality: listen and judge whether it is the same as the source and not worse, which it now is by far. Nothing is worse than bad PQ and AQ for video and audio.
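The settings suggested above can be written down as a command line. The following is only a rough sketch: it swaps in ffmpeg's libx264 wrapper (not the x264vfw front end recommended in this comment) because its options are widely documented, it only builds the argument list without running anything, and the file names and the 256k audio bitrate are hypothetical placeholders rather than values taken from the Archive's pipeline.

```python
def build_transcode_command(src, dst, crf=18, audio_bitrate="256k"):
    """Return an ffmpeg argument list for the suggested recode.

    Nothing is executed here; the list can be passed to subprocess.run().
    The audio bitrate default is an assumed placeholder, not a measured value.
    """
    return [
        "ffmpeg",
        "-i", src,
        # Keep only the bottom field, then scale back to the 720x480 source size.
        "-vf", "field=type=bottom,scale=720:480",
        "-c:v", "libx264",
        "-preset", "slow",
        "-profile:v", "high",
        "-level", "4.1",
        # Single-pass constant rate factor, as suggested in the comment.
        "-crf", str(crf),
        "-c:a", "aac",
        "-b:a", audio_bitrate,
        dst,
    ]


# Hypothetical input and output file names.
cmd = build_transcode_command("MainFram1984.mpeg", "MainFram1984_recode.mp4")
print(" ".join(cmd))
```

Whether the result actually matches the source still has to be judged by eye and ear, as the comment says.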

Do keep encoding video to hi-res MP4, but with better video settings and bitrates for video and audio. Sure, we can download the original, but why should we need to when the recode could be as good as the original? I would if I had a better download connection; until then I won't bother with any of your video collections or videos. Better to download quality than rubbish, which is what your MP4 videos are now.

Do update your main site to say, for a good while, that the videos have been recoded and are 90% or more better than they were, so that we know you have done this.

Further to this, it would be better if you listed the episodes for each year in a way that makes the episode listings easy to understand.

Episode name - Season ## - Episode ## - Title

Examples:
 * Computer Chronicles - S02 - E01 - Computers Run Amok
 * Computer Chronicles - S02 - E02 - Computers Break Out
 * Computer Chronicles - S02 - E03 - Computers Take Over
 * Computer Chronicles - S03 - E01 - Computers Fightback Terminating Everyone
 * Computer Chronicles - S03 - E02 - First Terminator Voices 'I Will Be Back'

Here is a list of all Computer Chronicles episodes, though I am unsure how accurate it is or what resources are there: http://stquantum.xtreemhost.com/cc/content/episodelisting.htm

Listing each episode like this, including in the episode name, makes it easier for people to navigate to what they want. As new videos are added, it is also easy for people to spot whether they would like to see an episode or have already seen it. This naming scheme works with any program and episode; do this for the file names too and it makes life as simple as ABC for everyone. — Preceding unsigned comment added by 78.150.253.167 (talk) 19:25, 22 March 2014 (UTC)

92.26.180.120 (talk) 04:25, 22 March 2014 (UTC)
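For anyone who wants to try the naming scheme proposed in this thread, here is a minimal sketch of a formatter. The function name `episode_label` is made up for illustration, and the zero-padded two-digit season and episode numbers are taken from the examples in the comment.

```python
def episode_label(series, season, episode, title):
    """Format a listing entry as 'Series - S## - E## - Title'."""
    return f"{series} - S{season:02d} - E{episode:02d} - {title}"


label = episode_label("Computer Chronicles", 2, 1, "Computers Run Amok")
print(label)  # Computer Chronicles - S02 - E01 - Computers Run Amok
```

Because the numbers are zero-padded, a plain alphabetical sort of such labels also puts the episodes of one series in broadcast order.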

Banned in Russia
Since 24 Oct 2014 Internet Archive (web.archive.org) is banned by Russian authorities. It should be added in the article. — Preceding unsigned comment added by 5.167.173.119 (talk) 09:56, 24 October 2014 (UTC)

Non-controversial sub-section removed from Controversies section
Removed sub-section:

Removal of Citizenfour Documentary

The Internet Archive removed the listing of a documentary about Edward Snowden, called CitizenFour "due to issues with the item's content."

Reason:

1) Reference provided doesn't support the assertion that the removal is or was controversial. 2) The removal was not and is not controversial, except in one editor's head. 3) The Archive also doesn't host Hollywood movies or other copyrighted non-public content. What's your point here?  4)  Anyone can have their content removed from the publicly facing archive by simply throwing up a robots.txt, at any time, without warning or notification to anyone at all. That's how it works. 5) In this case, we really don't know whether the item removed was the documentary, or some other content keyed to "LauraPoitrasCitizenfour" (the trailers are still up at IA).  It doesn't matter, IA's TOU are posted, and state when they remove stuff.

I think that trying to create a controversy where none exists takes the encyclopedia substantially backwards, not forwards. --Aladdin Sane (talk) 02:08, 26 March 2015 (UTC)
 * Thanks for removing the content. It should have been removed because it is self-published. Wikipedia does not allow articles on organizations to cite publications by that organization, and in this case, the source cited was only something self published in a public place on that website. If third-party journalists write about something then it can go here, otherwise it stays out. Blue Rasberry (talk) 15:07, 26 March 2015 (UTC)

stub childs
If anyone wants to lend a hand, I've redirected the following stubs here: RECAP US Federal Court Documents (collection), Microfilm (collection), Universal access to all knowledge, and NASA Images; there's more at Internet Archive's Children's Library, American Libraries (collection), Canadian Libraries (collection), and US Government Documents. Thanks. fgnievinski (talk) 06:47, 15 September 2015 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 11 external links on Internet Archive. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Corrected formatting/usage for https://archive.org/about/index.html
 * Corrected formatting/usage for http://www.archive.org/collections/index.html
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/index.php
 * Corrected formatting/usage for https://archive.org/

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.—cyberbot II  Talk to my owner :Online 20:03, 31 March 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Internet Archive. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Corrected formatting/usage for http://www.archive.org/sciam_article.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.—cyberbot II  Talk to my owner :Online 15:15, 1 April 2016 (UTC)

Internet Archive & Wayback Machine servers are s-l-o-w
I have 100/100 Mbps fiber optic service. Internet Archive and Wayback Machine are some of the most frustratingly slow connections of all web connections I make on a daily basis. Sometimes I also get a message from WM that a webpage is not available -- and I'm looking at it with another browser. It's as if IA servers (particularly on weekends) are operating on dial-up time. 100.32.106.189 (talk) 13:28, 30 January 2016 (UTC)


 * The Archive operates on a shoestring budget, with a chronic deficit of manpower, and its charter prioritizes the preservation of information foremost, not so much making that information convenient to access (though it's slowly getting better at that). Its data clusters are designed for highly economic storage and the ability to retain data despite hardware failures.  Expect access to remain slow.  Performance is just not a priority for its extremely limited funds and engineer-hours. TTK (talk) 21:51, 15 April 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Internet Archive. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Corrected formatting/usage for http://nasaimages.org/

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.—cyberbot II  Talk to my owner :Online 08:35, 2 July 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 4 external links on Internet Archive. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Corrected formatting/usage for http://netpreserve.org/about/memberList.php
 * Corrected formatting/usage for http://www.mtv.com/
 * Corrected formatting/usage for http://chronicle.com/wiredcampus/index.php?id=2235%3F%3Datwc
 * Added archive https://web.archive.org/web/20121111124412/http://www.nasaimages.org/ to http://nasaimages.org/

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 03:27, 12 April 2017 (UTC)

Please Update this article with these new figures
Hi, I'm the Director of Partnerships at IA. I noticed there are a lot of old facts and figures in this article. Here's a source with up-to-date information: https://archive.org/about/

For instance in 2017 we now have 30 petabytes of data.

Some good secondary sources (that were requested) include: Medium: "Never Trust a Corporation to do a Library's Job": https://medium.com/message/never-trust-a-corporation-to-do-a-librarys-job-f58db4673351

The New Yorker--Jill Lepore's "The Cobweb: Can the Web be Archived?" http://www.newyorker.com/magazine/2015/01/26/cobweb

Thanks for helping to make this more accurate.

best, Wendy Hanamura — Preceding unsigned comment added by Whanamura (talk • contribs) 21:39, 15 April 2017 (UTC)

Robots.txt to be ignored
It's unclear at this time exactly when this will apply to sites other than government ones, but archive.org have announced in their blog that they are "looking to do this more broadly" https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/

It may be worth mentioning this where the article currently gives a false sense of privacy in saying that robots.txt is obeyed. 51.6.114.17 (talk) 19:09, 24 April 2017 (UTC)

 * Just read this, I'm not sure. Neilc314 (talk) 05:34, 23 May 2018 (UTC)
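For readers unsure what "robots.txt is obeyed" means in practice, here is a small sketch using Python's standard `urllib.robotparser`. It shows how a crawler that honours robots.txt would decide whether it may fetch a page. The user-agent string `ia_archiver` and the example.com URL are assumptions made for illustration; the blog post linked above is the authority on what the Archive's crawlers actually do.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that excludes one crawler from the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that obeys the file checks before fetching each URL.
blocked = parser.can_fetch("ia_archiver", "http://example.com/page.html")
other = parser.can_fetch("SomeOtherBot", "http://example.com/page.html")
print(blocked)  # False: this user agent is disallowed everywhere
print(other)    # True: no rule in the file applies to this user agent
```

A crawler that ignores robots.txt simply skips this check, which is why the file offers no privacy guarantee by itself.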

Gifcities
Is Gifcities notable enough to make a section about it? --Nutshinou (talk) 12:13, 3 September 2018 (UTC)
 * As much as I love it, probably not. Isn't it enough to describe it as part of the Geocities archival (which is perhaps most relevant for ArchiveTeam in a way)? --Nemo 18:38, 3 September 2018 (UTC)

Welp, that's it.
2 days of it not being up and running on PC, it's safe to say that the Internet Archive is done with. We've officially lost the world's largest internet archive site. F. CappyKid64 (talk) 17:20, 2 January 2020 (UTC)
 * I've been using it all day. The Archive is up. lethargilistic (talk) 17:24, 2 January 2020 (UTC)
 * I've just learned that Microsoft Edge is the problem here. It works fine on Chrome. CappyKid64 (talk) 17:26, 2 January 2020 (UTC)
 * https://www.getgnulinux.org/ HTH, Nemo 21:11, 2 January 2020 (UTC)

Links to in-copyright books hosted on archive.org
I'm starting to run across bibliographies on Wikipedia that contain links to archive.org to download a copy of a book. Nearly always, the books are new enough that they are still covered under copyright. Should these seemingly WP:COPYVIOEL links be allowed on Wikipedia?

For example, seven of the books on Seth Godin include links where a copy can be downloaded from archive.org. All of this author's books are copyright.

If there is an exception to Wikipedia's copyright policy that allows for links to in-copyright books to be hosted on archive.org then I believe there should be a Wikipedia help page containing the supporting rationale for this exception and that article would be linked to from articles such as Internet Archive, WP:C, WP:CP, etc. --Marc Kupper|talk 22:01, 16 January 2020 (UTC)

Also see Open Library. --Marc Kupper|talk 22:01, 16 January 2020 (UTC)
 * The Wikipedia Copyright policy on linking out to copyrighted works does mention generally that the status of Internet archives is "unclear." The position of the Internet Archive is that posting the books falls under fair use, and they comply with all takedown requests as required by law. Additionally, the Internet Archive plans to scan every book used for reference by Wikipedia articles. It doesn't serve Wikipedia's interests to pre-emptively consider it copyright infringement when it has not been hashed out by a court anywhere. Any change of the current practice would be cutting off Wikipedia's nose to spite its face, tbh. lethargilistic (talk) 22:19, 16 January 2020 (UTC)

National Emergency Library
On March 25, 2020, the Internet Archive launched the National Emergency Library, which is defined as "a collection of books that supports emergency remote teaching, research activities, independent scholarship, and intellectual stimulation while universities, schools, training centers, and libraries are closed."

This information is also sourced by insider.com as a secondary source.

I think it is related to some previous topics opened on the current talk page. It is the first time for such an initiative in the history of the Internet Archive. Its duration is also relevant, since it will be operative at least until 30 June, if the emergency state law is not deferred for a second time.Micheledisaveriosp (talk) 12:29, 26 March 2020 (UTC)
 * Yes, I think we can wait for a few days until the sources settle but there are sufficient sources already, for instance NPR, Vice and various others; also in other languages/countries: it, fr, ph, pl, de, my. Nemo 08:28, 29 March 2020 (UTC)
 * There was a post in the Internet Archive's blog with many of those sources. Publishers' associations are accusing the National Emergency Library of piracy (sourced here), and this prevents the extension of similar initiatives to other websites of public interest such as DOAJ, which has the largest collection of high-quality open access scientific papers in the world. While libraries are closed because of the coronavirus, and are the only subscribers allowed paid access to the whole database, researchers and physicians, especially in Third World countries, are deprived of such an important source for their studies and experimental therapies. A full open access approach can meet their limited economic resources. But this would be another chapter of the saga. I think we can integrate a concern about the National Emergency Library into the WP article once the first probable legal claim is definitively resolved. Best regards.Micheledisaveriosp (talk) 08:50, 31 March 2020 (UTC)
 * This talk page is not a forum to express your personal opinion on the topic. I'm not sure what was your point but you seem to be saying that the information should not be added until there is a lawsuit, did I get it right? There is no need to report on the article about every adjective everyone has used about this or another initiative of the Internet Archive (Smith said "gorgeous", Doe said "criminal", blablah), the article needs to stick to reliable sources and avoid fringe views. Nemo 12:47, 31 March 2020 (UTC)
 * DOAJ may be affected by the same issue in the medium term, but we don't have a crystal ball for the future. You got it right. As you said, there exist a lot of reliable sources for the WP article. You are a more expert user of WP and if you agree that we can add a concern now, then I think we can proceed.Micheledisaveriosp (talk) 20:21, 31 March 2020 (UTC)
 * If there isn't significant reporting of said concerns, there's no need to mention them. The NPR article doesn't count because they were bullied into "balancing" their previous article. Nemo 20:24, 31 March 2020 (UTC)
 * It's not a good faith edit. The anonymous edit was a mine test. The discussion page of the article remained at least a day without the last edit visible, even though nothing in the related history showed that it had been put under approval or rejected. It was vandalism, but I have experienced that it can also happen to edits that create new portions of sourced text and are made by autoconfirmed users. Wikipedia has no censorship, but the practice is far different. In the Italian Wikiquote anyone can delete the discussions you have created, even ones raising doubts about the reliability of a new source or about whether a single quotation should be integrated into the article. Discussions are participated in only by the website administrators. But this is not the matter of the current topic. I think it can hopefully be deleted, since it produced neither an improvement of the WP article nor a well-attended discussion. WP is not a forum between me and a couple of other editors. Have a good journey on Wikipedia.Best regards.Micheledisaveriosp (talk) 00:07, 14 April 2020 (UTC)

Edit Request - National Emergency Library
I noticed two fact errors and a possible neutrality issue in the National Emergency Library section. Please excuse any formatting issues in the below requests.


 * Information to be added or removed: Edit last sentence to read as follows and move it up in the text to not imply it was in response to critics: “Internet Archive allows authors and rights holders to submit opt-out requests for their works to be omitted from the National Emergency Library.” Change cites to Internet Archive or other that has this information correct.
 * Explanation of issue: The statement that the opt-out was in response to criticism is factually inaccurate. The opt-out was permitted from the start of the National Emergency Library. The Wired article cited was mistaken on this point.
 * References supporting change: https://blog.archive.org/2020/03/30/internet-archive-responds-why-we-released-the-national-emergency-library/; https://help.archive.org/hc/en-us/articles/360042654251

Surf314 (talk) 20:05, 23 April 2020 (UTC)
 * Just want to acknowledge that citing to the Archive's sources may be frowned upon, but the Wired article was indeed very factually inaccurate on this point. It wasn't just the misattributed reason for the opt-out; the same section went on to say that "If the Archive can’t, by default, treat its scan of your book as its own copy to loan, its collection will dwindle to almost nothing," which is, frankly, nonsense because the Archive serves public domain books independently of this. It's not a quality source on the matter. I would suggest either correcting the timeline by putting the opt-out mention with the initial rollout (and perhaps mentioning confusion about the issue) or at least changing the concluding sentence to say "the Archive provided an opt-out system for authors to use when they released the NEL." It's probably also relevant that the Internet Archive never required DMCA requests. They set up an email account. lethargilistic (talk) 21:40, 23 April 2020 (UTC)

Surf314 (talk) 20:05, 23 April 2020 (UTC)
 * Information to be added or removed: Edit the following sentence - “normally, the site would only allow one digital lending for each physical copy of the book they had, by use of an encrypted file that would become unusable after the lending period was completed.”
 * Explanation of issue: This sentence seems to imply that there are fewer protections on the lent books than there are. Internet Archive uses the same technical protections as publishers to protect the books and restrict lending time to two-week intervals. This two-week time limit also applies to books in the National Emergency Library. There is some concern that this is factually inaccurate by implication.
 * References supporting change: http://blog.archive.org/2020/03/30/internet-archive-responds-why-we-released-the-national-emergency-library/


 * Information to be added or removed: Edit or replace the following sentence “The Archive justified the move as "our unprecedented global and immediate need for access to reading and research material", and that because libraries predated copyright systems, they serve a key public function under critical times.”
 * Explanation of issue: This is not the strongest argument in favor of the legality of the National Emergency Library or controlled digital lending. I believe this affects the neutrality of the piece because of the larger amount of references to criticism following this sentence. I would recommend using the sources compiled by Jill Hurst-Wahl, not affiliated with Internet Archive, in her class lecture notes linked below. I've also linked an expert discussion in a podcast for more information.
 * References supporting change: http://hurstassociates.blogspot.com/2020/04/the-national-emergency-library.html; http://www.byuradio.org/episode/e22ae961-f97e-40c7-babb-b3beb10eaa9b/top-of-mind-with-julie-rose-privacy-and-pandemics-national-emergency-library-tooth-rings?playhead=1192&autoplay=true

Surf314 (talk) 20:05, 23 April 2020 (UTC)

Reply 02-MAY-2020

 * The proposed text to be added to the article is missing. To expedite your request, it would help if you could provide the following items of information:
 * 1) Please state each specific desired change and accompanying reference in the form of verbatim statements which can then be added to the article (if approved) by the reviewer.
 * 2) The exact location where the desired claims are to be placed should be given.
 * 3) Exact, verbatim descriptions of any text and/or references to be removed should also be given.
 * 4) Reasons should be provided for each change.
 * In the section of text below titled Sample edit request, the four required items are shown as an example:

 1. Please remove the third sentence from the second paragraph of the Sun section:
 * "The Sun's diameter is estimated to be approximately 25 miles in length."

2. Please add the following claim as the third sentence of the second paragraph of the Sun section:
 * "The Sun's diameter is estimated to be approximately 864,337 miles in length."

3. Using as the reference:

4. Reason for change being made:
 * "The previously given diameter was incorrect."

Regards, Spintendo  16:56, 2 May 2020 (UTC)
 * Kindly open a new edit request at your earliest convenience when ready to proceed with all four items from your request. Thank you!

University presses copying from the Internet Archive?
I recently discovered that an e-book being sold by Cornell University Press, namely Induction and Hypothesis: A Study of the Logic of Confirmation by Stephen F. Barker (1957), was copied from one of the Internet Archive's (IA) digitized copies, namely. The ebook that is being sold by Cornell University Press has the IA watermark and URL in it, and is exactly the same as the IA copy in all other respects, so it's obvious that it was copied from the IA. I found this to be quite curious and ironic: While some publishers are suing the IA for allegedly violating copyrights, at least one publisher copied one of the IA's PDFs and is selling it! This suggests to me that the relationship between university presses and the Internet Archive should be mentioned in this Wikipedia article.

There are a few posts on the IA blog that mention the relationship between it and university presses, including Cornell University Press, but unfortunately I haven't been able to find a source that mentions that IA-digitized books are being sold as e-books by the original publishers:



That last blog post above even mentions Wikipedia as a justification for the IA's digitization of university press books: "University press books are evergreen, well-cited in Wikipedia, and are the foundations of much scholarship." Biogeographist (talk) 15:35, 27 June 2020 (UTC)


 * For all purposes, that claim is original research - we cannot call out what may seem to be illegal or questionable activities like this. --Masem (t) 15:53, 27 June 2020 (UTC)


 * I'm not saying it's illegal or questionable that Cornell University Press is copying their own book from the IA and selling it as an e-book; the blog posts cited above indicate that there is an explicit agreement between university presses and the IA (but the details are not fully explained in the blog posts). It's that relationship that I'm thinking about adding to the article. Here's a secondary source that mentions the general relationship between some university presses and the IA, for example, from the reputable Publishers Weekly: Biogeographist (talk) 16:18, 27 June 2020 (UTC)
 * Oh, that part is completely reasonable, yes. That's fair to add, reuse of the IA by others. --Masem (t) 16:23, 27 June 2020 (UTC)

I have added information to the article about the IA and university presses. If anyone finds a reliable secondary source that has further information about what is going on with Cornell University Press independently selling/distributing an IA-digitized book, please mention it here, as I would like to point out in the article that two-way relationship with university presses (i.e., that university presses are benefiting from IA's digitization efforts independently of IA's book lending program). Biogeographist (talk) 17:40, 27 June 2020 (UTC)
 * I think the Internet Archive has written in various places that they have programs where the university presses provide materials (in physical form) and copyright licenses (if necessary) in return for the ability to use the scans. Nemo 22:53, 27 June 2020 (UTC)

Unverifiable list of digitizing sponsors for books
The table of book-digitization sponsors that formerly appeared in the article has been pasted below because it is based on an apparently unverifiable source. If the table is created again, it should be created using a verifiable source and more up-to-date numbers. The text below is from the article. Biogeographist (talk) 18:21, 4 December 2020 (UTC)

As of December 2018, over 50 sponsors helped the Internet Archive provide over 5 million scanned books (text items). Of these, over 2 million were scanned by Internet Archive itself, funded either by itself or by MSN, the University of Toronto or the Internet Archive's founder's Kahle/Austin Foundation.

The collections for scanning centers often include also digitisations sponsored by their partners, for instance the University of Toronto performed scans supported by other Canadian libraries.

archive.org redirect
There's a redirect-confused that says "archive.org redirects here". But currently archive.org redirects to Wayback Machine. But again, it should redirect here because the Internet Archive is much more than just the websites archive. Wikipedians figure it out.--95.208.211.114 (talk) 12:49, 29 April 2021 (UTC)
 * A drive-by IP edit changed the redirect about a week ago. I've restored it. Mindmatrix 15:25, 29 April 2021 (UTC)