Hi everyone! I’ve been working on a major upgrade to our archive of the Microsoft Download Center.
We initially set up our Download Center archive in 2023 to provide an index of data scraped in August 2020 by ArchiveTeam, a community effort that preserves information at risk of being deleted. At the time, Microsoft had decided to delete thousands of downloads on the flimsy basis that they were signed using the insecure SHA-1 algorithm, providing less than a week of notice through an obscure forum post. Thanks to the Wayback Machine and ArchiveTeam’s efforts, we were able to provide a comprehensive index of all downloads that existed in the days before the purge. This notably included downloads for Windows XP, Vista, and 7, and Office 2003, 2007, and 2010, among others.
What I didn’t fully realise at that point is that Microsoft quietly deletes downloads all the time. So I pulled down many thousands more downloads from the Wayback Machine (and then took almost 2 years to do anything with them). The US Library of Congress also had some archived copies. This data spans 2012 to today, giving me an incredible view of what turned out to be 41,011 downloads, up from 28,310 in the August 2020 archive. It became very clear that Microsoft is trigger-happy about removing downloads as soon as it deems them out of support.
On one hand, this is understandable: Microsoft no longer offers security updates or customer support for these downloads, and doesn’t want to make any guarantees about the safety of using this software today. On the other hand, I’m confident that users can make their own judgement about the risk of downloading old software, and I’d much rather they get these downloads from a trusted source such as the Wayback Machine. This is why we run our archive: these downloads didn’t stop being useful just because they’re 10, 15, 20, or 25 years old.
The new archive is a huge expansion, adding many previously popular downloads for Windows 98, Me, and NT 4.0, and Office 97 and 2000, among others. We’ve also added a dedicated page for the Windows XP PowerToys and Fun Packs, which includes Tweak UI, TaskSwitch (a more XP-styled Alt-Tab replacement), and the 3D Windows XP screen saver.
Bringing all of this data into a consistent format for our database was no small task. The main script I wrote for this ended up at 1,600 lines and is an absolute mess, as HTML scraping inevitably is.
There are still a few unsolved problems, but overall I’m very happy with where everything is right now:
- In July 2024, Microsoft carried out a sloppy data migration in its backend system, which reset the published date of every download to 15 July 2024. We’ve corrected these dates in most cases, but plenty of downloads still need fixing.
- A number of archived downloads from the mid-2010s are missing their list of files.
- Some downloads lead to an archived 404 page. I suspect the timestamps in the file metadata we pulled from the Wayback Machine may have drifted since I started this archiving effort.
- There are also valuable downloads from even before 2012, and I’m planning to add those. They haven’t been forgotten!
- I know the archive is still English-only, sorry about that. Because the Internet Archive is based in the US, non-English pages are poorly archived. If you know of any archive websites similar to the Wayback Machine for your native language, please get in touch.
If you find any issues other than these, let us know and we’ll work on them. Also let me know if you find any gems in the archive that should be added to the “Commonly Downloaded” list!
By the way, you’ll find that Bing and DuckDuckGo (which uses Bing data) have already indexed most of the new Download Center archive. You might even be able to search our archive from your Start menu search box. Google has never particularly liked me, so the downloads may still be hard to find there. (ChatGPT and Claude are also pretty good at finding these downloads; we frequently see users click through from them in our logs.)