Preserve old public domain newspapers and books from archive websites

I know very well how fragile the web is. The internet is «just» computers and cables, and I think most of us have lost personal data at some point. Yesterday everything was working: I could watch that YouTube video, my media existed on our company website, and the library had that document online. Today, none of it works. Viktor Frankl told us to try to find meaning in loss and struggle. I think the meaning here is: save what we can, now. We live in a burning library, losing data every day, losing our culture and history. It's painful. It also hurts humanity: there is less data left to analyze.

Many wonderful people understand this and feel the same way. I want to say thank you to everyone who has contributed to Wikipedia, scanned papers in a library, or uploaded to Wikimedia Commons and archive.org; to ArchiveTeam; to game preservationists in their many forms (lutris.net is one of them) and even legal torrents; to open-source software developers; and to media creators who use licenses compatible with Wikimedia Commons. This work matters even more nowadays, because wars, current and future, will destroy archives too. Thank you: your work and your volunteering have meaning.

In my offline lectures, in conversations, and in some articles and posts, I try to draw attention to an ongoing problem: the gradual, daily disappearance of media. Newspapers, books, games («the source code is lost» is a phrase I have heard many times from developers), music, podcasts, photos, articles, films. Yesterday I uploaded scans from moneymuseum.by to Commons, and today that website is reachable only from Belarus. Will we be able to open the websites of Russian, Belarusian, Ukrainian, or Iranian museums, archives, and galleries tomorrow? Latvian ones? Greenland's? I feel that this is my mission: to preserve data and help others do the same.

From https://commons.wikimedia.org/wiki/File:Day_1_CEE_Meeting_2025_48.jpg, CC BY-SA 4.0

Once, I attended a book presentation. The speaker said that the original text had been lost, so this version was a translation of a translation. The book was from the 20th century. I’m sure you know a few examples like that as well.

I know a few media companies that had many years of content on their websites, and now it is lost. The Wayback Machine does not preserve all audio or video (although it still saves many thousands of YouTube videos per day). It has no full-text search across every website, only for some collections. And it does not preserve all text either: there is at least one article, important to me, that I could not find even though I had its URL.
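
A quick way to see whether a page has any capture at all is the Wayback Machine availability endpoint. The sketch below is only an illustration, not one of the tools from this post: the function names are hypothetical, and it assumes the endpoint `https://archive.org/wayback/available` answers with JSON that contains an `archived_snapshots.closest` object.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the query URL that asks whether page_url has a capture."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the URL of the closest capture, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check(page_url: str) -> Optional[str]:
    """Hypothetical helper: query the endpoint over the network."""
    with urllib.request.urlopen(availability_url(page_url), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

If `check(...)` returns `None` for a page you care about, that is a hint to save it now, while the original still opens.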

Back in 2011 I already felt that we need to preserve culture. I was part of a theater, and I knew that we needed to film the performances with multiple cameras. It was the right call: that video, plus some photos, is now the only thing that remains of those performances. Theater actors die too, sometimes before 40.

At the same time, we already have so much data: you cannot watch all anime, listen to all podcasts and music, or read all the books published even in your small European country. About 19,000 games were published on Steam in 2024, which is roughly 52 games per day. Still, the data is precious: any media created by people is an imprint of their souls and of the current Zeitgeist. Other people and AI will read and play all of it hundreds of years from now and gain insights about us. Or maybe it will not be other people but us: perhaps we will find a way to stop aging and improve memory and cognitive capacity, and then we will have the time and the mental capacity to read and understand a lot.

The stand-alone expansion Homeworld: Cataclysm was not announced for a remake, despite the outspoken interest of Gearbox, as they were unable to find the original source code

From https://en.wikipedia.org/wiki/Homeworld#Remaster

Rosado claimed «the art challenge alone felt insurmountable early on» due to the scale of remastering three games simultaneously, particularly the immensity of San Andreas. The team avoided making the games look too realistic as they felt the characters—whose motion capture data was coded to the original caricaturised wire-frame models—would look out of place. They wanted the characters to maintain their original appearance, noting the game «must look like you remember it». They faced difficulty when working on characters, as they felt adding detail «where there was no detail before» might conflict with the player’s «mental image» of the character designs; the team consulted developers at Rockstar North, including some of the games’ original artists. Much of the material from the original games—such as the source audio, textures, reference material, and character models—were unable to be found due to a lack of archives, as the original development team «never thought [they would] have to revisit these projects».

From https://en.wikipedia.org/wiki/Grand_Theft_Auto:The_Trilogy–_The_Definitive_Edition#Development

On January 29, 2015, about 15 years after the original release of Heroes of Might & Magic III, Ubisoft released a new high-definition version of the game compatible with PCs as well as Android and iOS tablets. The expansion packs were not included because the source code for those releases was lost

From https://en.wikipedia.org/wiki/Heroes_of_Might_and_Magic_III#HD_Edition_2

Sometimes I think about the movie Equilibrium. I was 12 when it was released. One scene at the beginning was somehow important to me: after the fights, the main character finds the target, a cache of art. So right from the start, the movie shows us that it is about art and history and how important they are. It is a movie about the future. The main character looks at the Mona Lisa, another man checks it with some futuristic device and confirms that it is the original, and after a few seconds of looking the main character says «Burn it». Their mission was to find and destroy such artifacts of culture. As the meme goes: «Nobody understands the bond between a boy and the obscure movie he watched when he was younger». Wow, Equilibrium scores 33 out of 100 on Metacritic… Something important to one person can be much less important to another, and I sometimes think about that when answering the question «Are we sure that we need to preserve this?».

It's a screenshot. But right now, YouTube is deleting some videos.

OK, back to the Wikimedia movement. It turns out I have already uploaded more than 133k files, most of them newspapers from the Russian Empire (from before 1917, to be sure that they are already in the public domain): newspapers of Belarus, Georgia (the country), and Russia. I found many scans on genealogy forums, where people look for their relatives. They pay for scans and upload them to Google Drive, and many of those Google Drive links already return 404.

I remember my first uploads to Wikimedia Commons: the process was difficult, frustrating, and time-consuming, nothing like drag-and-drop. My first images were removed multiple times, and I tried to understand why and how it all works. Now I run offline meetups about Wikimedia Commons, Wikidata, and Wikipedia, helping other people save data and talking about notability, references, licenses, open media formats, and permissions.

I wrote a few userscripts and userstyles for the official Upload Wizard. They are great: they increased my productivity and my satisfaction with the process. I love the web for this too: patching installed software is hard, but «patching» the CSS or JavaScript of a web page to improve something is much easier.

2025 became special for me because I created a few dream tools for uploading to Wikimedia Commons. I had tried several upload tools before; the one I liked most was DtMediaWiki, a plugin for the Darktable photo manager (thanks to its creator, Benoit Brummer). The idea is an obvious one: upload to Wikimedia Commons from a photo manager, just as you would to any other online storage. I had some success with it and even contributed to the plugin a little. But I kept running into errors, and although I tried to investigate them, something stayed broken for me.

And then I built three wonderful upload tools:

A script for the gThumb photo manager. Finally I got this magic: select the photos I want to upload, set categories and licenses right there, click one button (or press a hotkey), and then watch a relaxing progress bar. Wonderful. Can the uploading experience get any better?
A browser extension. I love browser extensions and have built a few of them for different tasks. This one is again something I dreamed about: an uploader to Wikimedia Commons. Right-click an image, click Upload, and fill in a few fields such as name, description, license, and categories. Wonderful.
And my champion: a stateless CLI built around the official Pywikibot. Using this tool I have uploaded more than 100k files (images, PDFs). I am proud of it. It is easy: just specify the license, the category, and maybe a filename prefix, and press Enter.
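
To make the «license, category, filename prefix» idea concrete, here is a minimal sketch of how those three inputs could be turned into a target filename and a file description page. It is not the code of the CLI described above: the function names are hypothetical, the layout follows the common `{{Information}}` template on Commons, and the license tag in the usage note is only an example value, so check which tag actually fits your files.

```python
from pathlib import Path


def target_name(local_file: str, prefix: str = "") -> str:
    """Target filename on Commons: optional prefix plus the local file name."""
    name = Path(local_file).name
    return f"{prefix} {name}".strip() if prefix else name


def description_page(description: str, source: str, author: str,
                     date: str, license_tag: str, category: str) -> str:
    """Wikitext for the file page: Information template, license, category."""
    return "\n".join([
        "=={{int:filedesc}}==",
        "{{Information",
        f"|description={description}",
        f"|date={date}",
        f"|source={source}",
        f"|author={author}",
        "}}",
        "",
        "=={{int:license-header}}==",
        # Five braces on each side: a literal "{{", the value, a literal "}}".
        f"{{{{{license_tag}}}}}",
        "",
        f"[[Category:{category}]]",
    ])
```

For example, `description_page("Newspaper issue", "National Library scan", "Unknown", "1897", "PD-RusEmpire", "Newspapers of Georgia")` produces text that an upload tool could submit together with the file.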

Screenshot from https://gitlab.com/vitaly-zdanevich-extensions/uploading-to-wikimedia-commons, license MIT

They are free and open source; feel free to use them and to report issues or request features.

I am originally from Minsk (Belarus) and relocated to Georgia (the country) in 2022. Here I found the official DSpace instance of the National Library of Georgia, and I preserved many 19th-century newspapers from it on Wikimedia Commons. You can do something like that too, while the website still exists.

Screenshot of https://dspace.nplg.gov.ge/simple-search?filterquery=%5B1700+TO+1799%5D&filtername=dateIssued&filtertype=equals&envfv=ex20251112
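
The search URL in the caption above filters items by issue date, and the same pattern can be generated for other periods. The sketch below only rebuilds the three query parameters visible in that URL (`filterquery`, `filtername`, `filtertype`). The function name is hypothetical, and the `envfv` parameter is left out because its meaning is not known to me.

```python
from urllib.parse import urlencode

# Base of the search page, taken from the screenshot caption.
SEARCH_PAGE = "https://dspace.nplg.gov.ge/simple-search"


def search_url_for_years(first_year: int, last_year: int) -> str:
    """Build a simple-search URL filtered by issue date, e.g. 1800 to 1899."""
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    query = {
        "filterquery": f"[{first_year} TO {last_year}]",
        "filtername": "dateIssued",
        "filtertype": "equals",
    }
    # urlencode escapes the brackets and turns spaces into "+".
    return SEARCH_PAGE + "?" + urlencode(query)
```

Calling `search_url_for_years(1800, 1899)` gives a link for 19th-century items that can be opened in a browser to see what is available before deciding what to preserve.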

Again, I want to stress: whenever you see some media online, remember that we can lose it. Because of copyright and notability rules, we cannot upload everything to Wikimedia Commons, but not even all old public domain books and newspapers are preserved. The national online archives of some countries depend on fragile infrastructure and limited budgets, and they can disappear quickly.

I was shocked to learn that some commercial companies spent millions of dollars developing software that was never even released to the public, and then one day lost it, for example in a hacker attack. «The source code is lost»: I have heard it many times, and every time I wonder how that is possible for an IT company that knows about backups.

If you know any photographers or videographers, please consider talking to them about free licenses and about uploading to Wikimedia Commons, and mention that every file page will carry their name as the author. I have done this multiple times and, using my tools, uploaded tens of thousands of images from such people.

Here is something else I practise: if you watch something interesting on YouTube, or listen to a podcast, about a topic that exists on Wikipedia, consider writing to the authors and telling them about Creative Commons licenses. If they agree to relicense, you can embed their media in Wikipedia articles. Of course, remember that if their video includes other people's media, you can embed only the audio, and only if it contains no copyrighted music (or you can cut the music out). You can download Opus audio from YouTube using a third-party tool; Opus is a good, modern, open audio format. I have done this multiple times. It is nice that Wikipedia is not only text and images: we can listen to it too.
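
The post does not name the third-party tool, so the following sketch assumes yt-dlp, a widely used open-source downloader, together with ffmpeg for audio extraction. The function names and the output layout are hypothetical. Only download material whose author has agreed to a compatible license.

```python
import subprocess
from typing import List


def opus_download_command(video_url: str, output_dir: str = ".") -> List[str]:
    """Build a yt-dlp command line that saves only the audio track as Opus."""
    return [
        "yt-dlp",
        "--extract-audio",            # keep audio only (needs ffmpeg)
        "--audio-format", "opus",     # store the result as Opus
        "--output", f"{output_dir}/%(title)s.%(ext)s",
        video_url,
    ]


def download_opus(video_url: str, output_dir: str = ".") -> None:
    """Run the command; raises CalledProcessError if yt-dlp fails."""
    subprocess.run(opus_download_command(video_url, output_dir), check=True)
```

Building the command as a list keeps the URL and the paths away from shell quoting problems, and it lets you inspect the exact command before running anything.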

I also want to give you some advice about your private family albums: think about scanning them too, maybe not for Wikimedia Commons but for yourself, for your personal hard drive or a cloud. In the Los Angeles wildfires of 2025, many families lost irreplaceable photo albums.

If you need help with preservation, feel free to write to me at https://t.me/vitaly_zdanevich

This post is also published on https://diff.wikimedia.org/2026/04/09/preserve-old-public-domain-newspapers-and-books-from-archive-websites/

Link to the original article: https://habr.com/ru/articles/1025750/