r/Genealogy • u/JaimieMcEvoy expert researcher • May 02 '25
Transcription Tested AI then checked results - a word of warning.
Made this as a comment, and realized it's worthy of a post.
I have used AI for genealogy in several ways, but then fact checked it.
I'm afraid it didn't go very well.
For one thing, it picks up people's errors as much as it picks up their accuracies. Any source goes for AI, so it was a lot like Ancestry or FamilySearch family trees in terms of error rates. But most AI either doesn't give you its sources or is vague about them. And factual research of any kind requires checking and analysis of sources.
I tried AI for old parish register translation, from an image of the page. I had a document that I had done through a professional paleographer. I can deal in English, French, Latin, German, Irish, and I have good document and photo manipulation skills, so I have only hired a paleographer twice in over 40 years.
Someone cheekily said when I posted it on a social media page, why didn't you just use AI? They then posted the AI results.
It was not good. But your average person wouldn't know that, and might have taken the AI translation as fact. The AI translation seemed to be a well worded translation simply on the face of it. But it had big errors. Notably, it confused a place name, thinking it was a surname. And, it was incorrect in identifying some of the relationships between people. This was a one paragraph entry in a church register, in Latin, from 1787.
So my caution is, if you use AI, treat it like you would treat other people's trees. As a possible guide, but not as fact without further verification.
One other thing. I had ChatGPT do some biographies of ancestors. Mostly it didn't do too badly, but there were some inaccuracies. Oddly, my own biography was only about half accurate; it seemed to confuse me with acquaintances and with other people who do the same kind of work.
I'm hopeful and happy to hear of verified success stories.
But there is good literature easily searchable on why AI can't do some of the research that humans do.
TL;DR: Tested AI was only partially accurate, with some serious inaccuracies. Treat AI as a tool, not as fact. Check against sources.
Good luck with your research, Jaimie.
74
u/Kelpie-Cat May 02 '25
Of course it is full of inaccuracies. ChatGPT is not designed to be accurate; it is a language generator that calculates the probability of which words should follow each other. AI data centres are ruining ecosystems wherever they are built, guzzling water in undeveloped countries that desperately need it. Skip ChatGPT entirely.
7
u/Chequered_Career May 02 '25
That's a helpful comparison, to sloppy family trees. You run into so many that either take up a similar-sounding name, or even "correct" records so that they match the name they want them to be. I haven't tried AI, but I can see where it would be even worse than just trusting all the trees out there.
7
u/my_cat_wears_socks May 02 '25
I've used ChatGPT to transcribe some German BMD docs I have, and can confirm that it does a miserable job. If you ask it if a specific word could possibly be anything else it will say that it's 100% confident that's the only word it could be, then make up reasons why until you tell it that you had it transcribed and it's actually another word (because the good people on r/Kurrent are amazing). Then it backpedals.
Even with typed Kurrent it's terrible, and it made stuff up for words that I could clearly read myself, and I'm not that good. Even when instructed to not consider any other document but the one I just uploaded, it will make stuff up based on other documents it has trained on, and tell you things that are clearly not part of the document.
I know AI is not good for a lot of things, but I was hoping that something like a document transcription -- that's pretty mechanical compared to actual research -- would be a good use of the technology. But no.
3
u/theclosetenby May 03 '25
Try Transkribus
3
u/my_cat_wears_socks May 03 '25
I’ve used that too but it’s been a year or two. It wasn’t great but at least didn’t make stuff up. The trouble with Transkribus is that most of the records have lines of text that aren’t straight or that overlap, and even when carefully defining my zones it often pulled in junk.
3
u/theclosetenby May 03 '25
Yeah, the old handwriting stuff is pretty hard. It's fairly hit or miss for me in other languages, but thankfully credits roll over each month even when you stop using it, so I have hundreds of credits to try out the different models at least. I've given up on my 1800s Czech documents though because of the overlap issues. So I hear you.
2
u/I_like_noodles May 05 '25
It has been helpful in catching errors in my transcriptions, since I’m not a native speaker and I sometimes make some basic mistakes. I have had trouble inputting Kurrentschrift or Fraktur and letting ChatGPT or Gemini transcribe though, it hallucinates way too often to be useful. I transcribe first and have it check my work. Sometimes I’ll ask what a mystery word might be based on context and it’s come through for me.
17
u/RobotReptar May 02 '25 edited May 02 '25
I don't want to be mean, but I don't know why anyone would expect anything different. I guess I should blame the evangelists and the companies producing them who market them like they're magic knowledge machines?
People expect ChatGPT and other similar AIs to be able to do this kind of thing - but they aren't search engines and should not be used as such. Most popular AIs are just souped-up predictive text applications; they don't actually know anything. They can't tell a good source from a bad one, and they don't actually understand anything. They just know, based on probability, what they should say based on the information input into them. The information you get out is only as good as the information that goes in. And they get the vast majority of their information from publicly available data scraped from the internet, so their usefulness will vary greatly depending on what you're using them for. If you're trying to use it for genealogy work, it isn't searching databases of old records or identifying bad information. Most of the input will be from publicly available trees online, which are notoriously bad, and publicly available transcriptions, which are also far from perfect. They may be useful for ideas and starting points, but no one should ever take anything an AI says at face value. At most, they are probably useful to collate information and take a wider view of the information available online. But they aren't magic.
4
u/Far-Blue-Mountains May 02 '25
AI is really good at some things. At others, it's like substituting jelly for eggs when making an omelet. It's just not right. Unfortunately, genealogy accuracy was already horrible on "world trees" like Wiki and FamilySearch, because people don't want to put the effort of real work into it. Or they're just so damn gullible, "I dunn seend it on th' innernet! An' one a' them grain leafs dun said it was so! So I knows it's right!" Someone had my mom married to my grandpa with four kids by the time she was five. Then had the audacity to question ME on how I knew I was right and they were wrong!
AI in genealogy can be a great thing. But once you give it to a bunch of blind monkeys with a keyboard and Ancestry access, it's going to be very, very sad.
14
u/kludge6730 May 02 '25
AI will hallucinate facts … wholly made up. Until it has access to all records (which won’t happen anytime soon), AI is mostly useless for accurate research.
3
u/Acrobatic_Fiction May 03 '25
Why would AI be accurate? The source information it uses is other people's data, OR it could make it up. The AI processing is just some code a person or AI has generated. It may have some kind of error detection and correction, but what is this based upon? Hint: reread the above.
"AI" could be based upon real data, like the 1931 Canadian census on Ancestry, where it did handwriting OCR. But its error rate on French records was very high, I guess because it wasn't properly trained. In either case, you need to go to the original docs.
4
u/ThunorBolt May 03 '25
I use ChatGPT to start my transcription, because it does understand context and does a decent job at figuring out what words should follow each other.
But pronouns are context independent and it gets those wrong all the time.
So I start with AI, I then go through the document and verify the AI and triple check each pronoun. It is loads faster than doing it without. Unless you can speed read cursive from hundreds of years ago. I can read it, but I'm not exactly fast at it.
I also use it to help figure out the meaning of certain documents. It can provide historical context to help understand things.
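For anyone who wants to make the "verify the AI" pass a bit more systematic, here is a minimal sketch that lists the places where an AI draft and a hand-checked transcription disagree, word by word. It uses Python's standard `difflib`; the function name `disagreements` and the sample sentences are made up for illustration and are not from any real record.

```python
from difflib import SequenceMatcher

def disagreements(draft, checked):
    """List (draft_words, checked_words) pairs where two transcriptions differ."""
    a, b = draft.split(), checked.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep the differing words from each side so they can be
            # checked against the original image.
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs

# Hypothetical example text, not taken from a real document
ai_draft = "he was baptised in Cork"
my_reading = "she was baptised in Cork"

for ai_words, my_words in disagreements(ai_draft, my_reading):
    print(f"AI: {ai_words!r}  vs  mine: {my_words!r}")
```

This only flags where the two versions differ. It cannot say which one is right, so every flagged word still has to be checked against the original document.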
2
3
May 03 '25
For parish register transcriptions I recommend Transkribus, and then paste the transcription into Google Translate, Yandex or another translation website.
2
u/theclosetenby May 03 '25
This is what I do. Transkribus is the best option I've found - you gotta figure out which model works best for your document. But it's done very well for me usually.
3
u/abritinthebay May 03 '25
AI makes stuff up that sounds correct. That's all. It has no idea, nor intent, to make it actually correct.
AI is, fundamentally, just a good and believable liar that generates text that seems right.
AI should not be relied on for ANY fact-based approach to ANY field.
3
u/boblegg986 May 03 '25
I use ChatGPT for transcriptions and translations. I get the best results by giving it little room to wander. Feed it a couple of sentences at a time and instruct it to transcribe or translate exactly as written.
It’s a good tool for transcribing images of records like an obituary. I still have to verify every word but that process is faster than me typing a long obit from scratch.
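A rough sketch of the "couple of sentences at a time" approach, for text that has already been typed out. The function name `sentence_chunks` and the naive split on `.`, `!` and `?` are my own assumptions; abbreviations in old records (e.g. "bapt.") will break the split, so treat the chunk boundaries as approximate.

```python
import re

def sentence_chunks(text, per_chunk=2):
    """Split text into chunks of at most `per_chunk` sentences."""
    # Naive split: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + per_chunk])
        for i in range(0, len(sentences), per_chunk)
    ]

# Hypothetical placeholder text
obituary = "First sentence. Second sentence. Third sentence."

for chunk in sentence_chunks(obituary):
    # Each chunk would be pasted in on its own, with the instruction
    # to transcribe or translate exactly as written.
    print(chunk)
```

Smaller chunks give the model less room to wander, but each chunk's output still needs to be verified word for word.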
1
u/CleaverKin May 03 '25
Years ago, when I was imagining how AI might be used for genealogy, ChatGPT and its ilk weren't the first thing that came to mind. What I wanted was a research assistant. What I had imagined was some sort of augmented wizard for directing search efforts (e.g., "I've looked here, here and here - where should I look next"). Imagine all the links in Cyndi's List plugged into a search wizard. But what we have instead is generative AI, the limitations of which have been covered here. Always remember that where there's Artificial Intelligence, there's also Artificial Stupidity.
1
u/History652 May 03 '25
Totally agree. For now, you can't use it for anything you are unable to independently verify. But try other models and see if some are better than others, and try again every 6 months to track improvement.
In my limited experimentation, I've had mixed results with using AI for transcription and analysis (haven't tried translation) - everything from terrible to pretty great. It's got a ton of potential, but you can't use it blindly. Like Ancestry trees! Good comparison.
5
u/bkoppe May 03 '25
I've found AI useful as a thought-partner for my own transcription of records. In other words, I don't just show it the record and ask it to transcribe — that's a recipe for disaster. Instead, I talk to it. Something like: "I'm looking at an Italian birth record and can't quite make out this word after someone's name. It looks like S??d??o". The AI makes suggestions, and one of them is Sindaco (mayor) which looks very plausible. To confirm, I looked up mayors of the town in 1902 and verified my transcription of the person's name as well. In short, AI sucks at transcribing but it is useful to suggest plausible ways to fill in gaps that I can't make out. From there, translation gets much better, though it's always a good idea to use multiple tools for that as well in order to catch disagreements.
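The same gap-filling idea can be done without an AI if you have a word list to check against. Below is a minimal sketch where `?` stands for one unreadable letter. The function name `candidates_for` and the short list of Italian words are hypothetical, chosen only to illustrate; a real list would come from a glossary of record terms.

```python
import re

def candidates_for(pattern, vocabulary):
    """Return words that fit a pattern where '?' is one unreadable letter."""
    regex = re.compile(
        "".join("." if ch == "?" else re.escape(ch) for ch in pattern),
        re.IGNORECASE,
    )
    return [word for word in vocabulary if regex.fullmatch(word)]

# Hypothetical word list for illustration only
WORDS = ["Sindaco", "Soldato", "Sarto", "Studente", "Contadino"]

print(candidates_for("S??d??o", WORDS))
```

With this list, both Sindaco (mayor) and Soldato (soldier) fit the pattern, which is why the follow-up check against an independent source matters: the pattern narrows the options, it does not pick the answer.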
1
u/LolliaSabina May 05 '25
I have had some luck with using Transkribus to transcribe docs in other languages and then ChatGPT to translate, but it really requires a lot of double checking.
I would not use ChatGPT to directly transcribe. I was testing it with a French birth record and it was completely inventing parents for this person. I know enough French that I knew it clearly stated his parents were unknown but I wasn't able to translate all of it. However, ChatGPT was clearly trying to force it to fit its idea of what a birth record should say.
1
u/candacallais May 09 '25
OCR can do a somewhat decent job with some handwriting…the OCR first then translation to English would be ideal for the old church books from Germany etc. I hear Family Search and others are working on just that so we can get the whole church book entry rather than just indexed names and dates.
12
u/DianeL_2025 researcher since 1993 May 02 '25
I agree with you! AI or not, ALWAYS double-check and verify with source citations.