How can I remove whitespaces inside words from a German text file?

April 19, 2025

Asked By WhimsicalDingo99 On April 19, 2025

Hey everyone! I'm dealing with a German text file that has some annoying whitespaces inserted between characters in words, like this: "i n t e r e s t i n g". I'd like to find a way to remove those whitespaces from within the words without affecting the spaces between words. My initial thought was to create a large text file with all possible German words in that spaced format to replace them, but that seems a bit cumbersome. Does anyone have a more effective or elegant solution for cleaning up this text? Also, I'm curious—why do some questions like this tend to get downvoted?

5 Answers

Answered By HonestChipmunk91 On April 23, 2025

Answered By SillyPineapple42 On April 23, 2025

You could use a regex pattern like `([a-z]\s){5,}` in a text editor or script. Replace `5` with the minimum number of spaced letters that should count as a match. That should pinpoint the spaced-out words pretty effectively!

CuriousBee82 - April 23, 2025

Good point! English has single-letter words like 'I' and 'a' that can trip up this kind of cleanup, but German has hardly any. So at least you don't have to deal with single-letter words messing up the cleaning process!

TechWiz76 - April 23, 2025

I tried a similar approach using `egrep` to identify bad words, but manually replacing them was such a hassle. Automating it would be much better!

Answered By PracticalPanda83 On April 23, 2025

First check whether the whitespace characters are ordinary spaces or something else, such as non-breaking spaces. If they're just regular spaces, it should be pretty straightforward to fix.

WittyOtter17 - April 23, 2025

Unfortunately, in this case, they're just plain old spaces!

Answered By CodeNinja57 On April 22, 2025

Bash may not be the best route, especially if this isn't a regular task for you. You might consider using a locally hosted large language model (LLM) to parse the text, as it can handle the formatting without adding spaces by mistake. Just be careful not to waste time fiddling with bash scripts when a more efficient method is available!

FrustratedCoder45 - April 23, 2025

I totally agree! Using an LLM could add unnecessary complexity, especially if it sometimes generates inaccurate results.

Answered By CleverFox34 On April 22, 2025

Here's a bash script that may help:
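```bash
#! /usr/bin/bash

# Build one sed script that maps every spaced-out word to its joined form.
function gen_sed_cmd {
    # bwords: spaced-out words as found; gwords: the same words without spaces.
    # The character classes are ASCII-only, so umlauts and ß are not matched.
    mapfile -t bwords < <(grep -oP '\b(([A-Z]\s)?([a-z]\s|-\s)+([a-z])?+)\b' < "${G_FILE}")
    mapfile -t gwords < <(grep -oP '\b(([A-Z]\s)?([a-z]\s|-\s)+([a-z])?+)\b' < "${G_FILE}" | tr -d ' ')

    local buf=()
    local cwords="${#bwords[@]}"
    # Stop at i < cwords; running to i == cwords would append an empty "s///".
    for ((i=0;i<cwords;i++)); do
        buf+=("s/${bwords[i]}/${gwords[i]}/")
    done

    # Join the commands with ';' so they fit on the single line read by main.
    local IFS=';'
    printf '%s\n' "${buf[*]}"
}

function main {
    readonly G_FILE="${1}"
    shift 1

    if [ -z "${G_FILE}" ] || [ ! -f "${G_FILE}" ]; then
        printf 'usage: fix_words.sh file [comp-file]\n' >&2
        exit 1
    fi

    read -r sed_cmd < <(gen_sed_cmd "${G_FILE}")

    if [ -n "${1}" ]; then
        # A second argument means: do not modify anything, only show the diff.
        echo "Diff output - file"
        diff --color=always --text <(sed "${sed_cmd}" "${G_FILE}") "${1}"
    else
        echo "Edit in place"
        sed -i "${sed_cmd}" "${G_FILE}"
    fi
}

main "${@}"
```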
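Run it with just the file name to edit the file in place, or pass a second file to see a diff between the cleaned text and that file instead. This should solve most of your issues! Let me know if it works for you.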
