The standard ASCII table covers the letters a-z in upper and lower case, the digits, and the punctuation and symbols found on an English QWERTY keyboard. Many systems add a set of extended characters to support additional languages, typically letters that carry accents or other marks that change how the letter is pronounced. These extended characters may not be supported by some systems, or you may simply want to remove them and replace them with similar letters from the standard ASCII table. The tool below will do this for you online: provide the text and it will convert it for you. See the code examples below for how to add this to your own application.
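Before replacing anything, it can help to see which characters in a piece of text actually fall outside standard ASCII. The following is a minimal sketch, not part of the tool itself; the function name `findNonAsciiCharacters` is made up for illustration, and it assumes that "standard ASCII" means code points 0 to 127.

```javascript
// Hypothetical helper: lists each distinct character whose code point is
// above 127, i.e. outside the 7-bit standard ASCII range.
function findNonAsciiCharacters(text) {
  const found = new Set();
  // for...of walks the string one code point at a time.
  for (const ch of text) {
    if (ch.codePointAt(0) > 127) {
      found.add(ch);
    }
  }
  // A Set keeps insertion order, so characters come back in order of first use.
  return [...found];
}

console.log(findNonAsciiCharacters("Crème brûlée, naïve"));
```

An empty result means the text is already plain ASCII and needs no conversion.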
Replacing Characters With JavaScript
If you want to replace extended characters and strip the accents from letters in a string using JavaScript, the code snippet below will do it. This is the code that the free online tool above is using, so if this tool isn’t working, I am definitely going to look like an idiot!
// Each pair is [extended character, plain replacement].
var extendedArray = [
  ['Š','S'], ['š','s'], ['Ž','Z'], ['ž','z'],
  ['À','A'], ['Á','A'], ['Â','A'], ['Ã','A'], ['Ä','A'], ['Å','A'], ['Æ','A'],
  ['Ç','C'], ['È','E'], ['É','E'], ['Ê','E'], ['Ë','E'],
  ['Ì','I'], ['Í','I'], ['Î','I'], ['Ï','I'], ['Ñ','N'],
  ['Ò','O'], ['Ó','O'], ['Ô','O'], ['Õ','O'], ['Ö','O'], ['Ø','O'],
  ['Ù','U'], ['Ú','U'], ['Û','U'], ['Ü','U'], ['Ý','Y'], ['Þ','B'], ['ß','Ss'],
  ['à','a'], ['á','a'], ['â','a'], ['ã','a'], ['ä','a'], ['å','a'], ['æ','a'],
  ['ç','c'], ['è','e'], ['é','e'], ['ê','e'], ['ë','e'],
  ['ì','i'], ['í','i'], ['î','i'], ['ï','i'], ['ð','o'], ['ñ','n'],
  ['ò','o'], ['ó','o'], ['ô','o'], ['õ','o'], ['ö','o'], ['ø','o'],
  ['ù','u'], ['ú','u'], ['û','u'], ['ü','u'], ['ý','y'], ['þ','b'], ['ÿ','y'],
  ['’',"'"], ['”','"'], ['“','"'], ["●","*"]
];

// Reads the text from the "inputText" field, swaps every extended character
// for its plain equivalent, and writes the result to the "outputText" field.
function removeExtendedCharacters() {
  var replaceString = document.getElementById("inputText").value;
  for (var i = 0; i < extendedArray.length; i++) {
    var extChar = extendedArray[i];
    // The "g" flag replaces every occurrence, not just the first one.
    var regex = new RegExp(extChar[0], "g");
    replaceString = replaceString.replace(regex, extChar[1]);
  }
  document.getElementById("outputText").value = replaceString;
}
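The snippet above reads from and writes to elements on the page, which ties it to the tool's form. If you want the same replacement in your own application, a plain function that takes a string and returns a string may be easier to reuse. The following is a minimal sketch under that assumption; `replaceExtendedCharacters` and `samplePairs` are names invented here, and the pair list is only a short subset of the full list above.

```javascript
// A short subset of the pairs from the article, for illustration only.
const samplePairs = [
  ["À", "A"], ["É", "E"], ["Ñ", "N"], ["ß", "Ss"],
  ["à", "a"], ["é", "e"], ["ñ", "n"], ["ö", "o"], ["ú", "u"],
];

// Hypothetical DOM-free helper: returns a copy of `text` in which every
// character listed in `pairs` is swapped for its replacement.
function replaceExtendedCharacters(text, pairs) {
  // A Map gives a direct lookup instead of one regex pass per pair.
  const lookup = new Map(pairs);
  // Array.from splits the string by code point; unmapped characters pass through.
  return Array.from(text, (ch) => lookup.get(ch) ?? ch).join("");
}

console.log(replaceExtendedCharacters("El Niño café", samplePairs)); // El Nino cafe
```

Passing the pairs in as an argument means the full list from the snippet above can be supplied unchanged.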
Replace Extended ASCII Chars Using PHP
To do this in PHP, you can use the function below. It uses strtr() to replace each extended character with the basic character that looks similar, then a regex call to strip anything left over that is not a printable character. This covers many languages other than English whose letters carry accents and other variations on the English alphabet. For example, German letters with two dots above them (umlauts) are replaced with the same letter without the dots.
function RemoveSpecialChars($text)
{
    // Map each extended character to a similar-looking basic character.
    $unwanted_array = array(
        'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A',
        'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E',
        'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N',
        'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O',
        'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a',
        'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e',
        'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n',
        'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o',
        'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y',
        '’'=>"'", '”'=>'"', '“'=>'"', "●"=>"*"
    );
    // strtr() swaps every mapped character in a single pass over the text.
    $swapped = strtr($text, $unwanted_array);
    // Strip whatever is still not a printable character.
    return preg_replace('/[[:^print:]]/', '', $swapped);
}
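The PHP function ends with a clean-up step that the JavaScript snippet does not have: after the known characters are swapped, anything still not printable is removed. For comparison, here is a minimal JavaScript sketch of that step. It assumes the goal is to keep only printable ASCII (space through tilde), which is how the `[[:^print:]]` class is expected to behave on a byte string in the default locale; the name `stripNonPrintable` is invented for this example.

```javascript
// Hypothetical helper: removes every character outside the printable ASCII
// range 0x20 (space) to 0x7E (tilde). Assumption: this approximates the
// final preg_replace() call in the PHP function.
function stripNonPrintable(text) {
  return text.replace(/[^\x20-\x7E]/g, "");
}

console.log(stripNonPrintable("price: 5\u20AC\tok")); // price: 5ok
```

Note that tabs and line breaks are outside this range too, so they are removed along with any unmapped extended characters. If your text needs to keep its line breaks, widen the character class accordingly.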
Additional Languages
If you have a snippet for another language that you wish to share, please leave a comment below and I can add it to this page to help others out.
Thanks for: https://yomotherboard.com/replace-extended-ascii-characters-with-standard-characters/ However, it did not work for my application. Specifically,
The “’” character (ASCII 146, Right single quotation mark) must be replaced with “'” (ASCII 39, Single quote.)
The “–” character (ASCII 150, En dash) must be replaced with “-” (ASCII 45, Hyphen-minus.)
but that did not happen. I’m pasting text from the Windows clipboard into an ancient DOS program and extended characters will cause that to fail.
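For the two replacements described in this comment, a minimal sketch follows. It is not the tool's code; `punctuationMap` and `replaceSmartPunctuation` are names invented here. The characters are written as Unicode escapes so the result does not depend on the encoding the source file is saved in, and the code numbers in the comments are the Windows-1252 positions of each character.

```javascript
// Hypothetical map from Windows-style "smart" punctuation to plain ASCII.
const punctuationMap = new Map([
  ["\u2019", "'"], // right single quotation mark (Windows-1252 code 146)
  ["\u201C", '"'], // left double quotation mark (code 147)
  ["\u201D", '"'], // right double quotation mark (code 148)
  ["\u2013", "-"], // en dash (code 150)
]);

// Returns a copy of `text` with each mapped character replaced; all other
// characters pass through unchanged.
function replaceSmartPunctuation(text) {
  return Array.from(text, (ch) => punctuationMap.get(ch) ?? ch).join("");
}

console.log(replaceSmartPunctuation("It\u2019s pages 10\u201320")); // It's pages 10-20
```

Text pasted from the Windows clipboard can contain other characters that are not in this map, so the output is only guaranteed to be plain ASCII for the four characters listed.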