
We have a basic user search application: you type part or all of a user's first name or last name, and the application returns the matching users. Many of our users are international, so their names use characters from many different languages. We want to improve the search so that it still finds a name when the searcher replaces a language's special character with its English-alphabet counterpart. For example, typing either 'Mueller' or 'Muller' should return Müller. I have done this with a combination of setlocale() and iconv():

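    function matchesName(string $string, string $word): bool
    {
        // de_DE: iconv's //TRANSLIT follows the locale (glibc), so ü becomes "ue"
        // and "Mueller" matches "Müller".
        setlocale(LC_ALL, 'de_DE.UTF-8');
        $de_stringconv = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
        $de_wordconv = iconv('UTF-8', 'ASCII//TRANSLIT', $word);
        $de_match = strpos(mb_strtolower($de_stringconv), mb_strtolower($de_wordconv)) !== false;

        // en_GB: ü becomes plain "u", so "Muller" matches "Müller".
        setlocale(LC_ALL, 'en_GB.UTF-8');
        $en_stringconv = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
        $en_wordconv = iconv('UTF-8', 'ASCII//TRANSLIT', $word);
        $en_match = strpos(mb_strtolower($en_stringconv), mb_strtolower($en_wordconv)) !== false;

        // A hit under either transliteration counts as a match.
        return $en_match || $de_match;
    }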

This code works for German, Dutch and English names, but it does not catch Turkish ones, especially names that contain "ı". Is there a way to write generalised code that handles the special characters of all languages, instead of calling setlocale() with specific locales?
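One locale-independent direction is the intl extension's Transliterator class. The sketch below is built on stated assumptions: PHP 8 with ext/intl installed, and searchKeys() and matchesSearch() are hypothetical helper names. It uses an ICU transform chain that does not depend on setlocale(). Decomposition plus mark removal strips accents (ü to u, ş to s), and Latin-ASCII maps letters that have no decomposition, such as ı to i and ß to ss. Because ü then always becomes "u", the German "ue" spelling is generated separately with a small strtr() map that only covers German/Dutch umlaut conventions.

```php
<?php
// Sketch only: assumes PHP 8 with the intl extension (ext/intl).
// searchKeys() and matchesSearch() are hypothetical helper names.

/**
 * Returns ASCII search keys for a name: a plain folded form
 * ("Müller" -> "muller") and a German-style form ("Müller" -> "mueller").
 */
function searchKeys(string $name): array
{
    static $fold = null;
    // Any-Latin: other scripts to Latin; NFD + mark removal strips accents;
    // Latin-ASCII maps remaining letters (e.g. ı -> i, ß -> ss);
    // Lower() lowercases with root (not Turkish-specific) rules.
    $fold ??= Transliterator::create(
        'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; Latin-ASCII; Lower()'
    );

    // German/Dutch convention: an umlaut may be typed as vowel + "e".
    $german = strtr($name, [
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
    ]);

    return array_values(array_unique([
        (string) $fold->transliterate($name),
        (string) $fold->transliterate($german),
    ]));
}

/** True if any key of the query is a substring of any key of the name. */
function matchesSearch(string $name, string $query): bool
{
    foreach (searchKeys($query) as $q) {
        if ($q === '') {
            continue;
        }
        foreach (searchKeys($name) as $n) {
            if (str_contains($n, $q)) {
                return true;
            }
        }
    }
    return false;
}
```

With these helpers, matchesSearch('Müller', 'Mueller'), matchesSearch('Müller', 'Muller') and matchesSearch('Işık', 'Isik') should all return true. Folding to root-locale lowercase is deliberate: the goal is a loose ASCII key, not correct Turkish casing.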

  • A few pointers. You might find interesting features in the PHP intl extension. I do wonder if you are not chasing a red herring though. If German/Dutch consider U umlaut equivalent to UE, but some other locale doesn’t, what would be your expected behaviour? Moreover the dotless i seems to be considered an entirely different letter and not a modified i, somewhat like W is not a doubled V. If all locales differ, maybe you can’t really be in all of them at the same time.
    – lampyridae
    Commented Aug 29, 2024 at 12:37
  • Maybe strip accents in a specific locale, then add a bit of tolerance by matching anything with a low enough levenshtein distance?
    – lampyridae
    Commented Aug 29, 2024 at 12:42
  • There's also soundex(), see the "User Contributed Notes" there. Are you actually looking these names up in a database? MySQL, for instance, has flexible search features. For starters, it also has a SOUNDEX().... Commented Aug 29, 2024 at 12:49
  • I agree, @lampyridae. Our users are primarily German, Dutch, Turkish and English. Is there anything that addresses the special characters of these languages?
    – Amrita Deb
    Commented Sep 12, 2024 at 13:07
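The second comment suggests stripping accents and then tolerating small differences with a Levenshtein distance. That idea could look roughly like the sketch below. Instead of iconv's locale-specific transliteration, it folds both strings with an ICU transform. It then accepts any word of the name within a small edit distance of the query. Assumptions: PHP 8 with ext/intl, and foldName(), fuzzyNameMatch() and the default threshold of 1 are hypothetical choices that would need tuning.

```php
<?php
// Sketch only: assumes PHP 8 with the intl extension (ext/intl).
// foldName(), fuzzyNameMatch() and the threshold of 1 are hypothetical.

/** Folds a name to lowercase ASCII, independent of setlocale(). */
function foldName(string $s): string
{
    static $fold = null;
    $fold ??= Transliterator::create(
        'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; Latin-ASCII; Lower()'
    );
    return (string) $fold->transliterate($s);
}

/** True if some word of $name is within $maxDistance edits of $query. */
function fuzzyNameMatch(string $name, string $query, int $maxDistance = 1): bool
{
    $q = foldName($query);
    // Compare the query with each word of the name (first name, last name, ...).
    foreach (preg_split('/\s+/', foldName($name), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // levenshtein() counts bytes; safe here because both strings are now ASCII.
        if (levenshtein($word, $q) <= $maxDistance) {
            return true;
        }
    }
    return false;
}
```

Here 'Mueller' matches "Hans Müller" because "muller" and "mueller" differ by one insertion. Unlike the strpos() check in the question, this compares whole words, so partial input such as "Mül" does not match. A real search would likely combine both checks, and a larger threshold trades precision for recall, especially on short names.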
