
    How to Develop a Bilingual Voice Assistant

    By Editor Times Featured | August 31, 2025 | 9 min read


    Google Assistant, Alexa, and Siri are the ubiquitous voice assistants that serve much of the internet-connected population today. For the most part, English is the dominant language used with these voice assistants. However, for a voice assistant to be truly helpful, it must be able to understand the user as they naturally speak. In many parts of the world, especially in a diverse country like India, it is common for people to be multilingual and to switch between multiple languages in a single conversation. A truly smart assistant should be able to handle this.

    Google Assistant offers the ability to add a second language, but this functionality is limited to certain devices and to a restricted set of major languages. For example, Google’s Nest Hub does not yet support bilingual capabilities for Tamil, a language spoken by over 80 million people. Alexa supports a bilingual mode as long as the combination is one of its built-in language pairs; again, this covers only a limited set of major languages. Siri does not have bilingual capability and allows only one language at a time.

    In this article I will discuss the approach taken to give my voice assistant a bilingual capability, with English and Tamil as the languages. Using this approach, the voice assistant is able to automatically detect the language a person is speaking by analyzing the audio directly. By using a “confidence score”-based algorithm, the system determines whether English or Tamil is spoken and responds in the corresponding language.

    Approach to Bilingual Capability

    To make the assistant understand both English and Tamil, there are a few potential solutions. The first approach would be to train a custom machine learning model from scratch, specifically on Tamil language data, and then integrate that model into the Raspberry Pi. While this would offer a high degree of customization, it is an extremely time-consuming and resource-intensive process. Training a model requires a large dataset and significant computational power. Furthermore, running a heavy custom model would likely slow down the Raspberry Pi, leading to a poor user experience.

    fastText Approach

    A more practical solution is to use an existing, pre-trained model that is already optimized for a specific task. For language identification, a great option is fastText.

    fastText is an open-source library from Facebook AI Research designed for efficient text classification and word representation. It comes with pre-trained models that can quickly and accurately identify the language of a given piece of text from a large number of languages. Because it is lightweight and highly optimized, it is an excellent choice for running on a resource-constrained device like a Raspberry Pi without causing significant performance issues. The plan, therefore, was to use fastText to classify the user’s spoken language.

    To use fastText, you download the corresponding model (lid.176.bin) and store it in your project folder. Specify this as the MODEL_PATH and load the model.

    import speech_recognition as sr
    import fasttext

    # --- Configuration ---
    MODEL_PATH = "./lid.176.bin"  # This is the model file you downloaded and unzipped

    # --- Main Application Logic ---
    print("Loading fastText language identification model...")
    try:
        # Load the pre-trained model
        model = fasttext.load_model(MODEL_PATH)
    except Exception as e:
        print(f"FATAL ERROR: Could not load the fastText model. Error: {e}")
        raise SystemExit(1)

    The next step is to pass the recorded voice commands to the model and get the prediction back. This can be done through a dedicated function.

    def identify_language(text, model):
        # model.predict() returns a tuple of labels and probabilities
        predictions = model.predict(text, k=1)
        language_code = predictions[0][0]  # e.g., '__label__en'
        return language_code

    # Set up the recognizer and the default microphone
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()

    try:
        with microphone as source:
            recognizer.adjust_for_ambient_noise(source, duration=1)
            print("\nPlease speak now...")
            audio = recognizer.listen(source, phrase_time_limit=8)

        print("Transcribing audio...")
        # Get a rough transcription without specifying a language
        transcription = recognizer.recognize_google(audio)
        print(f'Heard: "{transcription}"')

        # Identify the language from the transcribed text
        language = identify_language(transcription, model)

        if language == '__label__en':
            print("\n---> Result: The detected language is English. <---")
        elif language == '__label__ta':
            print("\n---> Result: The detected language is Tamil. <---")
        else:
            print(f"\n---> Result: Detected a different language: {language}")

    except sr.UnknownValueError:
        print("Could not understand the audio.")
    except sr.RequestError as e:
        print(f"Speech recognition service error; {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

    The code block above follows a simple path. It uses the recognizer.recognize_google(audio) function to transcribe the voice command and then passes this transcription to the fastText model to get a prediction of the language. If the prediction is “__label__en”, English has been detected, and if the prediction is “__label__ta”, Tamil has been detected.

    This approach led to poor predictions, though. The problem is that the speech_recognition library defaults to English. So when I speak something in Tamil, it finds the closest (and incorrect) equivalent-sounding words in English and passes them to fastText.

    For example, when I said “En Peyar enna” (“What is my name?” in Tamil), speech_recognition understood it as “Empire NA”, and hence fastText predicted the language as English. To overcome this, I could hardcode the speech_recognition function to detect only Tamil. But this would defeat the idea of being truly ‘smart’ and ‘bilingual’. The assistant should be able to detect the language based on what is spoken, not based on what is hard-coded.
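As a small aside, the label returned by the model carries a "__label__" prefix and comes paired with a probability. The sketch below shows one way to unpack that pair into a plain language code and a float. The helper name parse_prediction and the sample values are illustrative assumptions, not part of the original project, and the sample is hand-written so the snippet runs without the model file.

```python
def parse_prediction(prediction):
    """Turn a fastText-style (labels, probabilities) pair into (code, probability)."""
    labels, probabilities = prediction
    # Drop the "__label__" prefix, e.g. "__label__ta" -> "ta"
    code = labels[0].replace("__label__", "", 1)
    return code, float(probabilities[0])


# Hand-written sample in the shape returned by model.predict(text, k=1);
# the probability is made up for illustration.
sample = (("__label__ta",), [0.97])
print(parse_prediction(sample))
```

Keeping the probability alongside the code makes it easier to see how sure the text classifier was, although it cannot repair a transcription that was already wrong.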

    Photograph by Siora Photography on Unsplash

    The ‘Confidence Score’ method

    What we need is a more direct and data-driven method. The solution lies within a feature of the speech_recognition library. The recognizer.recognize_google() function calls the Google Speech Recognition API, which can transcribe audio from a large number of languages, including both English and Tamil. A key feature of this API is that for every transcription it provides, it can also return a confidence score, a numerical value between 0 and 1 indicating how certain it is that its transcription is correct.

    This feature allows for a much more elegant and dynamic approach to language identification. Let’s take a look at the code.

    def recognize_with_confidence(recognizer, audio_data):

        tamil_text = None
        tamil_confidence = 0.0
        english_text = None
        english_confidence = 0.0

        # 1. Attempt to recognize as Tamil and get confidence
        try:
            print("Attempting to transcribe as Tamil...")
            # show_all=True returns a dictionary with transcription alternatives
            response_tamil = recognizer.recognize_google(audio_data, language='ta-IN', show_all=True)
            # We only look at the top alternative
            if response_tamil and 'alternative' in response_tamil:
                top_alternative = response_tamil['alternative'][0]
                tamil_text = top_alternative['transcript']
                if 'confidence' in top_alternative:
                    tamil_confidence = top_alternative['confidence']
                else:
                    tamil_confidence = 0.8  # Assign a default high confidence if not provided
        except sr.UnknownValueError:
            print("Could not understand audio as Tamil.")
        except sr.RequestError as e:
            print(f"Tamil recognition service error; {e}")

        # 2. Attempt to recognize as English and get confidence
        try:
            print("Attempting to transcribe as English...")
            response_english = recognizer.recognize_google(audio_data, language='en-US', show_all=True)
            if response_english and 'alternative' in response_english:
                top_alternative = response_english['alternative'][0]
                english_text = top_alternative['transcript']
                if 'confidence' in top_alternative:
                    english_confidence = top_alternative['confidence']
                else:
                    english_confidence = 0.8  # Assign a default high confidence
        except sr.UnknownValueError:
            print("Could not understand audio as English.")
        except sr.RequestError as e:
            print(f"English recognition service error; {e}")

        # 3. Compare confidence scores and return the winner
        print(f"\nConfidence Scores -> Tamil: {tamil_confidence:.2f}, English: {english_confidence:.2f}")
        if tamil_confidence > english_confidence:
            return tamil_text, "Tamil"
        elif english_confidence > tamil_confidence:
            return english_text, "English"
        else:
            # If scores are equal (or both zero), return neither
            return None, None

    The logic in this code block is simple. We pass the audio to the recognize_google() function and get the whole list of alternatives and their scores. First we try the language as Tamil and get the corresponding confidence score. Then we try the same audio as English and get the corresponding confidence score from the API. Once we have both, we compare the confidence scores and choose the language with the higher score as the one detected by the system.

    Below is the output of the function when I speak in English and when I speak in Tamil.

    Screenshot from Visual Studio output (Tamil). Image owned by author.
    Screenshot from Visual Studio output (English). Image owned by author.

    The results above show how the code is able to identify the spoken language dynamically, based on the confidence score.
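The response handling inside recognize_with_confidence() can also be exercised without a microphone or network access. The sketch below isolates that step into a hypothetical helper, top_transcript, and feeds it hand-written dictionaries shaped like the show_all=True responses the function reads (an "alternative" list whose entries hold "transcript" and, sometimes, "confidence"). The helper name and the sample values are assumptions made for illustration; the 0.8 fallback mirrors the default used in the function above.

```python
DEFAULT_CONFIDENCE = 0.8  # same fallback value the article's function uses


def top_transcript(response):
    """Return (transcript, confidence) for the top alternative, or (None, 0.0)."""
    # An empty or unexpected response means nothing was recognized
    if not response or "alternative" not in response:
        return None, 0.0
    alternatives = response["alternative"]
    if not alternatives:
        return None, 0.0
    top = alternatives[0]
    return top.get("transcript"), top.get("confidence", DEFAULT_CONFIDENCE)


# Hand-written samples; the transcripts and scores are made up for illustration.
with_score = {"alternative": [{"transcript": "what is my name", "confidence": 0.93}]}
without_score = {"alternative": [{"transcript": "Empire NA"}]}

print(top_transcript(with_score))
print(top_transcript(without_score))
print(top_transcript([]))
```

Separating the parsing from the API call keeps the comparison logic easy to test while the network-dependent part stays in one place.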

    Putting it all together: The Bilingual Assistant

    The final step is to integrate this approach into the code for the Raspberry Pi based voice assistant. The full code can be found in my GitHub. Once integrated, the next step is to test the functioning of the voice assistant by speaking in English and Tamil and seeing how it responds for each language. The recordings below demonstrate the working of the bilingual voice assistant when asked a question in English and in Tamil.

    Conclusion

    In this article, we have seen how to upgrade a simple voice assistant into a truly bilingual tool. By implementing a “confidence score” algorithm, the system can determine whether a command is spoken in English or Tamil, allowing it to understand and respond in the user’s chosen language for that specific query. This creates a more natural and seamless conversational experience.

    The key advantage of this method is its reliability and scalability. While this project focused on just two languages, the same confidence score logic could easily be extended to support three, four, or more by simply adding an API call for each new language and comparing all the results. The techniques explored here serve as a solid foundation for creating more advanced and intuitive personal AI tools.
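The extension to more languages can be sketched as a comparison over any number of results. The example below is a minimal illustration, not the project's code: the function name pick_language, the input shape (language name mapped to a transcript and confidence pair), the third language, and the sample scores are all assumptions. It keeps the same rule as the two-language version, where a tie for the top score returns neither language.

```python
def pick_language(results):
    """Pick the language with the highest confidence.

    results maps a language name to a (transcript, confidence) pair.
    Returns (transcript, language), or (None, None) when nothing was
    recognized or the top score is tied.
    """
    # Keep only languages that actually produced a transcript
    scored = [
        (confidence, language, text)
        for language, (text, confidence) in results.items()
        if text is not None
    ]
    if not scored:
        return None, None

    # Highest confidence first
    scored.sort(key=lambda item: item[0], reverse=True)
    best_confidence, best_language, best_text = scored[0]

    # A tie for first place is treated as "undecided"
    if len(scored) > 1 and scored[1][0] == best_confidence:
        return None, None
    return best_text, best_language


# Made-up scores for illustration; the Tamil transcript is shown transliterated.
sample = {
    "Tamil": ("En peyar enna", 0.91),
    "English": ("Empire NA", 0.42),
    "Hindi": (None, 0.0),
}
print(pick_language(sample))
```

Each additional language still costs one more recognition request per utterance, so response time grows with the number of languages compared.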



