    Crafting a Custom Voice Assistant with Perplexity

    By Editor Times Featured, August 31, 2025


    Google Assistant, Alexa, and Siri are the dominant voice assistants available for everyday use. These assistants have become ubiquitous in almost every home, carrying out tasks from home automation and note taking to recipe guidance and answering simple questions. When it comes to answering questions, though, getting a concise, context-based answer from these voice assistants can be difficult, if not impossible, even in the age of LLMs. For example, suppose you ask Google Assistant how the market is reacting to Jerome Powell’s speech in Jackson Hole on Aug 22. It will simply reply that it doesn’t know the answer and offer a few links you can peruse, and that is only if you have the screen-based Google Assistant.

    Sometimes you just want a quick answer on current events, or you want to know whether an apple tree would survive the winter in Ohio, and voice assistants like Google and Siri often fall short of providing a satisfying answer. This got me thinking about building my own voice assistant, one that could give me a simple, single-sentence answer based on its search of the web.

    Photo by Aerps.com on Unsplash

    Of the various LLM-powered search engines available, I have been an avid user of Perplexity for more than a year now. I use it exclusively for all my searches except simple ones, where I still go back to Google or Bing. Perplexity has a live web index, which allows it to provide up-to-date, accurate, sourced answers. It also gives users access to its functionality through a powerful API. Using this API with a simple Raspberry Pi, I set out to create a voice assistant that would:

    • Respond to a wake word and be ready to answer my question
    • Answer my question in a simple, concise sentence
    • Return to passive listening without selling my data or showing me unnecessary ads

    The Hardware for the Assistant

    Photo by Axel Richter on Unsplash

    To build our voice assistant, a few key hardware components are required. The core of the project is a Raspberry Pi 5, which serves as the central processor for our application. For the assistant’s audio input, I chose a simple USB gooseneck microphone. This type of microphone is omnidirectional, making it effective at hearing the wake word from different parts of a room, and its plug-and-play nature simplifies the setup. For the assistant’s output, a compact USB-powered speaker provides the audio. A key advantage of this speaker is that it uses a single USB cable for both power and audio signal, which minimizes cable clutter.

    Block diagram showing the functionality of the custom voice assistant (image by author)

    This approach of using readily available USB peripherals makes the hardware assembly straightforward, allowing us to focus our efforts on the software.

    Getting the environment ready

    To send custom queries to Perplexity and to give the voice assistant a wake word, we need to generate a couple of API keys. To generate a Perplexity API key:
    • Sign up for a Perplexity account and open the Settings menu.
    • Select the API tab and click “Generate API Key”.
    • Copy your personal key for use in applications.
    Access to API key generation usually requires a paid plan or a payment method, so make sure the account is eligible before proceeding.

    Platforms that offer wake word customization include Picovoice Porcupine, Sensory TrulyHandsfree, and Snowboy. Picovoice Porcupine provides a straightforward online console for generating, testing, and deploying custom wake words across desktop, mobile, and embedded devices. To create a custom wake word for Porcupine:
    • Sign up for a free Picovoice Console account and navigate to the Porcupine page.
    • Select the desired language and type in the custom wake word.
    • Click “Train” to produce and download the platform-specific model file (.ppn).
    The Picovoice Console is also where you obtain the AccessKey that the Porcupine SDK requires at runtime. Be sure to test the wake word before finalizing it, since this confirms reliable detection and minimal false positives. The wake word I have trained and will use is “Hey Krishna”.

    Coding the Assistant

    The complete Python script for this project is available on my GitHub repository. In this section, let’s look at the key components of the code to understand how the assistant works.
    The script is organized into a few core functions that handle the assistant’s senses and intelligence, all managed by a central loop.
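To make that structure concrete, here is a minimal, hypothetical sketch of one pass through the assistant. It is written in plain Python so it runs without any of the real services. Each stage is passed in as a function:
• wake word detection
• recording
• speech-to-text
• the Perplexity query
• text-to-speech

The names handle_one_interaction, wait_for_wake_word, record_command, transcribe, ask_llm and speak are illustrative only. They are not taken from the author’s script.

```python
from typing import Callable, Optional


def handle_one_interaction(
    wait_for_wake_word: Callable[[], None],
    record_command: Callable[[], bytes],
    transcribe: Callable[[bytes], Optional[str]],
    ask_llm: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one pass: wake word -> record -> speech-to-text -> LLM -> text-to-speech.

    Every argument is a hypothetical stand-in for one stage of the assistant.
    Returns the spoken answer, or None if no command was understood.
    """
    wait_for_wake_word()          # passive listening (Porcupine in the article)
    audio = record_command()      # fixed-length recording after the wake word
    command = transcribe(audio)   # speech-to-text (Google STT in the article)
    if not command:
        speak("Sorry, I didn't catch that.")
        return None
    answer = ask_llm(command)     # Perplexity chat completion in the article
    speak(answer)                 # text-to-speech (gTTS + pygame in the article)
    return answer
```

In the real assistant, these stand-ins would wrap the following, and the function would be called inside a while True: loop:
• Porcupine
• the fixed-length recording
• recognize_google
• the Perplexity call
• speak_text

Passing the stages in as functions also keeps the control flow testable without a microphone.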

    Configuration and Initialization

    The first part of the script is dedicated to setup. It loads the required API keys and model files and initializes the clients for the services we’ll use.

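    import os

    from dotenv import load_dotenv  # provided by the python-dotenv package

    # --- 1. Configuration ---
    load_dotenv()  # Load KEY=value pairs from a local .env file into os.environ
    PICOVOICE_ACCESS_KEY = os.environ.get("PICOVOICE_ACCESS_KEY")
    PERPLEXITY_API_KEY = os.environ.get("PERPLEXITY_API_KEY")
    KEYWORD_PATHS = ["Krishna_raspberry-pi.ppn"]  # My wake word path
    MODEL_NAME = "sonar"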

    This section uses the dotenv library to load your secret API keys from a .env file. This is a best practice because it keeps the keys out of your source code. The section also defines key variables, such as the path to your custom wake word file and the specific Perplexity model we want to query.
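One caveat is that os.environ.get() returns None when a variable is missing. A typo in the .env file therefore only shows up later, as a confusing authentication error. A small fail-fast check catches this at startup. The sketch below assumes it runs after load_dotenv(); AssistantConfig and load_config are hypothetical names, not part of the original script.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class AssistantConfig:
    picovoice_access_key: str
    perplexity_api_key: str
    keyword_paths: Tuple[str, ...]
    model_name: str = "sonar"


def load_config(env: Mapping[str, str] = os.environ) -> AssistantConfig:
    """Build the configuration, raising immediately if a required key is missing.

    Empty strings are treated as missing too, since an empty key can never work.
    """
    required = ("PICOVOICE_ACCESS_KEY", "PERPLEXITY_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return AssistantConfig(
        picovoice_access_key=env["PICOVOICE_ACCESS_KEY"],
        perplexity_api_key=env["PERPLEXITY_API_KEY"],
        keyword_paths=("Krishna_raspberry-pi.ppn",),  # wake word file from the article
    )
```

Accepting any mapping, not just os.environ, lets the check be exercised with a plain dict.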

    Wake Word Detection

    For the assistant to be truly hands-free, it must listen continuously for a specific wake word without using significant system resources. This is handled by the while True: loop in the main function, which uses the Picovoice Porcupine engine.

    # Requires "import struct" at the top of the script
    # This is the main loop that runs continuously
    while True:
        # Read a small chunk of raw audio data from the microphone
        pcm = audio_stream.read(porcupine.frame_length)
        # Unpack the raw bytes into frame_length 16-bit signed integers
        pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)

        # Feed the audio chunk into the Porcupine engine for analysis
        keyword_index = porcupine.process(pcm)

        if keyword_index >= 0:
            # Wake word was detected, proceed to handle the command...
            print("Wake word detected!")

    This loop is the heart of the assistant’s “passive listening” state. It continuously reads small, raw audio frames from the microphone stream and passes each frame to the porcupine.process() function. This highly efficient, offline process analyzes the audio for the specific acoustic pattern of your custom wake word (“Krishna”). When no wake word is heard, porcupine.process() returns -1. When the pattern is detected, it returns a non-negative number: the index of the matching keyword in KEYWORD_PATHS. The script then proceeds to the active phase of listening for a full command.
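The same read, unpack, process pattern can be exercised without a microphone or a Porcupine license. In the sketch below, read_frame and process are hypothetical stand-ins for audio_stream.read() and porcupine.process(). The 16 kHz sample rate and 512-sample frame length are assumptions about Porcupine’s usual configuration; read the real values from porcupine.sample_rate and porcupine.frame_length. The max_frames parameter is an addition for testing and is not in the original loop.

```python
import struct
from typing import Callable, Optional, Sequence

# Assumed Porcupine defaults; use porcupine.frame_length / porcupine.sample_rate in practice
FRAME_LENGTH = 512
SAMPLE_RATE = 16000
FRAME_MS = 1000 * FRAME_LENGTH / SAMPLE_RATE  # milliseconds of audio in one frame


def wait_for_wake_word(
    read_frame: Callable[[], bytes],
    process: Callable[[Sequence[int]], int],
    frame_length: int = FRAME_LENGTH,
    max_frames: Optional[int] = None,
) -> int:
    """Feed frames to `process` until it reports a keyword.

    `process` mimics porcupine.process(): it returns -1 when no keyword is found
    in the frame, otherwise the index of the detected keyword. Returns that index,
    or -1 if `max_frames` frames pass without a detection.
    """
    frames_seen = 0
    while max_frames is None or frames_seen < max_frames:
        # Raw bytes -> tuple of frame_length native-endian 16-bit integers
        pcm = struct.unpack_from("h" * frame_length, read_frame())
        keyword_index = process(pcm)
        if keyword_index >= 0:
            return keyword_index
        frames_seen += 1
    return -1
```

At those assumed values, each process() call analyzes 32 ms of audio (512 / 16000 s).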

    Speech-to-Text: Converting User Questions to Text

    After the wake word is detected, the assistant needs to listen for and understand the user’s question. This is handled by the Speech-to-Text (STT) component.

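    # --- This logic is inside the main 'if keyword_index >= 0:' block ---

    print("Listening for command...")
    frames = []
    # Record audio from the stream for a fixed duration (~10 seconds)
    for _ in range(0, int(porcupine.sample_rate / porcupine.frame_length * 10)):
        frames.append(audio_stream.read(porcupine.frame_length))

    # Convert the raw audio frames into an object the library can use
    # (the last argument is the sample width: 2 bytes = 16-bit PCM)
    audio_data = sr.AudioData(b"".join(frames), porcupine.sample_rate, 2)

    command = None  # Stays None if transcription fails, so 'if command:' is safe later
    try:
        # Send the audio data to Google's service for transcription
        command = recognizer.recognize_google(audio_data)
        print(f"You (command): {command}")
    except sr.UnknownValueError:
        # The audio could not be understood
        speak_text("Sorry, I didn't catch that.")
    except sr.RequestError as e:
        # The recognition service could not be reached (e.g. no network)
        print(f"Speech recognition request failed: {e}")
        speak_text("Sorry, the speech service is unavailable.")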
    

    Once the wake word is detected, the code actively records audio from the microphone for about 10 seconds, capturing the user’s spoken command. It then packages this raw audio and sends it to Google’s speech recognition service using the speech_recognition library. The service processes the audio and returns the transcribed text, which is stored in the command variable.
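The loop bound int(porcupine.sample_rate / porcupine.frame_length * 10) is worth unpacking. Assume Porcupine’s usual 16,000 Hz sample rate and 512-sample frames:
• 16000 / 512 = 31.25 frames per second.
• 31.25 × 10 = 312.5, which int() truncates to 312 reads.
• 312 × 512 = 159,744 samples, or about 9.98 seconds of audio.

The helper names below are hypothetical; they just reproduce that arithmetic.

```python
def frames_for_duration(sample_rate: int, frame_length: int, seconds: float) -> int:
    """Number of fixed-size reads used to cover roughly `seconds` of audio.

    Mirrors int(sample_rate / frame_length * seconds) from the recording loop;
    int() truncates, so the result can be up to one frame short.
    """
    return int(sample_rate / frame_length * seconds)


def recorded_seconds(num_frames: int, sample_rate: int, frame_length: int) -> float:
    """Actual length of audio captured by num_frames reads."""
    return num_frames * frame_length / sample_rate


def recorded_bytes(num_frames: int, frame_length: int, sample_width: int = 2) -> int:
    """Size of the buffer passed to sr.AudioData (16-bit samples = 2 bytes each)."""
    return num_frames * frame_length * sample_width


n = frames_for_duration(16000, 512, 10)
print(n, recorded_seconds(n, 16000, 512), recorded_bytes(n, 512))
```

Because int() truncates, the recording can come out up to one frame (32 ms at these values) short of the requested duration. It also always runs for the full ~10 seconds, even if the user stops talking sooner.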

    Getting Answers from Perplexity

    Once the user’s command has been converted to text, it is sent to the Perplexity API to get an intelligent, up-to-date answer.

    # --- This logic runs if a command was successfully transcribed ---

    if command:
        # Define the instructions and context for the AI
        messages = [{"role": "system", "content": "You are an AI assistant. You are located in Twinsburg, Ohio. All answers must be relevant to Cleveland, Ohio unless asked for differently by the user.  You MUST answer all questions in a single and VERY concise sentence."}]
        messages.append({"role": "user", "content": command})

        # Send the request to the Perplexity API
        response = perplexity_client.chat.completions.create(
            model=MODEL_NAME,
            messages=messages
        )
        assistant_response_text = response.choices[0].message.content.strip()
        speak_text(assistant_response_text)
    

    This code block is the “brain” of the operation. It first constructs a messages list that includes a crucial system prompt. This prompt gives the AI its persona and rules, such as answering in a single sentence and being aware of its location in Ohio. The user’s command is then added to this list, and the whole package is sent to the Perplexity API. Finally, the script extracts the text from the AI’s response and passes it to the speak_text function to be read aloud.
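The article does not show how perplexity_client is created. A common setup, and an assumption here, is the OpenAI Python SDK pointed at Perplexity’s OpenAI-compatible endpoint (base_url="https://api.perplexity.ai"). To make the request and response shapes concrete without any SDK, the standard-library sketch below builds roughly the same POST request. It reads the answer from the same field path as response.choices[0].message.content. The function names are hypothetical; only ask_perplexity touches the network.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: Perplexity's OpenAI-compatible chat completions endpoint
PERPLEXITY_URL = "https://api.perplexity.ai/chat/completions"

# System prompt taken from the article's code
SYSTEM_PROMPT = (
    "You are an AI assistant. You are located in Twinsburg, Ohio. "
    "All answers must be relevant to Cleveland, Ohio unless asked for "
    "differently by the user. You MUST answer all questions in a single "
    "and VERY concise sentence."
)


def build_messages(command: str, system_prompt: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    """System prompt first, then the transcribed user command."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": command},
    ]


def build_request(command: str, api_key: str, model: str = "sonar") -> urllib.request.Request:
    """Prepare (without sending) a chat completion POST request."""
    body = json.dumps({"model": model, "messages": build_messages(command)}).encode("utf-8")
    return urllib.request.Request(
        PERPLEXITY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_answer(response_json: Dict[str, Any]) -> str:
    """Same field path as response.choices[0].message.content in the SDK."""
    return response_json["choices"][0]["message"]["content"].strip()


def ask_perplexity(command: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request over the network and return the answer text."""
    with urllib.request.urlopen(build_request(command, api_key), timeout=timeout) as resp:
        return extract_answer(json.load(resp))
```

Keeping request construction separate from sending makes it possible to inspect the exact payload, including the system prompt, without spending API credits.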

    Text-to-Speech: Converting the Perplexity Response to Voice

    The speak_text function is what gives the assistant its voice.

    import os

    import pygame
    from gtts import gTTS


    def speak_text(text_to_speak, lang='en'):
        """Convert text to speech and play it; the default language is English."""
        # Print the text for reference so the user can see what is being spoken
        print(f"Assistant (speaking): {text_to_speak}")

        try:
            # Initialize the Pygame mixer module for audio playback
            pygame.mixer.init()

            # Create a Google Text-to-Speech (gTTS) object with the given text and language;
            # 'slow=False' uses the normal speaking rate rather than the slower mode
            tts = gTTS(text=text_to_speak, lang=lang, slow=False)

            # Set the filename where the generated speech will be saved
            mp3_filename = "response_audio.mp3"

            # Save the generated speech as an MP3 file
            tts.save(mp3_filename)

            # Load the MP3 file into Pygame's music player and start playback
            pygame.mixer.music.load(mp3_filename)
            pygame.mixer.music.play()

            # Wait while playback is ongoing so the function does not return
            # before the speech finishes; tick(10) checks about 10 times per second
            clock = pygame.time.Clock()
            while pygame.mixer.music.get_busy():
                clock.tick(10)

            # Quit the Pygame mixer once playback is complete to free resources
            pygame.mixer.quit()

            # Delete the temporary MP3 file after playback to clean up
            os.remove(mp3_filename)

        except Exception as e:
            # Catch and display any errors during speech generation or playback
            print(f"Error in Text-to-Speech: {e}")

    This function takes a text string and prints it for reference. It then uses the gTTS (Google Text-to-Speech) library to generate a temporary MP3 audio file and plays it through the system’s speakers with the pygame library. Once playback has finished, it deletes the file. Error handling catches problems that occur anywhere in the process.
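This version has two weaknesses. First, os.remove() only runs when everything succeeds, so an error during playback leaves response_audio.mp3 behind. Second, the fixed filename lives in the working directory. A variant using tempfile and try/finally is sketched below. synthesize_to_file and play_file_blocking are hypothetical stand-ins for gTTS.save() and the pygame load/play/wait steps. They are injected so the cleanup logic can be tested without audio hardware.

```python
import os
import tempfile
from typing import Callable


def speak_via_temp_file(
    text: str,
    synthesize_to_file: Callable[[str, str], None],
    play_file_blocking: Callable[[str], None],
) -> None:
    """Synthesize `text` to a temporary MP3, play it, and always delete the file.

    synthesize_to_file(text, path) stands in for gTTS(text=...).save(path);
    play_file_blocking(path) stands in for the pygame load/play/wait steps.
    Exceptions are not swallowed here; the caller decides how to report them.
    """
    fd, path = tempfile.mkstemp(suffix=".mp3")  # unique name, no clashes in the working dir
    os.close(fd)  # the synthesizer reopens the path itself
    try:
        synthesize_to_file(text, path)
        play_file_blocking(path)
    finally:
        # Runs on success and on error, so no stale MP3 is left behind
        if os.path.exists(path):
            os.remove(path)
```

In the real script, synthesize_to_file could be lambda text, path: gTTS(text=text, lang='en').save(path). The existing try/except in speak_text would still report any error raised here.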

    Testing the assistant

    Below is a demonstration of the custom voice assistant in action. To compare its performance with Google Assistant, I asked the same question to both Google and the custom assistant.

    As you can see, Google provides links to the answer rather than a brief summary of what the user wants. The custom assistant goes further, giving a summary that is more helpful and informative.

    Conclusion

    In this article, we looked at the process of building a fully functional, hands-free voice assistant on a Raspberry Pi. Using Python, we combined a custom wake word with the Perplexity API to create a simple voice assistant that helps you get information quickly.

    The key advantage of this LLM-based approach is its ability to deliver direct, synthesized answers to complex and current questions. This is a task where assistants like Google Assistant often fall short by simply providing a list of search links. Instead of acting as a mere voice interface for a search engine, our assistant functions as a true answer engine, parsing real-time web results to produce a single, concise response. The future of voice assistants lies in this deeper, more intelligent integration, and building your own is the best way to explore it.


