
    Use OpenAI Whisper for Automated Transcriptions

    By Editor Times Featured, June 26, 2025. 8 min read


    There has been a lot of development recently in large language models (LLMs). Much of the focus is on the question answering you can do with either pure text-based models or vision-language models (VLMs), where you can also input images.

    However, there is another dimension that has evolved a lot over the past couple of years: audio. There are now models for transcription (speech -> text), speech synthesis (text -> speech), and speech-to-speech, where you can hold a whole conversation with a language model, with audio going both in and out.

    The architecture and training pipeline for OpenAI’s Whisper model. Image from the OpenAI Whisper GitHub repository, MIT license.

    In this article, I’ll discuss how I’m using the developments in the audio model space to my advantage, becoming an even more efficient programmer.

    This is an example video of me using the transcription tool. I first select the prompt field in Cursor and use my hotkey to activate the microphone, which is indicated by the orange icon in the top left. I then speak the sentence I want to transcribe, and it quickly appears in the prompt window without me having to type on the keyboard at all. This is a more efficient way to type long English prompts into your editor. Video by the author.

    Motivation

    My main motivation for writing this article is that I’m continually looking for ways to become a more efficient programmer. After using the ChatGPT mobile app for a while, I discovered its transcription option (the microphone icon to the right in the user input field). I tried it and quickly realized how much better this transcription is compared to others I’ve used before, such as Apple’s built-in iPhone transcription.

    OpenAI’s transcription almost always captures all of my words, with very few errors. Even when I use less common words, for example acronyms related to computer science, it is still able to pick up what I’m saying.

    The transcription icon from the OpenAI application. Image by the author, taken from OpenAI’s ChatGPT.

    This transcription was only available in the ChatGPT app. However, I know that OpenAI has an API endpoint for their Whisper model, which is (presumably) the same model they use to transcribe speech in the app. I thus wanted to set this model up on my Mac to be available via a shortcut.

    (I know there are apps such as MacWhisper available, but I wanted to develop a completely free solution, apart from the costs of the API calls themselves.)

    Prerequisites

    • Alfred. I will be using Alfred on the Mac to trigger some scripts, but alternatives exist. In general, you need a way to trigger scripts on your Mac / PC from a hotkey.

    Pros

    The main advantage of using this transcription is that you can input words into your computer more quickly. When I type as fast as I can on my computer, I’m not even able to reach 100 words per minute, and to type at that speed, I really have to focus. However, the average speaking speed is at least 110 words per minute, according to this article.

    This means you can be much more effective if you speak your words with transcription instead of typing them out on the keyboard.

    I think this is especially relevant after the rise of large language models such as ChatGPT. You spend more time prompting the language models, for example asking questions to ChatGPT, or prompting Cursor to implement a feature or fix a bug. Thus, the use of the English language is much more prevalent now than before, compared to the use of programming languages such as Python directly.

    Note: Of course, you’ll still be writing a lot of code, but from experience, I spend much more time prompting Cursor with extensive English prompts, in which case using this transcription saves me a lot of time.

    Cons

    There can, however, be some downsides to using the transcription as well. One of the main ones is that a lot of the time, you do not want to speak out loud when programming. You might be sitting in the airport (as I am when writing this article), or even in your office. In these scenarios, you probably don’t want to disturb those around you by speaking out loud. However, if you are sitting in a home office, this is naturally not a problem.

    Another negative aspect is that shorter prompts might not be that much faster. Consider this: if you just want to write a prompt of a single sentence, it will, in many scenarios, be faster just to type the prompt out by hand. This is because of the delay in starting, stopping, and transcribing audio into text. Sending the API call takes a little bit of time, and the shorter your prompt, the larger the fraction of the time you spend waiting for the response.
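
    To make this trade-off concrete, here is a rough back-of-the-envelope sketch. The 110 words per minute speaking speed comes from the section above; the 80 words per minute typing speed and the 5 seconds of fixed overhead (hotkeys plus the API round trip) are illustrative assumptions, not measurements, so substitute your own numbers.

```python
def seconds_to_type(words: int, typing_wpm: float) -> float:
    """Time to type a prompt of the given length."""
    return words / typing_wpm * 60.0


def seconds_to_dictate(words: int, speaking_wpm: float, overhead_seconds: float) -> float:
    """Time to speak a prompt, plus a fixed overhead for start, stop, and the API call."""
    return words / speaking_wpm * 60.0 + overhead_seconds


def break_even_words(typing_wpm: float, speaking_wpm: float, overhead_seconds: float) -> float:
    """Prompt length (in words) above which dictation becomes faster than typing."""
    per_word_saving = 60.0 / typing_wpm - 60.0 / speaking_wpm
    if per_word_saving <= 0:
        raise ValueError("Dictation never wins if you type at least as fast as you speak.")
    return overhead_seconds / per_word_saving


if __name__ == "__main__":
    # Assumed values: 80 wpm typing and 5 s overhead are placeholders.
    typing_wpm, speaking_wpm, overhead = 80.0, 110.0, 5.0
    print(f"Break-even: {break_even_words(typing_wpm, speaking_wpm, overhead):.1f} words")
    for words in (10, 50, 150):
        typed = seconds_to_type(words, typing_wpm)
        spoken = seconds_to_dictate(words, speaking_wpm, overhead)
        print(f"{words:>4} words: type {typed:6.1f} s, dictate {spoken:6.1f} s")
```

    Under these assumptions, short prompts are quicker to type, while longer prompts favor dictation. The slower you type, or the lower the overhead, the sooner dictation pays off.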

    How to implement

    You can see the code I used in this article on my GitHub. However, you also need to add hotkeys to run the scripts.

    First, you need to:

    • Clone the GitHub repository:
    git clone https://github.com/EivindKjosbakken/whisper-shortcut.git
    • Create a virtual environment called .venv and install the required packages:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    • Get an OpenAI API key. You can do that by:
      • Going to the OpenAI API Overview and logging in or creating a profile
      • Going to your profile, then API Keys
      • Creating a new key. Remember to copy the key, as you will not be able to see it again

    The scripts from the GitHub repository work as follows:

    • start_recording.sh: starts recording your voice. The first time you use this, it will ask you for permission to use the microphone
    • stop_recording.sh: sends a stop signal to the script to stop recording, then sends the recorded audio to OpenAI for transcription. Additionally, it adds the transcribed text to your clipboard and pastes the text if you have a text field on your PC selected

    The entire repository is available under an MIT license.
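
    For readers curious about what the transcription step could look like, below is a minimal sketch that uses only the Python standard library. It is not the repository's code. It assumes the API key is stored in the OPENAI_API_KEY environment variable, that the audio was saved to a hypothetical file recording.wav, and that the whisper-1 model is called through OpenAI's /v1/audio/transcriptions endpoint. The clipboard helper relies on the macOS pbcopy command.

```python
import json
import os
import subprocess
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes: bytes, filename: str, api_key: str,
                                model: str = "whisper-1") -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying the audio file and model name."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def transcribe(path: str, api_key: str) -> str:
    """Send the audio file to the API and return the transcribed text."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(
            audio_file.read(), os.path.basename(path), api_key
        )
    with urllib.request.urlopen(request) as response:
        # The default JSON response carries the transcript in a "text" field.
        return json.loads(response.read())["text"]


def copy_to_clipboard(text: str) -> None:
    """Copy text to the clipboard (macOS only, uses pbcopy)."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)


if __name__ == "__main__":
    key = os.environ["OPENAI_API_KEY"]        # assumed location of the key
    transcript = transcribe("recording.wav", key)  # hypothetical file name
    copy_to_clipboard(transcript)
    print(transcript)
```

    Building the request separately from sending it keeps the network call in one place, which makes the sketch easy to inspect or adapt. The official openai Python package wraps the same endpoint if you prefer not to assemble the request by hand.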

    Alfred

    You can find the Alfred workflow in the GitHub repository here: Transcribe.alfredworkflow.

    This is how I set up the Alfred workflow:

    My Alfred workflow. I have two hotkeys, one to start the transcription (record voice), and one to stop transcription (stop recording, and send the audio to the OpenAI Whisper API for transcription). The option + Q command runs the start_recording.sh script, and option + W runs the stop_recording.sh script. You can, of course, change the hotkeys for these commands. Image by the author.

    You can simply download it and add it to your Alfred.

    Also, remember to have a terminal window open whenever you want to run this script, as you activate the Python script from the terminal. I had to do it this way because when the script was activated directly from Alfred, I got permission issues. The first time you run the script, you should be prompted to give your terminal access to the microphone, which you should approve.
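
    The two-hotkey design means the recording process has to keep running until a second, separate command tells it to stop. One common way to do this on macOS or Linux is with a Unix signal. The sketch below illustrates that pattern only and is not the repository's implementation: the choice of SIGTERM, the StopFlag class, and the read_chunk callback (standing in for a real microphone read) are all assumptions.

```python
import os
import signal
import time


class StopFlag:
    """Remembers whether a stop signal has been received."""

    def __init__(self) -> None:
        self.requested = False

    def handle(self, signum, frame) -> None:
        # Signal handlers should do as little as possible: just set the flag.
        self.requested = True


def record_until_stopped(read_chunk, flag: StopFlag, poll_seconds: float = 0.05) -> bytes:
    """Collect chunks from read_chunk() until SIGTERM arrives, then return them joined."""
    signal.signal(signal.SIGTERM, flag.handle)  # must be called from the main thread
    chunks = []
    while not flag.requested:
        chunks.append(read_chunk())
        time.sleep(poll_seconds)
    return b"".join(chunks)


if __name__ == "__main__":
    # A stop script could end the recording with: kill -TERM <pid>
    print(f"Recording in process {os.getpid()}")
    audio = record_until_stopped(lambda: b"\x00", StopFlag())  # dummy audio source
    print(f"Captured {len(audio)} bytes")
```

    In a setup like this, the start hotkey would launch the recorder and the stop hotkey would send the signal to its process ID, after which the captured audio can be passed on for transcription.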

    Cost

    An important consideration when using APIs such as OpenAI Whisper is the cost of the API usage. I would consider the cost of using OpenAI’s Whisper model moderately high. As always, the cost is fully dependent on how much you use the model. I use the model up to 25 times a day, with up to 150 words each time, and the cost is less than 1 dollar per day.

    This means, however, that if you use the model a lot, you can see costs of up to 30 dollars per month, which is definitely a substantial cost. However, I think it’s important to be aware of the time savings you get from the model. If each usage saves you 30 seconds, and you use it 20 times per day, you have just saved ten minutes of your day. Personally, I’m willing to pay one dollar to save ten minutes of my day on a task (typing on my keyboard) that doesn’t really grant me any other benefit. If anything, using your keyboard may contribute to a higher risk of injuries such as carpal tunnel syndrome. Using the model is thus definitely worth it for me.
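
    The arithmetic behind these figures can be written out as a small sketch. The inputs are the numbers quoted above (20 uses per day, 30 seconds saved per use, and an upper bound of 1 dollar per day); the 30-day month is an assumption.

```python
def minutes_saved_per_day(uses_per_day: int, seconds_saved_per_use: float) -> float:
    """Total time saved per day, in minutes."""
    return uses_per_day * seconds_saved_per_use / 60.0


def monthly_cost(daily_cost_dollars: float, days_per_month: int = 30) -> float:
    """Monthly cost, assuming the same usage every day of a 30-day month."""
    return daily_cost_dollars * days_per_month


def dollars_per_minute_saved(daily_cost_dollars: float, minutes_saved: float) -> float:
    """What each saved minute costs at the given daily spend."""
    return daily_cost_dollars / minutes_saved


if __name__ == "__main__":
    saved = minutes_saved_per_day(uses_per_day=20, seconds_saved_per_use=30)
    print(f"Minutes saved per day: {saved}")
    print(f"Monthly cost at 1 dollar per day: {monthly_cost(1.0)}")
    print(f"Dollars per saved minute: {dollars_per_minute_saved(1.0, saved)}")
```

    Since 1 dollar per day is an upper bound on my usage, the real cost per saved minute is likely lower than this estimate.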

    Conclusion

    In this article, I started off by discussing the immense advances within language models in the last few years. These have helped us create powerful chatbots, saving us enormous amounts of time. However, alongside the advances in language models, we have also seen advances in voice models. Transcription using OpenAI Whisper is now near perfect (from personal experience), which makes it a powerful tool you can use to input words on your computer more effectively. I discussed the pros and cons of using OpenAI Whisper on your PC, and I also went step by step through how you can implement it on your own computer.


