New public release for Transcribable now available – Android OS 1.12.17


Transcribable 1.12.17 has been released to the public track. This release brings several changes since 1.11.06A, summarised below:

  • Introduces our offline capable speech recogniser (uses Whisper.cpp & Silero VAD).
  • Optionally download the models you require.
  • Animated background; animates with speech recogniser states.
  • Supports sharing audio files (mp3/m4a) for transcription using the local Whisper service.
  • Introduces a voice input activity for Transcribable’s local Whisper recogniser.

Full changelog since 1.11.06A

Android OS app (369011217)

  • Offline capable speech recogniser:
    • Uses Whisper.cpp for speech recognition and supports downloading several different models (ggml-*). The selection is currently limited to a few models but will be expanded; if there is a particular Whisper model of interest, let us know.
    • Silero VAD is used for voice activity detection.
    • Models for Silero and Whisper are not bundled with the app and require an additional download within the application.
      • This is primarily to avoid inflating the application’s size with assets (models in this instance) that the user may not use, as Transcribable is intended to be used with many speech recogniser applications.
    • Our speech recogniser has a dedicated preference activity allowing:
      • Model downloads and changes.
      • Managing and removing downloaded models.
    • Features a continuous transcription mode that processes results when speech activity pauses.
    • Local speech recogniser now includes a speech input activity, with a choice between the in-app result handler and the speech activity handler. The activity is designed to be relatively small to minimise obstructing content; additionally, its position can be customised.
    • Generally it is recommended to use the default “Speech Recogniser API”, which lets the app process results with its own custom listener, specifically designed not to obstruct content. Where a recogniser doesn’t support the “Speech Recogniser API”, it falls back to the speech activity.
    • Now allows customisation of the recognition language.
      • Optional translation support is present but limited to the medium model and above (excluding the English-only variants); additionally, please note that translation is only from the selected language to English.
        • Adds additional processing time.
    • Improvements to the backup and restore system: Models can now be backed up to and restored from a separate location.
    • Model changes now prompt the user to reload the speech recogniser service (actionable).
    • Adds support for transcribing audio files (mp3/m4a) to text using the local Whisper service.
      • Note: Transcribable’s local Whisper service must be configured before use; it will prompt for configuration if unconfigured.
    • Local recogniser (Transcribable’s built-in offering) menu can be hidden under settings.
    • Supports viewing CPU information:
      • Benchmarks for memory and GGML models
  • Tasker plugin support (third-party support):
    • Send text to the current file or a separate file (created if it doesn’t already exist).
    • Retrieve text from the current active file.
  • Speech recogniser listening state adds an additional animation to the background of the editor.
    • Can be disabled through a preference.
    • Has extra animation stages for our new speech recogniser.
  • UI improvements:
    • Floating action button now cradles to the side of the toolbar.
    • Various improvements for SDK 36 target involving insets.
  • Various other more subtle tweaks and improvements:
    • Clarified several local Whisper related options.
    • About page includes service version and build numbers.
    • Improvements to content/model backup restoration triggers and notification message prompts.
    • Fixed a rare crash involving SPen.
    • Revised several labels + help messages.
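
To illustrate the continuous transcription mode listed above, here is a minimal sketch of how results could be cut whenever speech activity pauses. It is not Transcribable's implementation and does not use Silero VAD's API: the per-frame speech probabilities, the `threshold` and the `min_pause_frames` parameter are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def split_on_pauses(
    speech_probs: Iterable[float],
    threshold: float = 0.5,
    min_pause_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of speech; end is exclusive.

    A segment is closed once `min_pause_frames` consecutive frames fall
    below `threshold`. Both defaults are illustrative assumptions.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first speech frame of the open segment
    last_speech = -1             # most recent frame classified as speech
    silent_run = 0               # consecutive non-speech frames seen so far

    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            if start is None:
                start = i
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Pause is long enough: hand this segment to the recogniser.
                segments.append((start, last_speech + 1))
                start = None
                silent_run = 0

    # Flush a segment that was still open when the input ended.
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments
```

Each returned range marks audio that would be passed to the recogniser as one unit; a short dip below the threshold (shorter than `min_pause_frames`) stays inside the same segment.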

Learn more about Transcribable.

Caution: the local speech recogniser’s performance will vary subject to CPU instruction set support and/or age

The local speech recogniser is powered by Whisper.cpp; its performance will vary subject to how powerful your device is.

Where possible, Whisper.cpp is configured to use a variant optimised for the instruction sets your CPU supports.
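
As a rough sketch of what variant selection by instruction set can look like, the example below parses the `Features` line that Linux reports for arm64 CPUs in `/proc/cpuinfo` and picks the most specialised build whose requirements are met. The variant names and their ordering are hypothetical and are not the names Transcribable or Whisper.cpp use; only the feature flags (`asimd`, `asimdhp`, `asimddp`) are real kernel-reported names.

```python
from typing import Set

# Hypothetical build variants, most specialised first. The required flags are
# arm64 feature names as reported by the Linux kernel in /proc/cpuinfo.
VARIANTS = [
    ("armv8.2_fp16_dotprod", {"asimd", "asimdhp", "asimddp"}),
    ("armv8.2_fp16", {"asimd", "asimdhp"}),
    ("armv8_neon", {"asimd"}),
]
FALLBACK = "generic"  # hypothetical name for an unoptimised build


def parse_cpu_features(cpuinfo_text: str) -> Set[str]:
    """Collect the flags from every 'Features' line of a cpuinfo dump."""
    features: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def pick_variant(features: Set[str]) -> str:
    """Return the first variant whose required flags are all present."""
    for name, required in VARIANTS:
        if required <= features:
            return name
    return FALLBACK
```

On a device, the input text could come from reading `/proc/cpuinfo`; here it is passed in as a string so the selection logic can be checked on its own.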

It is suggested to try the tiny model first; if tiny is slow for you, you will need a more powerful phone or a different speech recogniser.
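
One way to put a number on "slow" is the real-time factor: processing time divided by the length of the audio. The sketch below is an illustration only; the `transcribe` callable is a hypothetical stand-in for whatever runs the model, and what counts as an acceptable value is left to the reader rather than taken from the app.

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to `transcribe` and divide by the audio length.

    A result above 1.0 means transcription took longer than the audio lasts.
    `transcribe` is a hypothetical zero-argument callable supplied by the caller.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    started = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - started
    return elapsed / audio_seconds
```

Measuring the tiny model this way on a short clip gives a quick indication of whether larger models are worth downloading on a given device.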

Development for this functionality was performed on the following devices, with both achieving acceptable transcription times:

  • Samsung S24 Ultra
  • Sony Xperia IV
