Following a recent collaboration between the Jieshuo screen reader developer and VIVO, the Blue Heart AI LLM now powers Jieshuo’s AI image description, icon detection, and inquire-by-voice functions, and the daily limit on the number of recognitions has been lifted. The collaboration also introduced a new feature to Jieshuo: the video description function. Let’s explore how this feature works and briefly look at another AI-powered Jieshuo function: icon detection.
Video Description
How It Works
Despite its name, the “video description” feature in Jieshuo doesn’t function as a typical video description service might. Unlike apps such as Seeing AI or PiccyBot, which allow you to share a video directly with the AI for recognition, this feature operates more as a continuous, detailed image description tool. When video description is enabled, Jieshuo repeatedly captures screenshots of the entire screen and sends each one to the AI service for analysis. The results are read aloud as soon as they’re received. In practice, if a video is playing, the AI will describe stills from the video.
The current implementation does not link consecutive images, so each screenshot is analyzed independently. This is noticeable in the results: a later description doesn’t refer back to, or build on, what was described in earlier images. While the recognition speed is fairly good, it may struggle to stay in sync with videos that have rapid scene changes, potentially leading to delayed or mismatched descriptions.
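To make the behavior described above concrete, here is a minimal Python sketch of a capture-describe-speak loop. It is only an illustration of the idea, not Jieshuo’s actual code: the names `describe_stream`, `describe_image`, `speak`, and `should_stop` are hypothetical, and the AI service is replaced by a plain function you pass in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Description:
    frame_index: int
    text: str


def describe_stream(
    frames: Iterable[bytes],
    describe_image: Callable[[bytes], str],  # hypothetical stand-in for the AI service
    speak: Callable[[str], None],            # hypothetical stand-in for speech output
    should_stop: Callable[[], bool] = lambda: False,
) -> List[Description]:
    """Describe each captured frame on its own, with no memory of earlier frames."""
    results: List[Description] = []
    for index, frame in enumerate(frames):
        if should_stop():
            break
        # Only the current frame is sent; nothing from earlier frames is
        # passed along, so consecutive descriptions are not linked.
        text = describe_image(frame)
        speak(text)
        results.append(Description(index, text))
    return results


if __name__ == "__main__":
    fake_frames = [b"frame-a", b"frame-b", b"frame-a"]
    spoken: List[str] = []
    describe_stream(
        fake_frames,
        describe_image=lambda f: f"description of {f.decode()}",
        speak=spoken.append,
    )
    for line in spoken:
        print(line)
```

Because each call is blocking in this sketch, a slow `describe_image` would delay the next capture, which is one plausible way descriptions could fall behind a fast-changing video.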
How to Enable/Disable Video Description
The video description function can be accessed from Jieshuo’s main menu, assigned to a gesture, or found in the recognition menu. Note that the name is currently in Chinese, but a translation should be added in the next Jieshuo beta update. Activating the function begins recognition, which should work on any screen, regardless of whether a video is playing. The feature continues recognizing even when screen content is static.
Since the recognition works on the entire screen rather than a specific focused area, other onscreen elements will be included in the description results if the video isn’t in full-screen mode.
To stop the continuous recognition, you can use the back function or gesture, or restart Jieshuo. Currently, tapping the function name doesn’t stop recognition, so it doesn’t function as a toggle.
Notes:
- Video description requires Jieshuo to have the “display over other apps” or “appear on top” permission.
- As far as I know, descriptions are initially in Chinese and are translated by Jieshuo’s translation feature into the target language you set in voice assistant and translation settings.
- When new results arrive, Jieshuo interrupts the reading of the previous result if it hasn’t finished reading it.
- Description quality depends on the capabilities of the Blue Heart model. I haven’t commented on recognition quality, as this post is intended to demonstrate how the feature works rather than assess the quality of the descriptions.
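The interruption behavior mentioned in the notes above can be modeled with a small Python sketch. This is a toy model under my own assumptions, not how Jieshuo is implemented: the class `InterruptingSpeaker` and its methods are hypothetical names used only for illustration.

```python
from typing import List, Optional, Tuple


class InterruptingSpeaker:
    """Toy model: a new result cuts off whatever is still being read."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.log: List[Tuple[str, str]] = []

    def speak(self, text: str) -> None:
        # If the previous result is still being read, it is cut off.
        if self.current is not None:
            self.log.append(("interrupted", self.current))
        self.current = text
        self.log.append(("started", text))

    def finish(self) -> None:
        # Called when the current result has been read to the end.
        if self.current is not None:
            self.log.append(("finished", self.current))
            self.current = None


if __name__ == "__main__":
    speaker = InterruptingSpeaker()
    speaker.speak("first description")
    speaker.speak("second description")  # arrives before the first one ends
    speaker.finish()
    for event, text in speaker.log:
        print(event, "-", text)
```

In this model, a result that arrives quickly after the previous one means the earlier description is never heard in full, which matches what you may notice with fast-changing content.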
Icon Detection
The icon detection feature aims to identify the icons of onscreen elements, which can be helpful when interacting with unlabeled items. To use it, focus on the element you want to describe, then activate the “recognize focused icon” function. This can be found in the main menu, assigned to a gesture, or triggered through the voice assistant. Once recognition completes, the result will be read aloud. If you miss the automatic reading, you can retrieve the results from the recognition menu. Note that recognition results are cleared upon restarting Jieshuo.
Audio Demonstration
