Last updated on 25 January 2025
Jieshuo screen reader offers several functions that recognize and describe the currently focused item or the entire screen. However, these functions do not let you attach a custom question or specific prompt to the captured image. To address this, Jieshuo has introduced the Inquire by Voice functions, which capture a screenshot of the entire screen or the current focus, or use the camera to capture a scene, and then send the image to an AI service along with the text recognized from your voice prompt. This lets users ask specific questions about the captured images.
Please note that this post is not intended to review the functions’ results or compare them with other Jieshuo image recognition functions, as I haven’t thoroughly tested the new functions yet. Instead, it aims to demonstrate how these functions work and highlight their availability.
How to Use the Inquire by Voice Functions
As the names imply, these functions are used to ask questions via voice. Currently, it is not possible to type the question or prompt.
The available functions are:
- Inquire by Voice about Current Focus: Captures a screenshot of the focused item and sends it to the AI service along with the text recognized from your spoken prompt. If no item is focused, the screenshot will be of the entire screen.
- Inquire by Voice about the Entire Screen: Takes a screenshot of the entire screen, not only a specific item.
Both functions can be found in the main menu, assigned to gestures, activated by the Voice Assistant, and accessed from the Recognition menu. The Recognition menu is available in both the main menu and the functions menu, which can be accessed by swiping up and then right with one finger (default gesture).
When you activate either function, you will hear a chime signaling that you can speak your message. Once you are done, the text is sent with the captured image, followed by a “recognition in progress” message. The response is read out automatically when ready. If you want to review past responses, go to the Recognition menu and select Recognition Results. Note that only the responses are stored there; the corresponding questions are not shown. Also, neither questions nor responses can be viewed in the Chat History for now.
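The interaction described above boils down to a simple pipeline: capture an image, transcribe the spoken prompt, send both to the AI service, and store only the response. The sketch below is purely illustrative; every name in it is hypothetical and does not reflect Jieshuo’s internal implementation or any public API.

```python
# Illustrative sketch only: all names here are hypothetical stand-ins,
# not part of Jieshuo's actual code or any public API.

def inquire_by_voice(capture, transcribe, ask_ai, history):
    """Model the flow: capture an image, transcribe the spoken prompt,
    send both to the AI service, and store only the response."""
    image = capture()                 # screenshot of the focus or full screen
    prompt = transcribe()             # text recognized after the chime
    response = ask_ai(image, prompt)  # "recognition in progress" happens here
    history.append(response)          # only the answer is kept, not the question
    return response

# Minimal demo with stubbed components standing in for the real services.
results = []
inquire_by_voice(
    capture=lambda: b"<screenshot>",
    transcribe=lambda: "What does this button do?",
    ask_ai=lambda image, prompt: "AI answer to: " + prompt,
    history=results,
)
```

The point of the sketch is the last step: because only the response is appended to the history, the Recognition Results list can show answers but not the questions that produced them, which matches the behavior described above.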
Touch and Hold to Inquire About Current Focus
This method lets the user touch and hold a focused element to invoke the Inquire by Voice about Current Focus function. The item must first be focused, either by touching it or by swiping to it. The user then places a finger on the item’s position on the screen and holds it until hearing the prompt to start speaking. If the user does not wish to speak a prompt, they can use the back gesture to stop the listening service. However, this does not cancel recognition; instead, it returns a description of the item.
To activate this method, go to Advanced Settings, then Voice Assistant and Translation Settings, and check the option that is still in Chinese.
Asking About a Scene Using the Jieshuo Camera
To capture a scene and ask a question about it, open the Jieshuo Camera from the main menu and long-press the volume down key. Speak your prompt and wait for the camera shutter sound, which indicates that the recognition process has started. I noticed during testing that there is a slight delay between asking the question and taking the photo.
Typing Prompts Instead of Using Voice
If you prefer to type your inquiries instead of speaking them, navigate to Screen Reader Settings, select Advanced Settings, and then open Voice Assistant and Translation Settings. There, check the option “Switch to Text for Voice Assistant and Inquire by Voice Functions.” Once it is checked, a text field where you can type your prompt will appear whenever you activate any of the Inquire functions. Tap OK to send the text along with the captured image.
Final remarks
The Inquire by Voice functions use the Vivo BlueHeart AI model, like Jieshuo’s other image-recognition functions. While those other functions rely on Jieshuo’s translation feature to translate results into the user’s target language, as specified in Voice Assistant and Translation Settings, my tests show that the text of the spoken prompt is sent in English and that the responses come back in English as well. For instance, I asked the system to identify a grammar mistake in an English text, and it provided a response. Note that I only tested this with English and did not try other languages.
Adding the ability to ask custom questions when recognizing a photo or scene is certainly a welcome addition. This feature could be further improved by offering follow-up questions.
Additionally, since these functions are tied to the AI service they use, any changes in that service will directly affect both the results and the functions’ usability. Although Chinese AI models might be considered less advanced than the better-known international models, AI’s natural language processing and image description capabilities are continuously evolving, and lesser-known services are catching up and showing clear improvements.
