Abstract
With a globally aging population, visual impairment is an increasingly pressing problem for our society. Visual disability drastically reduces quality of life and constitutes a large cost to the health care system. Mobility of the visually impaired is one of the most critical aspects affected by this disability, and yet, it relies on low-tech solutions, such as the white cane. Many individuals avoid these solutions entirely; in part, this reluctance may be explained by their obtrusiveness, a strong deterrent to the adoption of many new devices. Here, we leverage new advancements in artificial intelligence, sensor systems, and soft electroactive materials toward an electronic travel aid with an obstacle detection and avoidance system for the visually impaired. The travel aid incorporates a stereoscopic camera platform, enabling computer vision, and a wearable haptic device that can stimulate discrete locations on the user’s abdomen to signal the presence of surrounding obstacles. The proposed technology could be integrated into commercial backpacks and support belts, thereby guaranteeing a discreet and unobtrusive solution.
1 Introduction
Visual impairment is a continually growing problem, affecting over 3.2 million Americans who are 40 years and older [1]. This number is expected to double by 2050 [1] due to the aging population [2]. Visual impairment is associated with reduced quality of life [3], along with increased risk of falls and hip fractures [4], depression [5], obesity [6], and death [7]. Such a debilitating condition brings about high costs for society at large. A study on the impact of vision loss in Canada estimated that direct and indirect costs associated with visual impairment amounted to $15.8 billion in 2007 (1.19% of the country’s gross domestic product), constituting the most expensive disease category in terms of health care expenditures [8].
Affording increased, safe mobility and independence for the visually impaired is critical to improving quality of life and reducing financial burden. Despite the severity of the situation at a global scale, options for enhancing the mobility of the visually impaired remain limited, with little integration of new technologies. To date, the most common options are white canes and guide dogs. Less than 10% [9,10] of the visually impaired adopt these solutions, likely because of perceived social awkwardness, dependence, and anxiety [11–13]. These facts speak to the need for low-profile, inconspicuous assistive devices [14].
Most electronic travel aids (ETAs) proposed in the literature present limitations that hinder the widespread application of these systems. Among the proposed solutions, one can find: devices that transmit information through acoustic signals in earphones, thereby occluding air conduction, an important transmission pathway for auditory cues of safe navigation [15]; solutions with hand-held devices that limit users’ capacity to protect themselves in the case of an unexpected fall [16] or devices that implement nonintuitive coding and interfacing with the surrounding environment [17,18]; smart white canes that enhance existing conspicuous solutions and are challenging to wield dynamically [19,20]; and bulky, obtrusive, conspicuous systems that may cause uneasiness in the user [21]. Overall, little to no attention has been paid to the acceptance of these technologies by the visually impaired. However, technological advancements have the potential to provide alternative, cost-effective solutions to the mobility challenges of the visually impaired. While regenerative medicine could ultimately restore vision [22], research in this area is still in its nascent stages, and a comprehensive approach will require a broad set of complementary therapeutic strategies and alternatives. Sensory substitution systems that fuse information and communication technology with artificial intelligence could enable localization, obstacle avoidance, and quality of life improvements in the short term [23].
Here, we put forward a proof-of-concept ETA for the visually impaired that uses computer vision for obstacle detection with a novel wearable device—a bookbag that seamlessly incorporates sensors and a modified waist strap fashioned into a belt that provides haptic feedback to the user based on the computer vision output. The belt comprises a two-row, five-column array of piezoelectric-based tactors, which stimulate the abdomen based on the location of identified obstacles. More specifically, the computer vision software partitions the scene into an array of ten rectangular capture fields, resembling the tactor configuration on the torso; if an obstacle is identified in one of these rectangles, the corresponding actuator in the array of tactors starts vibrating. The proposed solution allows for an inconspicuous, highly customizable design, which can be incorporated into commercial backpack systems.
Computer vision and, more generally, deep learning have contributed significant advancements that can be integrated into assistive technologies for the visually impaired [24]. The use of a computer vision system supports the integration of critical features, such as object identification [25] or access to pedestrian signals at intersection crossings [26]. The output from other types of sensors, such as ultrasonic or light detection and ranging devices [27,28], could also be integrated into our system through sensor fusion toward more accurate obstacle identification [25]. Sensor fusion allows for a robust estimate of the distance of obstacles, in spite of variable lighting conditions and harsh weather that are detrimental for computer vision, and also assists in the identification of large obstacles, such as walls and columns [25]. With respect to the wearable device, haptic feedback is preferred over auditory stimuli, since it provides faster information transfer for hazard negotiation, even if at a coarser level of detail [29], and it does not disrupt organic auditory cues from the environment. The choice of torso-based stimulation for a wearable device enables concealment and an intuitive, egocentric re-display scheme, that is, a spatiotopically preserving mapping of hazards in the surrounding environment. A binaural bone conduction headset may also be wirelessly connected to the system through Bluetooth, enabling an additional communication channel.
The design proposed in this article builds upon our previous belt prototype [30], which demonstrated the feasibility of relaying information through tactile feedback on the abdomen utilizing piezoelectric-based actuators. Compared to the previous device, the new prototype includes a higher spatial resolution with a larger number of tactors, along with improved vibration amplitude. In addition, our previous work did not include any sensing element, which is pursued herein through computer vision techniques.
In what follows, we detail the design of the ETA. First, we describe the wearable haptic device and computer vision system. Then, we illustrate the interface between these components, and we perform a delay analysis. Finally, we highlight potential areas of future research and experimental tests.
2 Wearable Haptic Device
The wearable haptic device consists of a belt with ten discrete tactors, capable of providing haptic feedback to the user. Similar to Ref. [30], actuation of the tactors is based on macro-fiber composites (MFCs). Developed by NASA in 1999 [31], these materials are composed of piezoceramic wafers with interdigitated electrodes on a polyimide film, along with structural epoxy layers. We utilize P1-type MFC actuators from Smart Material (Sarasota, FL). When a voltage on the order of kilovolts is applied across the electrodes, these MFCs elongate through the so-called d33 effect [32]. In other words, these materials exploit the nonzero coupling between the components of the stress tensor and the electric field along the poling direction [32].
The macroscopic deformations of MFCs are ultimately determined by boundary conditions. In our previous work [30], we utilized the nonlinear coupling between expansion and bending in a postbuckling configuration [33,34]. While this design demonstrated the feasibility of haptic stimulation of the abdomen to convey information to the user, the amplitude of the out-of-plane vibrations of the actuators did not enable robust discrimination of the vibration [30]. To improve the performance of the actuator in terms of vibration amplitude and blocking force, we opt for an alternative solution by bonding the MFC to a 54 mm × 20 mm × 0.25 mm (length × width × thickness) aluminum plate using epoxy. In this configuration [35,36], the asymmetric elongation with respect to the neutral axis of the overall structure elicits out-of-plane bending.
To protect the actuator and prevent electric short circuits, we encapsulate the aluminum-backed MFC in a 3D-printed case made of polylactic acid (PLA), as shown in Fig. 1(a). Specifically, the actuator is fixed in a double cantilever configuration inside the case. Four 0.77 mm thick spacers are mounted between the actuator and the case to provide clearance for vibration. A hollow cylindrical protrusion is positioned anteriorly at the center of the actuator to transmit vibrations to the user’s skin. The bottom part of the cylinder is 3D printed in thermoplastic polyurethane (TPU) and is connected to the case through two slender, flexible TPU rods. The top part of the cylinder, made of 3D-printed PLA, consists of a fixed cavity and a removable cap fitted through a threaded mechanism. The diameter of the area in contact with the abdomen is 14 mm. The internal cavity of the cylinder can accommodate additional masses to tune the resonance frequency of the actuators.
Preliminary mechanical tests show that adding a mass of 12 g reduces the resonance frequency to about 175 Hz [37], which is within the range of frequencies to which skin mechanoreceptors are most sensitive [29,38]. These tests [37] also suggest that the peak-to-peak displacement of the tactors can be increased tenfold throughout the frequency range of interest, and by more than a factor of twenty at resonance, with respect to our previous design [30], promising a significant improvement in vibration discrimination. The resulting amplitude would be at least twice the sensory threshold over the entire actuation frequency range [29]. Further details about the design and characterization of the electromechanical properties of the actuators can be found in Ref. [37].
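As a rough guide to this tuning (a lumped single-degree-of-freedom approximation, with effective stiffness $k_{\mathrm{eff}}$ and effective moving mass $m_{\mathrm{eff}}$ that are not reported here), the resonance frequency scales as

\[
f_r \approx \frac{1}{2\pi}\sqrt{\frac{k_{\mathrm{eff}}}{m_{\mathrm{eff}} + m_{\mathrm{add}}}},
\]

so that increasing the added mass $m_{\mathrm{add}}$ from 0 to 12 g lowers $f_r$ into the neighborhood of 175 Hz.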
Tactors are arranged in five columns and two rows on a commercial hiking belt (Granite Gear, Two Harbors, MN) that tethers to the backpack. They are mounted on a thin aluminum beam-based scaffold, which is later fastened to the belt, to maintain minimum horizontal and vertical distances between the actuators of 130 and 110 mm, respectively. We expect this separation to improve users’ discrimination performance, as previously demonstrated [29]. A custom-made fabric sleeve is wrapped around the belt to cover electrical connections and improve users’ comfort, as shown in Fig. 1(b). Velcro® squares on the sleeve and on the tactors’ cases ensure accurate positioning and prevent damping of the vibration by the sleeve.
MFC actuators are driven by an electrical circuit composed of an arduino mega 2560 board (Arduino LLC, Boston, MA), five printed circuit boards (PCBs), and five high-voltage amplifiers (HVAs). Each PCB connects to an HVA, which can simultaneously drive two actuators. PCBs integrate a dual-channel operational amplifier (TLC2202CP, Texas Instruments, Dallas, TX) wired as an astable multivibrator. The switching frequency of the multivibrator is controlled by a dual-channel 10 kΩ digital potentiometer (AD8402ANZ10, Analog Devices Inc., Norwood, MA), whose variable resistance is selected by the arduino through a serial peripheral interface (SPI) protocol. The 0–5 V square wave generated by the multivibrator is fed to an HVA (AMT2012-CE3, Smart Material, Sarasota, FL). The HVA linearly maps the 0–2.5 V range to 0–0.5 kV and the 2.5–5 V range to 0.5–2 kV. The ground pin of the HVA provides a 500 V offset, such that the output voltage lies in the –0.5 to 1.5 kV range.
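For reference, combining the stated input ranges with the 500 V ground offset, the voltage applied to each actuator, $V_a$ (in volts), as a function of the multivibrator output $V_{\mathrm{in}}$ (in volts) reduces to the piecewise-linear relation

\[
V_a(V_{\mathrm{in}}) =
\begin{cases}
200\,V_{\mathrm{in}} - 500, & 0 \le V_{\mathrm{in}} \le 2.5,\\
600\,(V_{\mathrm{in}} - 2.5), & 2.5 < V_{\mathrm{in}} \le 5,
\end{cases}
\]

which spans $-500\ \mathrm{V}$ at $V_{\mathrm{in}} = 0$ and $1500\ \mathrm{V}$ at $V_{\mathrm{in}} = 5\ \mathrm{V}$.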
Activation of PCBs and HVAs and modulation of the vibration frequency are achieved through a matlab® R2019a script that controls the arduino. PCB components are switched on by feeding a constant 5 V signal to their power pins. HVAs are turned on by shorting their enabling pin. The vibration frequency is modulated by varying the resistance of the digital potentiometer between 0 and 10 kΩ, corresponding to 10 Hz and 390 Hz, respectively, over 256 discrete levels. The desired level of resistance is selected by sending 10 bits through an SPI protocol. The arduino board is controlled by a laptop through a USB 2.0 cable, which supplies the power for the PCBs.
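As a minimal sketch of this control step (the function name is ours, and a linear relation between potentiometer level and vibration frequency is assumed purely for illustration, since the actual relation is set by the multivibrator's RC network), the mapping from a target frequency to the 10-bit word sent over SPI could read:

```python
def frequency_to_spi_word(freq_hz, channel=0,
                          f_min=10.0, f_max=390.0, n_levels=256):
    """Return a 10-bit word (2 address bits + 8 data bits) for the dual-channel
    digital potentiometer that approximates the requested vibration frequency.

    The linear frequency-to-level mapping below is an illustrative assumption.
    """
    freq_hz = max(f_min, min(f_max, freq_hz))                         # clamp to 10-390 Hz
    level = int(round((freq_hz - f_min) / (f_max - f_min) * (n_levels - 1)))
    return ((channel & 0b11) << 8) | level                            # A1 A0 D7...D0
```

The arduino would then shift this 10-bit word out over the SPI bus to update the selected channel.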
Power for the HVAs is provided by a 12 V, 142.08 Wh TalentCell lithium-ion battery (PB120B1, TalentCell, Shenzhen, Guangdong Province, China), weighing 1 kg. This battery allows for an operational period of three hours, assuming that all the actuators are continuously vibrating at resonance. Connection between the lithium-ion battery and the HVAs is enabled by a break-out board through a standard 2.1 mm DC barrel-jack connector.
3 Computer Vision System
The computer vision system utilizes a ZED stereo camera (Stereolabs Inc., San Francisco, CA), which enables reconstruction of the depth field from the acquired images. The camera is encapsulated in a custom-made 3D-printed case that is secured to the shoulder straps of the bookbag and connected to a laptop through a USB 3.0 port.
We employ an Alienware 13 R3 (Dell, Round Rock, TX) with an Ubuntu 16.04.6 LTS operating system as the main processing unit. In this proof of concept, the choice of a laptop is motivated by the ease of programming and data accessibility. However, this solution would prove bulky in a commercial product, for which we will consider more compact options such as NVIDIA® Jetson™ modules.
Given the recent advancements in convolutional neural networks, in this article, we exploit such advances for real-time object detection. Our object detection method is adapted from the well-known real-time approach You Only Look Once (YOLO) [39], implemented in python 2.7 [40]. YOLO belongs to a family of computer vision algorithms that is currently able to identify over 9000 classes of objects [41]. YOLO was selected due to its short processing time per image, which is critical for our application. YOLO applies a single neural network to the full image: the image is divided into regions of different sizes, and bounding boxes and probabilities are predicted for each region. The system outputs 3D locations from the point cloud associated with each identified object, along with one of the originally implemented semantic labels [39] and its detection confidence. From these data, we compute the centroid of the object as the average of the locations of its points. Furthermore, we calculate the distance of the object as the Euclidean norm of the 3D vector at its centroid.
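For clarity, the centroid and distance computation can be sketched as follows; the function is a simplified stand-in for the actual implementation and omits the calls to the ZED software development kit that produce the per-object point cloud:

```python
import numpy as np

def centroid_and_distance(points_xyz):
    """Given the Nx3 point cloud (camera-frame XYZ coordinates, in meters) of a
    detected object, return its centroid and its Euclidean distance from the camera."""
    pts = np.asarray(points_xyz, dtype=float)
    pts = pts[np.isfinite(pts).all(axis=1)]   # discard points with invalid depth
    centroid = pts.mean(axis=0)               # average location of the object's points
    distance = np.linalg.norm(centroid)       # norm of the 3D vector at the centroid
    return centroid, distance
```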
To spatiotopically map the obstacles to the tactors, we divide the acquired image into an array of ten equal rectangles, arranged in two rows and five columns, matching the tactor arrangement on the belt. We assume that an object belongs to one of these rectangles when the projection of the centroid of the object on the image plane falls within the rectangle (see Fig. 2). The result of this processing constitutes the activation signal for the tactors on the belt through the arduino, as described in Sec. 4. The algorithm can analyze images and provide these signals at a rate of 50 Hz, which is faster than the rate at which typical saccadic eye movements would be able to acquire potential hazards [42].
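A minimal sketch of this spatiotopic mapping is given below, where (u, v) are the pixel coordinates of the projected centroid; the row-major indexing convention is ours and may differ from the actual tactor numbering:

```python
def tactor_index(u, v, image_width, image_height, n_rows=2, n_cols=5):
    """Map the projected centroid (u, v) to one of the ten rectangular capture
    fields (two rows by five columns) and return the corresponding tactor index."""
    col = min(int(u * n_cols / image_width), n_cols - 1)
    row = min(int(v * n_rows / image_height), n_rows - 1)
    return row * n_cols + col   # 0-9, row-major ordering
```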
4 Interface Between the Wearable Haptic Device and Computer Vision System
All the electronic components and the laptop are stored in a commercial backpack (V6012, Victoriatourist, WandF, Gaithersburg, MD), as shown in Fig. 3. The camera is mounted in front of the backpack to maintain alignment with the torso and avoid possible confounds due to head motion; the torso remains aligned with the intended travel path, whereas the head and other body segments often do not. The tactors are connected to the HVAs in the backpack through channeled wires, which are covered by the sleeve and by a binding coil, as shown in Fig. 1(b).
The electronics of the haptic device are interfaced with the computer vision system through the main processing unit, as shown in Fig. 4. We utilize a transmission control protocol (TCP) connection to link the computer vision algorithm in python 2.7 with the arduino control script in matlab® R2019b. When the system is turned on, as a first step, the matlab script initializes the arduino and opens the TCP connection, functioning as the server. Next, the python code establishes the connection as a client and begins to acquire images from the ZED camera, while the matlab script enters a spinning loop, waiting for data from the computer vision system. Once a frame is acquired, it is analyzed by the python code, which identifies all the objects present in the scene. For each object that is detected, the python code sends two numbers to the matlab script: the identifier of the tactor to switch on and the distance of the object. Finally, the matlab script processes the received information, sends signals to the arduino, and re-enters the spinning loop. To avoid saturating the connection and to limit power consumption, delays of 30 ms and 25 ms are added after each information transfer in python and after each check during the spinning loop in matlab, respectively.
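A simplified sketch of the client side of this exchange is shown below; the host address, port, message encoding, and function name are illustrative assumptions rather than the exact implementation:

```python
import socket
import struct
import time

HOST, PORT = "127.0.0.1", 5005   # placeholder address of the matlab TCP server

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((HOST, PORT))       # the matlab script has already opened the server side

def send_detection(tactor_id, distance_m):
    """Send one (tactor identifier, object distance) pair for a detected object."""
    sock.sendall(struct.pack("!Bf", tactor_id, distance_m))
    time.sleep(0.030)            # 30 ms pause to avoid saturating the connection
```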
matlab reads all the information in the TCP queue. For each detected object, it first checks the distance of the object from the end user and then maps this distance to a vibration frequency, such that closer objects correspond to higher frequencies, with the closest objects at the resonance frequency. To facilitate frequency discrimination, we consider only discrete frequency steps.
Specifically, we set the frequency to a level in the 25–175 Hz range, with intervals of approximately 25 Hz. First, a peri-personal distance range of 1–10 m is mapped linearly to the 25–175 Hz frequency range, with 10 m mapped to 25 Hz and 1 m mapped to 175 Hz. Then, we round the obtained frequency to the closest admissible integer frequency. The frequency is then mapped to a resistance value, which is sent to the digital potentiometer on the PCBs through an SPI protocol. If the corresponding tactor is already on, no further action is taken; otherwise, it is switched on for 200 ms. Such an intermittent pulse allows for easier vibration discrimination compared to continuous stimulation, which would elicit habituation.
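An illustrative sketch of this distance-to-frequency mapping follows; the exact set of admissible frequencies in the actual script may differ slightly:

```python
def distance_to_frequency(distance_m, d_min=1.0, d_max=10.0,
                          f_min=25.0, f_max=175.0, step=25.0):
    """Map an object's distance to a discrete vibration frequency:
    10 m -> 25 Hz, 1 m -> 175 Hz, quantized to ~25 Hz intervals."""
    d = min(max(distance_m, d_min), d_max)                        # clamp to 1-10 m
    f = f_min + (d_max - d) / (d_max - d_min) * (f_max - f_min)   # linear mapping
    return round(f / step) * step                                 # nearest 25 Hz step
```

For example, an obstacle at 4 m maps to 125 Hz, while one at 1 m maps to the 175 Hz resonance.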
To assess the performance of the connection between the computer vision system and the haptic device, we perform two experiments, through which we seek to quantify the processing and transmission delays in the python and matlab codes, respectively. These two tests are performed independently to minimize the delay associated with saving the results. For the duration of the two experiments, the experimenter, wearing the backpack with the computer vision system connected to the arduino, walks in a repeated back-and-forth loop along a 5 m long corridor to analyze the environment. During ambulation, the ZED camera scans the ambient space. The obstacles, encompassing a desk, multiple chairs, computers, monitors, laptops, and other smaller objects, are at fixed and known locations. Since our focus is on the interface between the computer vision system and the wearable haptic device, we do not consider the performance of the individual subsystems (such as accuracy in object classification or vibration intensity).
More specifically, in the first experiment, we run the computer vision code in python for about 3 min while measuring the time elapsed from image acquisition to transmission of data to the matlab script, along with the number of identified objects. We exclude from our analysis all the acquired frames in which no object was detected by the computer vision system. For each frame that contains at least one identified object, we consider three contributions to the delay in the python script: the time to acquire the image and identify all point clouds in the frame (“PtCl”), the time to classify the detected objects and compute their distances (“Clas”), and the time to determine the position of the centroid of each object in the image and send all data through the TCP connection (“Send”). We consolidate all these contributions into the total delay in the python code (“Tot”).
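The per-stage timing could be instrumented along the lines of the sketch below; the stage bodies are placeholders, and only the labels match the actual analysis:

```python
import time
from contextlib import contextmanager

delays = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of a labeled stage."""
    start = time.time()
    yield
    delays[label] = time.time() - start

with timed("Tot"):
    with timed("PtCl"):
        time.sleep(0.010)   # placeholder: image acquisition and point-cloud generation
    with timed("Clas"):
        time.sleep(0.008)   # placeholder: object classification and distance computation
    with timed("Send"):
        time.sleep(0.001)   # placeholder: centroid projection and TCP transfer

print(delays)
```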
In Fig. 5, we show boxplots indicating the delay for each of these operations. The median value of the total delay is 20.9 ms. Identifying the position of the objects and sending the data through the TCP connection make a negligible contribution to this delay, which is mostly determined by the operations in YOLO (generating the point cloud and labelling the objects, PtCl and Clas). We observe that the distributions of delays are skewed, with most of the data around the median value and a significant tail toward larger delays. Thus, the system normally behaves as expected, but in several instances the delays increase significantly. Interestingly, the Pearson correlation coefficient indicates that only the time to send data through TCP correlates with the number of detected objects in the frame (r = 0.93), whereas all the other contributions and the total delay do not (r < 0.22). This suggests that the overall performance of the python script is largely independent of the number of objects detected in each frame.
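For completeness, the correlation analysis amounts to the computation sketched below, where the arrays are illustrative placeholders standing in for the recorded per-frame measurements:

```python
import numpy as np

# Illustrative placeholder data; the actual per-frame measurements are described above.
delay_send = np.array([0.4, 0.9, 1.1, 1.6, 1.8])    # "Send" delay per frame, in ms
n_objects = np.array([1, 2, 3, 4, 5])               # objects detected in each frame

r = np.corrcoef(delay_send, n_objects)[0, 1]        # Pearson correlation coefficient
print(round(r, 2))
```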
In the second experiment, we systematically investigate the delay in the matlab script. Similar to the previous case, we run the computer vision code in python for about 5 min; the code transmits data through TCP to the matlab script, which processes them and sends commands downstream to the arduino. In this case, we consider each reading of the TCP queue, which may contain multiple objects from several frames. We consider four contributions to the transmission delay in matlab: the time to receive the data from the TCP connection (“Read”), the time to process the received numbers and determine the corresponding pins and the resistance value for the potentiometer (“Elab”), the time to send data through the SPI protocol and modify the potentiometer resistance toward changing the vibration frequency (“Freq”), and the time to turn on the electronic components of the PCBs, activate the corresponding actuators by shorting the enable pin of the HVAs, and provide the input voltages to the operational amplifiers (“On”). Furthermore, we calculate the total time (“Tot”) to obtain the data from the python code and send them to the arduino as the sum of the previous contributions. In addition, we record the number of objects in the TCP queue.
The results of this delay analysis are shown in Fig. 6. The median value of the total delay is 80.6 ms, with the largest contributions due to the operations of the arduino to modify the vibration frequency or activate the electronics. This value corresponds to a median time from acquisition to stimulation of 101.5 ms, that is, an effective update rate of almost 10 Hz. Similar to the previous case, the dispersion of the delay contributions in the matlab script is significant, with non-Gaussian distributions whose bulk lies at lower values and whose tails extend toward larger delays. Large delays may be related to freezing of the SPI connection, which causes the arduino to reset. By computing the Pearson correlation coefficient, we find that delays are not correlated with the number of objects in the TCP queue (|r| < 0.04 for all contributions and the total time). The significant delay introduced by the arduino calls for the adoption of a more robust and reliable way to interface with the electronics. Given that the number of objects in the TCP queue does not grow over time, we expect that the matlab script should be able to keep up with the transmission, emptying the data that accumulate in the TCP connection. This claim requires further integrated analyses during longer periods of use, which will indicate the actual transmission delay between image acquisition and tactor activation.
5 Conclusions
In this study, we developed a new, integrated ETA for the visually impaired. The ETA comprises a computer vision system for obstacle detection and a bookbag-based belt with discrete tactors, providing haptic feedback on obstacle locations in an egocentric, spatiotopically preserved reference frame. Compared to solutions proposed in the literature, our system constitutes a customizable, inconspicuous option, whereby the belt could be detached and worn under personal garments and the remaining electronics and sensing components could be integrated in commercial backpacks. This ETA, and approaches fashioned in a similar way, could address the needs of the visually impaired, delivering unobtrusive aesthetics and an ergonomic design, critical aspects for the adoption of new assistive technologies [13,14].
Future studies and improvements are necessary to ensure usability of the device and potential commercialization. While preliminary mechanical tests were performed on the tactors, systematic, rigorous testing is required to quantify their performance against the previous generation of actuators [30]. In this vein, discrimination tests [29] will provide a first benchmark to assess the capabilities of our new wearable device, from which we expect higher performance than in our previous studies. Experiments on obstacle avoidance are the cornerstone of the evaluation of our integrated system. We will conduct hypothesis-driven experiments to demonstrate the effectiveness of the haptic feedback, for example, by counting the number of collisions with obstacles during a navigation task with and without the belt. We will perform these experiments with healthy subjects under simulated impairment conditions and with visually impaired subjects.
When testing the performance of the integrated system, we expect that our current control system will not relay information to the user with high fidelity in all circumstances. For example, it is untenable to assume that the vibration of multiple actuators in the presence of multiple obstacles will provide meaningful data to the end user; rather, it is likely to create sensory overload, which must be addressed through adaptive vibratory and/or audio messages. These observations warrant further investigation toward the development of a more sophisticated control system. After optimizing the control system, we will test the performance of our device against analogous systems in the literature. Finally, several technical aspects, including integration of other sensors in the prototype, miniaturization of the main processing unit, a single integrated software script for both computer vision and control of the belt’s microprocessor, and incorporation of the tactors into customized bookbag-based waist straps, shoulder straps, and back straps and padding, should be considered toward a wearable device with significant commercial appeal.
Acknowledgment
This research was supported by a Connect-the-Dots Pilot Award (New York University Langone Health and Tandon School of Engineering) and by the National Science Foundation (Grant Nos. CMMI-1433670 and CNS-1952180).
Conflict of Interest
J.-R. Rizzo discloses conflicts of interest arising from intellectual property owned by New York University and related advisory positions with equity and ad hoc compensation. In the future, the aforementioned project may relate to multicomponent wearable technologies relevant to the stated interests.