OBJECTIVE A major obstacle to improving bedside neurosurgical procedure safety and accuracy with image guidance technologies is the lack of a rapidly deployable, real-time registration and tracking system for a moving patient. This deficiency explains the persistence of freehand placement of external ventricular drains, which carries an inherent risk of inaccurate positioning, multiple passes, tract hemorrhage, and injury to adjacent brain parenchyma. Here, the authors introduce and validate a novel image registration and real-time tracking system for frameless stereotactic neuronavigation and catheter placement in the nonimmobilized patient.

METHODS Computer vision technology was used to develop an algorithm that performed near-continuous, automatic, and markerless image registration. The program fuses a subject's preprocedure CT scans to live 3D camera images (Snap-Surface), and patient movement is incorporated by artificial intelligence-driven recalibration (Real-Track). The surface registration error (SRE) and target registration error (TRE) were calculated for 5 cadaveric heads that underwent serial movements (fast- and slow-velocity roll, pitch, and yaw motions) under several test conditions, such as surgical draping with limited anatomical exposure and differential subject lighting. Six catheters were placed in each cadaveric head (30 total placements) with a simulated sterile technique. Postprocedure CT scans allowed comparison of planned and actual catheter positions for user error calculation.

RESULTS Registration was successful for all 5 cadaveric specimens, with an overall mean (± standard deviation) SRE of 0.429 ± 0.108 mm for the catheter placements. TRE accuracy was maintained under 1.2 mm throughout specimen movements at low and high velocities of roll, pitch, and yaw, with a slowest recalibration time of 0.23 seconds. There were no statistically significant differences in SRE whether the specimens were draped or fully undraped (p = 0.336). Performing registration in a bright versus a dimly lit environment had no statistically significant effect on SRE (p = 0.742 and 0.859, respectively). For the catheter placements, mean TRE was 0.862 ± 0.322 mm and mean user error (difference between target and actual catheter tip) was 1.674 ± 1.195 mm.

CONCLUSIONS This computer vision-based registration system provided real-time tracking of cadaveric heads with a recalibration time of less than one-quarter of a second and submillimetric accuracy, and it enabled catheter placements with millimetric accuracy. Using this approach to guide bedside ventriculostomy could reduce complications and improve safety, and it could be extrapolated to other frameless stereotactic applications in awake, nonimmobilized patients.
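The abstract does not publish its implementation, so as a minimal illustration of the two error metrics it reports, the sketch below computes a target registration error (distance between planned and actual catheter tip) and a simplified surface registration error (mean nearest-point distance between a registered point cloud and a reference CT-derived surface). All function names and coordinates are hypothetical, not from the paper.

```python
import numpy as np

def target_registration_error(planned_mm, actual_mm):
    """TRE: Euclidean distance (mm) between planned and actual target points."""
    diff = np.asarray(planned_mm, float) - np.asarray(actual_mm, float)
    return float(np.linalg.norm(diff))

def surface_registration_error(registered_pts, reference_pts):
    """SRE sketch: mean nearest-neighbor distance (mm) from each registered
    surface point to the reference (CT-derived) surface point cloud."""
    reg = np.asarray(registered_pts, float)   # shape (N, 3)
    ref = np.asarray(reference_pts, float)    # shape (M, 3)
    # Pairwise distances via broadcasting; fine for small point clouds.
    d = np.linalg.norm(reg[:, None, :] - ref[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

# Hypothetical planned vs. actual catheter tip in CT coordinates (mm).
tre = target_registration_error([12.0, -4.5, 30.2], [12.5, -4.0, 31.0])
```

A real system would compare a dense camera-derived surface mesh against the CT skin surface; the brute-force nearest-neighbor search here stands in for a KD-tree lookup or an ICP residual.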
In numerous multimedia and multi-modal tasks, from image and video retrieval to zero-shot recognition to multimedia question answering, bridging image and text representations plays an important, and in some cases indispensable, role. To narrow the modality gap between vision and language, prior approaches attempt to discover their correlated semantics in a common feature space. However, these approaches neglect intra-modal semantic consistency when learning the inter-modal correlations. To address this problem, we propose cycle-consistent embeddings in a deep neural network for matching visual and textual representations. Our approach, named CycleMatch, can maintain both inter-modal correlations and intra-modal consistency by cascading dual mappings and reconstructed mappings in a cyclic fashion. Moreover, to achieve robust inference, we propose to employ two late-fusion approaches: average fusion and adaptive fusion. Both can effectively integrate the matching scores of different embedding features without increasing network complexity or training time. In experiments on cross-modal retrieval, we present comprehensive results verifying the effectiveness of the proposed approach. Our approach achieves state-of-the-art performance on two well-known multi-modal datasets, Flickr30K and MSCOCO.
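The abstract names two late-fusion strategies without spelling out their details, so the snippet below is only a hedged sketch: it combines per-embedding image-text similarity matrices by an unweighted element-wise mean (average fusion) and by a normalized weighted mean (a stand-in for adaptive fusion, with the weights supplied directly here rather than learned as they presumably are in the paper).

```python
import numpy as np

def average_fusion(score_mats):
    """Average fusion: element-wise mean of the per-embedding score matrices."""
    return np.mean(np.stack(score_mats), axis=0)

def adaptive_fusion(score_mats, weights):
    """Adaptive-fusion sketch: convex combination of score matrices.
    The weights are given here; a learned version would produce them."""
    w = np.asarray(weights, float)
    w = w / w.sum()                      # normalize to a convex combination
    return np.tensordot(w, np.stack(score_mats), axes=1)

# Hypothetical 2x2 image-text similarity matrices from two embedding spaces.
s1 = np.array([[0.9, 0.1], [0.2, 0.8]])
s2 = np.array([[0.7, 0.3], [0.4, 0.6]])
fused_avg = average_fusion([s1, s2])          # element-wise mean of s1, s2
fused_ada = adaptive_fusion([s1, s2], [2, 1]) # weights s1 twice as heavily
```

Because both operations act only on the final score matrices, they add no parameters to the matching network, which is consistent with the abstract's claim that fusion does not increase network complexity or training time.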
The (slow) emergence of semi-automated or supervised detection techniques to identify anthropogenic objects in archaeological prospection using remote sensing data has received a mixed reception during the past decade. Critics have stressed the superiority of human vision and the irreplaceability of human judgement in recognising archaeological traces, perceiving a threat that automation will undermine professional expertise and that archaeological experience and knowledge could be written out of the interpretative process (e.g. Hanson 2008, 2010; Palmer & Cowley 2010; Parcak 2009). Uneasiness amongst some archaeologists about losing control, even partially, of the interpretation process certainly seems to be a significant factor in these criticisms, which cite the undeniable fact that archaeological remains (or proxies for those remains) can assume a near-unlimited assortment of shapes, sizes, and spectral properties; it is argued that only the human observer can deal with such complexity. Thus, while increasingly automated and supervised procedures for object detection, recognition, and processing are flourishing in a variety of fields (e.g. medical imaging, facial recognition, cartography, navigation, surveillance; Szeliski 2011), their application to archaeological and, more generally, cultural landscapes is still in its infancy. However, as a number of published works (see References and General Reading List) and ongoing research demonstrate, there are major benefits in developing this broad agenda. This paper provides a general review of the issues from a synergistic rather than competitive perspective, highlighting opportunities and discussing challenges. It also summarises a session with a similar objective, 'Computer vision vs human perception in remote sensing image analysis: time to move on', held at the 44th Computer Applications and Quantitative Methods in Archaeology Conference (CAA 2016 Oslo, 'Exploring Oceans of Data').