Progressive privacy-preserving batch retrieval of lung CT image sequences based on edge-cloud collaborative computation.

Abstract

BACKGROUND: A computed tomography image (CI) sequence can be regarded as time-series data composed of a large number of adjacent, similar CIs. Because the computational and I/O costs of similarity measurement, encryption, and decryption during similarity retrieval of large CI sequences (CISs) are extremely high, deploying all retrieval tasks in the cloud places an excessive computing load on the cloud and degrades retrieval performance. METHODOLOGIES: To tackle these challenges, the paper proposes a progressive privacy-preserving Batch Retrieval scheme for lung CISs based on edge-cloud collaborative computation, called the BRS method. Four supporting techniques enable the BRS method: 1) batch similarity measure for CISs, 2) CIB-based privacy-preserving scheme, 3) uniform edge-cloud index framework, and 4) edge buffering.
RESULTS: The experimental results reveal that our method outperforms the state-of-the-art approaches in terms of efficiency and scalability, drastically reducing response time by lowering network communication costs while enhancing retrieval safety and accuracy.

Year: 2022 PMID： 36107827 PMCID： PMC9477338 DOI： 10.1371/journal.pone.0274507

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.752

Introduction

With the rapid growth in the number of medical images and the increasing demand for remote diagnosis, content-based mobile retrieval of Computed tomography Image Sequences (CISs) in telemedicine systems (TSs) [1] has come to play an increasingly important role in disease diagnosis. Fig 1 illustrates an example of a lung CIS consisting of a large number of nearby and visually similar Computed tomography Images (CIs). As one of the main tasks of the TS, mobile-terminal-based high-resolution CIS retrieval enables medical professionals to identify lesion tissues in aberrant CIs and to carry out computer-assisted diagnosis and treatment.

Fig 1

An example of a lung CIS with 5 neighboring CIs.

The motivations for CIS retrieval in the edge-cloud collaborative computing mode are based on the following key observations:
• Instead of using the whole CIS, traditional CI retrieval takes a single CI as the query for similarity comparison, which models the whole retrieval CIS inadequately and leads to a poor retrieval precision ratio;
• As the CIS data is part of the patients' private information [2], it should be encrypted during retrieval processing; otherwise, personal information may be leaked;
• To better understand the condition of their patients during remote consultations, doctors frequently retrieve and examine CISs in real time, which involves high computational costs as well as intensive transmission of the CISs. Deploying and executing all expensive retrieval and computing tasks in the cloud therefore results in significant computational overhead and harms retrieval performance. Edge computing [3] emerged to reduce the load on the cloud. As a new distributed computing mode, it makes up for some shortcomings of cloud computing by diverting most computing tasks to edge device nodes (i.e., edge servers (ESs)) around the mobile terminal. This not only significantly reduces the computing load on the cloud, but also reduces the transmission cost [4, 5] to support retrieval in real time [6];
• Mobile terminals are constrained in computing resources such as battery reserves, screen resolution and computational power. In addition, data transmission is negatively affected by unstable network bandwidth, which delays data retrieval and transmission, especially in rural areas with inadequate mobile communication infrastructure [1].
Based on the above analysis, the paper presents a privacy-preserving Batch Retrieval method for large lung CISs in the edge-cloud computing network, called the BRS, by analyzing the similarity of the nearby CIs in the sequence. There are few studies on how to speed up the batch similarity retrieval of large CISs using the edge-cloud collaborative computing environment. Specifically, when a user submits a retrieval CIS (X), an index mechanism at the edge layer called the eIndex is first used to quickly judge whether there are some answer CISs similar to X in the edge buffer. If so, a high-dimensional similarity retrieval of the remaining partial answer CISs, supported by the cIndex, is carried out by accessing the cloud; otherwise, the similarity retrieval of all CISs supported by the cIndex is performed directly through the cloud. Finally, the answer CISs are returned to the user node. Extensive experiments demonstrate the effectiveness, efficiency, and scalability of the BRS method.

Background

Over the past fifty years, content-based image retrieval(CBIR) has been a persistent and difficult research problem [6-10]. Due to the ‘semantic gap’, however, the retrieval accuracies are still not satisfactory. As one of the key subfields of the CBIR, content-based medical image retrieval (CBMIR) research has become increasingly popular in recent years. The first CBMIR system built for high-resolution lung CIs is ASSERT [11]. After that, many prototype systems were developed, including IRMA [12], FIRE [13], and others. A noisy image bag-based technique for retrieving medical images was presented by Huang et al. [14]. To further reduce the ‘semantic gap’, Huang et al. [15] developed a relevance feedback technique for the CBMIR based on a noisy-smoothing model. Kitanovski et al. [16] designed a multi-modality-based CBMIR system. Lan et al. [17] proposed a simple texture feature extraction algorithm for the CBMIR. A multi-panel medical image segmentation framework for the CBMIR system was supplied by Ali et al. [18]. Based on the fusion of the wavelet optimization and adaptive block truncation coding, Kasban et al. [19] built a reliable CBMIR system. Tuyet et al. [20] used the deep learning techniques to support the salient region-based CBMIR. Since the aforementioned CBMIR systems are based on single-PC mode, their retrieval performances are not satisfactory when dealing with a great deal of medical images [21]. Anbarasi et al. [22] developed a distributed CBMIR system using distributed database techniques. Charisi et al. [23] designed a parallel CBMIR scheme in a peer-to-peer(P2P) network. Based on the hybrid features, Depeursinge et al. [24] proposed a mobile access approach to peer-reviewed medical information. Although Zhuang et al. [25] put forward an efficient and robust CBMIR technique in a mobile wireless network, the retrieval efficiency is poor since the effectiveness of the load balance strategy needs to be further improved. 
Based on the previous work [25], Zhuang et al. [26] introduced a high-performance batch retrieval technique for medical images in wireless network from a standpoint of multi-retrieval optimization to further improve the retrieval efficiency. A mobile teleradiology system [27] is appropriate for streamlining the CBMIR procedure. For telemedicine applications, Chitra et al. [28] suggested an enhanced retrieval approach for brain images utilizing carrier frequency offset adjusted OFDM technique. To solve the ‘semantic gap’, Jiang et al. [29] introduced a novel framework of mobile similarity retrieval of medical images based on a crowdsourcing model. On the basis of the CI analysis, Lei et al. [30] developed a sparse CNN model-based high-resolution CI retrieval technique. Yu et al. [31] presented a liver CI retrieval algorithm based on a non-tensor product wavelet. Based on an adder combining two local bit plane-based dissimilarities, Hatibaruah et al. [32] introduced a novel CI retrieval approach. Hwang et al. [33] applied a CBIR and CNN techniques to enable diffuse interstitial lung disease retrieval. To facilitate the effective diagnosis of the lung cancer, Alzubi et al [34] designed a boosted neural network ensemble classification approach. Despite extensive study of the CI retrieval, the majority of approaches still rely solely on this retrieval without taking the CIS retrieval into account. Meanwhile, very little research has addressed CIS retrieval in the collaborative edge-cloud environment.

Preliminaries and preprocessing

Firstly, the main symbol notations are listed in Table 1.

Table 1

Main notations used throughout the paper.

Notation	Meaning
N_U	user node
N_E	edge node
N_C	cloud node
Ω	a set of n CISs in N_C
Ω′	the CISs buffered at N_E
X_i	the i-th CIS and X_i ∈ Ω
X_R	the retrieval CIS
r_R	the retrieval radius
Ψ	the final answer CISs
Ψ′	partial answer CISs based on the eIndex at N_E
Ψ″	partial answer CISs based on the cIndex at N_C
X_R′^i	the i-th historical retrieval CIS
r_R′^i	the i-th historical retrieval radius
CI_j	the j-th CI in X_i and CI_j ∈ X_i
POA_j	the j-th pathological object area in a CI and j ∈ [1, |POA|]
NPOA	the non-POA part of the CI
CIB_ij	the j-th correlated image block of the POAs in CI_i
NIB_ij	the j-th image block of the NPOA part in CI_i
sim(X_i, X_j)	similarity between two CISs (i.e., X_i and X_j) (ref. Eq (4))
d(CI_i, CI_j)	Euclidean distance between two CIs (i.e., CI_i and CI_j)
MaxN	maximal number of the CISs buffered at N_E
δ	granularity value for the size of image blocking

Fig 2 depicts the three-layer network architecture of the BRS system, which is formally stated in Definition 1.

Fig 2

The three layer architecture of the BRS system.

Definition 1 (MECN). A mobile edge-cloud network (MECN) is represented by a graph which can be modeled by a triplet <N, E, T>, where
• N means a set of nodes, formally represented as N = N_U ∪ N_E ∪ N_C, in which i) N_U denotes the user nodes; ii) N_E denotes the edge nodes; iii) N_C denotes the cloud node;
• E denotes a collection of edges representing the different network bandwidths for data communication at time T, formally denoted as: E = <e_1, e_2, …, e_|E|>, where e_k = (N_i, N_j) refers to the k-th edge in E, in which N_i, N_j ∈ N.

Definition 2 (POA). A pathological object area (POA) in a CI can be modeled by a two-tuple <i, PO>, where i is the ID number of the POA and PO is the coordinate of the POA in the CI.

According to Definition 2, the non-POA part of a CI is denoted as NPOA.

Definition 3 (IB). An image block (IB) can be modeled as a triplet <bid, PO, TP>, where bid refers to the block ID, PO is the coordinate of the IB in the CI, and TP is the transmission priority of the block.

Definition 4 (CIB). Given a POA (i.e., POA_k), its correlated image blocks (CIBs) are the IBs that overlap it: CIB = {IB | IB ∩ POA_k ≠ ∅}, where k ∈ [1, |POA|] and |POA| means the number of the POAs in the CI.

Definition 5 (NIB). A NIB is an IB that is contained by the NPOA in a CI, formally represented by: NIB = {IB | IB ∩ NPOA = IB}.

As indicated in the Introduction, there are usually some lesion tissues in the CISs that doctors focus on. The region of such a lesion in a CI is called the pathological object area (POA). In the preprocessing step, the POAs are first marked by medical specialists; then each CI in the sequence is equally divided into IBs (i.e., NIBs and CIBs), with the CIBs being encrypted and saved at their original pixel resolution while the NIBs are stored at a lower resolution. As illustrated in Fig 3, there are two POAs (A and B) and one NPOA (i.e., C) in an example CI, which is segmented into 6 × 8 IBs marked by red dashed lines.
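The blocking step of Definitions 3 to 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that POAs are axis-aligned rectangles, and the names Rect, split_blocks and classify_blocks are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; an assumed stand-in for the PO coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the interiors of a and b intersect (IB ∩ POA ≠ ∅)."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def split_blocks(width, height, rows, cols):
    """Equally divide a CI of the given size into rows x cols image blocks."""
    bw, bh = width / cols, height / rows
    return {
        (r, c): Rect(c * bw, r * bh, (c + 1) * bw, (r + 1) * bh)
        for r in range(rows)
        for c in range(cols)
    }


def classify_blocks(blocks, poas):
    """Split block IDs into CIBs (overlap some POA) and NIBs (lie in the NPOA)."""
    cibs = {bid for bid, rect in blocks.items() if any(overlaps(rect, p) for p in poas)}
    nibs = set(blocks) - cibs
    return cibs, nibs
```

For a 480 × 360 CI divided into 6 × 8 blocks, a POA given as Rect(70, 70, 170, 110) touches only the blocks in row 1, columns 1 and 2; every other block is a NIB and could be stored at a lower resolution.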

Fig 3

The 20 CIBs in a CI (δ = 6 × 8).

Methodologies

In this section, we first introduce four supporting techniques based on which a BRS algorithm is proposed next.

Supporting techniques

To better facilitate the batch retrieval of the lung CISs in the MECN, in this subsection, we introduce four supporting techniques: 1) batch similarity measure for CISs, 2) CIB-based privacy preserving scheme, 3) uniform edge-cloud index framework, and 4) edge buffering.

Batch similarity measure for CISs

As mentioned before, a CIS X is time-series data which can be modeled by a vector: X = {CI_1, CI_2, …, CI_|X|}, where |X| means the number of CIs in X. Because a CIS contains a large number of CIs, we propose a representative CI (RCI)-based batch similarity measure of the CISs to reduce the high computation cost of CIS similarity matching. Before introducing the batch similarity measure, the RCIs must first be extracted. As summarized in Algorithm 1, given a CIS X, an RCI extraction step is performed to obtain ||X|| RCIs, where ||X|| means the number of RCIs in X, d(x, y) is stated in Table 1, and ε is a small positive threshold.

Algorithm 1 (X)
input: X
output: the ||X|| RCIs
1: add CI_1 as the first RCI; i ← 1, j ← 2, ||X|| ← 1;
2: while (j ≤ |X|) do
3: if d(CI_i, CI_j) > ε then
4: ||X||++;
5: add CI_j as the ||X||-th RCI; i ← j;
6: end if
7: j++;
8: end while
9: return the ||X|| RCIs

Once the ||X|| RCIs are extracted from X, X can be re-represented as X = {RCI_1, RCI_2, …, RCI_||X||}. Given two CISs (X_i and X_j), their batch similarity sim(X_i, X_j) is defined in Eq (4). As can be seen from Eq (4), the similarity of two CISs is measured by the percentage of similar RCIs in the two corresponding CISs.
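The RCI extraction and the percentage-based batch similarity can be sketched as below. The sketch rests on stated assumptions: each CI is a plain feature vector, a new RCI is opened whenever a CI lies farther than ε from the most recent RCI, and two RCIs count as similar when their Euclidean distance is at most ε. The exact form of Eq (4) is not reproduced here, so batch_similarity is only one plausible reading of "percentage of similar RCIs"; all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance d(CI_i, CI_j) between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def extract_rcis(cis, eps, dist=euclidean):
    """Keep a CI as a new RCI when it is farther than eps from the last RCI."""
    if not cis:
        return []
    rcis = [cis[0]]
    for ci in cis[1:]:
        if dist(rcis[-1], ci) > eps:
            rcis.append(ci)
    return rcis


def batch_similarity(x, y, eps, dist=euclidean):
    """Fraction of the RCIs of x that have an RCI of y within eps (assumed rule)."""
    rx, ry = extract_rcis(x, eps, dist), extract_rcis(y, eps, dist)
    if not rx:
        return 0.0
    matched = sum(1 for a in rx if any(dist(a, b) <= eps for b in ry))
    return matched / len(rx)
```

Comparing only the RCIs instead of all |X| CIs is what reduces the matching cost: a larger ε yields fewer RCIs and therefore fewer distance computations.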

CIB-based privacy-preserving scheme

Before introducing the CIB-based privacy-preserving scheme, we first give a definition.

Definition 6 (POAR). Given a POA (i.e., POA_k), its corresponding POAR is the region formed by the CIBs that cover POA_k, where |•| denotes the number of CIBs in •.

In Fig 3, there are two POAs (i.e., A and B) in the CI, which is equally segmented into 6 × 8 IBs. Based on Definition 6, the corresponding POARs of the two POAs are represented by the green shaded areas, which consist of 20 CIBs. Since the ID numbers of nearby CIBs are continuously distributed, it is easy to use these CIBs to reconstruct the original CI. The objective of the encryption strategy is therefore to disrupt the continuity of the ID numbers of the nearby CIBs in the CI by encoding them such that the CI reconstruction is hard to perform. For each CIB in a CI, we first introduce an encoding scheme (IBID) of its ID number, which is represented in Eq (6):

IBID = SID ⋅ c1 + IID ⋅ c2 + rID ⋅ c3 + cID (6)

where SID means the ID of the CIS that the CIB belongs to, IID refers to the ID of the CI in which the CIB is contained, rID is the row ID, cID is the column ID, and c1, c2, c3 are stretch constants with c1 >> c2 >> c3. Based on the ID numbers of the CIBs in Eq (6), their encryption and decryption strategies are described as follows:

1) Encryption: Algorithm 2 details the steps of the CIB-based encryption processing in which the ID numbers of the CIBs are encrypted, where δ and ω are two key values and δ < SID, δ < IID and ω < rID.

Algorithm 2
input: SID, IID, rID, cID of a CIB
output: IBID: the encrypted ID number of the CIB
1: if rID is an odd number and cID is an odd number then
2: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
3: else if rID is an odd number and cID is an even number then
4: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
5: else if rID is an even number and cID is an odd number then
6: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
7: else
8: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
9: end if
10: return the encrypted IBID

2) Decryption: Similarly, for the encrypted CIBs, their corresponding decryption processing is given in Algorithm 3, which applies the opposite shifts.

Algorithm 3
input: SID, IID, rID, cID of an encrypted CIB
output: IBID: the decrypted ID number of the CIB
1: if rID is an odd number and cID is an odd number then
2: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
3: else if rID is an odd number and cID is an even number then
4: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
5: else if rID is an even number and cID is an odd number then
6: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
7: else
8: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
9: end if
10: return the decrypted IBID

For instance, assume that SID is 7, IID is 4, and c1, c2, c3 are 1000, 100 and 10, respectively; then the original ID numbers of the CIBs before encryption are depicted in Fig 4(a). Fig 4(b) shows the ID numbers of the CIBs after encryption when δ = 3 and ω = 0.6.
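The parity-dependent shifts of Algorithms 2 and 3 can be sketched as below. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, and ω is taken to be an even integer so that the parity of rID survives encryption and the inverse shift can be chosen from the encrypted components alone (the paper's own example uses ω = 0.6).

```python
def plain_ibid(sid, iid, rid, cid, c1=1000, c2=100, c3=10):
    """Unencrypted block ID: SID*c1 + IID*c2 + rID*c3 + cID."""
    return sid * c1 + iid * c2 + rid * c3 + cid


def shift_components(sid, iid, rid, cid, delta, omega, decrypt=False):
    """Apply the shifts of Algorithm 2, or their inverses when decrypt=True.

    The sign of the SID/IID shift follows the parity of rID, and the sign of
    the rID shift follows the parity of cID, as in the four branches above.
    """
    sign = -1 if decrypt else 1
    row_sign = sign if rid % 2 == 1 else -sign
    col_sign = sign if cid % 2 == 1 else -sign
    return (sid + row_sign * delta, iid - row_sign * delta, rid + col_sign * omega, cid)


def encrypted_ibid(sid, iid, rid, cid, delta, omega, c1=1000, c2=100, c3=10):
    """Encrypted block ID obtained by encoding the shifted components."""
    return plain_ibid(*shift_components(sid, iid, rid, cid, delta, omega), c1, c2, c3)
```

With SID = 7, IID = 4, δ = 3 and the assumed ω = 2, the horizontally adjacent blocks (rID, cID) = (3, 5) and (3, 6) have the consecutive plain IDs 7435 and 7436 but the encrypted IDs 10155 and 10116, which illustrates how the continuity of neighbouring IDs is broken.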

Fig 4

Comparison of the ID numbers of the CIBs before and after encryption.

Fig 4(a) shows the continuous distribution of the ID numbers of the nearby original CIBs before encryption. After encryption, as illustrated in Fig 4(b), the ID number distribution of the nearby CIBs is discrete. Therefore, the encryption of the CIBs makes it more difficult to find the corresponding nearby CIBs during CI reconstruction. Next, we analyze the probability of successful decryption (i.e., the probability of accurate image reconstruction). Given a CI with m POAs, for each POA (i.e., POA_i), the numbers of rows and columns of its corresponding POAR are denoted as RS_i and CS_i, respectively. The probability that the decryption processing succeeds is then given in Eq (7). Based on Eq (7), as the number of CIBs in a CI increases, the probability of successful decryption becomes smaller, which guarantees the hardness of the decryption at a theoretical level. The encrypted CIBs are stored in N_E or N_C, which ensures that the IDs of the corresponding CIBs in a CI present a discrete rather than a continuous distribution to a certain extent. The reconstruction and display of the CIs are conducted at N_U by reversely decrypting the ID numbers of the CIBs based on the key values (i.e., δ and ω).

Uniform edge-cloud indexing framework

To support faster CIS filtering, we propose a uniform edge-cloud index framework (UECIF) based on iDistance [35]. The UECIF is composed of two types of indexes: the eIndex in N_E and the cIndex in N_C.

• For the eIndex, suppose initially that the CISs in Ω are virtually stored in N_E, which means that the CISs in Ω are physically stored in N_C but are logically not yet buffered in N_E. The CISs are first grouped into K clusters by the AP-cluster [36] based on visual similarity (i.e., Eq (4)). Given a CIS X, its index key can be represented below:

key(X) = c1 ⋅ j + sim(X, O_j) (8)

where O_j is the cluster centre of the j-th cluster that X belongs to, sim(⋅, ⋅) represents the visual similarity distance function (i.e., Eq (4)), j ∈ [1, K], and the constant c1 is used to stretch the value range. The index key is inserted into an improved B+-Tree in which a leaf node (LNode) can be modeled by a triplet: LNode(X) = < key, value, EType >, where EType = ‘F’ means X is not buffered in N_E; otherwise, X has been buffered in N_E. Algorithm 4 summarizes the initial construction process of the eIndex, in which LNode(X).EType ← ‘F’ (line 6) means all of the CISs are virtually stored in N_E.

Algorithm 4 (Ω)
input: Ω: the CIS set
output: eIdx
1: eIdx ← NULL; ▹ initialize
2: the CISs in Ω are grouped into K clusters; ▹ at edge node
3: for each CIS (X) in Ω do
4: compute key(X) by Eq (8);
5: insert key(X) into an improved B+-tree (i.e., eIdx);
6: LNode(X).EType ← ‘F’;
7: end for
8: return the eIdx;

• Similarly, for the cIndex, the CISs (Ω) in N_C are first clustered into T clusters based on the above visual similarity. For a CIS X, its index key can be derived as:

KEY(X) = c1 ⋅ j + sim(X, O_j) (9)

where j ∈ [1, T], and the other parameters and symbols are the same as those in Eq (8). In Algorithm 5, the index key is inserted into an improved B+-Tree in which a leaf node (LNode) can be represented by a triplet: LNode(X) = < key, value, CType >, where LNode(X).CType ← ‘T’ means X is stored in N_C.

Algorithm 5 (Ω)
input: Ω: the CIS set
output: cIdx
1: cIdx ← NULL; ▹ initialize
2: the CISs in Ω are grouped into T clusters; ▹ at cloud node
3: for each CIS (X) in Ω do
4: compute KEY(X) by Eq (9);
5: insert KEY(X) into an improved B+-tree (i.e., cIdx);
6: LNode(X).CType ← ‘T’;
7: end for
8: return the cIdx;

• Based on Eqs (8) and (9), suppose that there are n CISs in Ω; their index keys are inserted into the two improved B+-Trees, respectively. For a range retrieval Θ(X, r) and each cluster C_j, as illustrated in Fig 5, there are five cases in terms of the positions of the two spheres.

Fig 5

Five cases in terms of the positions of the two spheres.
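The iDistance-style keys of the two indexes and the key interval searched for each positional case of Fig 5 can be sketched as follows. This is a sketch under assumptions rather than the paper's exact formulation: a sorted Python list stands in for the improved B+-Tree, d denotes the distance from the retrieval CIS to the cluster centre, cluster_radius is the largest distance of any member to that centre, c1 is assumed to exceed every cluster radius, and all names are hypothetical.

```python
import bisect


def index_key(j, dist_to_centre, c1=1000.0):
    """iDistance-style key: cluster number stretched by c1 plus distance to centre."""
    return j * c1 + dist_to_centre


def search_range(j, d, r, cluster_radius, c1=1000.0):
    """Key interval of cluster j that can hold answers for a sphere of radius r.

    Returns None when the retrieval sphere and the cluster do not intersect.
    The clamping below covers the four intersecting cases with one formula:
    the interval [d - r, d + r] is cut to the cluster's own extent [0, radius].
    """
    if d - r > cluster_radius:
        return None
    base = j * c1
    return base + max(d - r, 0.0), base + min(d + r, cluster_radius)


def range_candidates(sorted_keys, ids, left, right):
    """Return the IDs whose keys fall in [left, right] (B+-Tree range scan stand-in)."""
    lo = bisect.bisect_left(sorted_keys, left)
    hi = bisect.bisect_right(sorted_keys, right)
    return ids[lo:hi]
```

The candidates returned by such a scan still need the refinement step, since a key inside the interval only guarantees that the distance to the cluster centre is compatible with the retrieval sphere.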

Case 1: in Fig 5(a), Θ(X, r) intersects with C_j, by which X is contained, so only the part of the key range of C_j covered by Θ(X, r) is searched;
Case 2: in Fig 5(b), Θ(X, r) intersects with C_j, which does not contain X, so only the part of the key range of C_j covered by Θ(X, r) is searched;
Case 3: in Fig 5(c), Θ(X, r) contains C_j, so the whole key range of C_j is searched;
Case 4: in Fig 5(d), C_j contains Θ(X, r), so only the key range spanned by Θ(X, r) is searched;
Case 5: in Fig 5(e), Θ(X, r) does not intersect with C_j. No candidate sequences are retrieved.

For the similarity retrieval in the MECN, there are two cases in terms of whether there exists a partial answer in N_E: 1) if not, the answer CISs (Ψ) are directly obtained from N_C; 2) otherwise, the answer CISs (Ψ) are composed of the partial answer ones (Ψ′) obtained from N_E and the remaining ones (Ψ″) from N_C. Algorithm 6 details the similarity range retrieval of the CISs based on the eIndex in N_E. The range similarity search routine invoked in line 2 is implemented on the improved B+-Tree as described in Algorithm 9.

Algorithm 6 (X, r, Ω′)
input: Θ(X, r): the retrieval CIS and radius, Ω′: the CISs whose ETypes are ‘T’ in N_E
output: Ψ′: the partial answer CISs from N_E
1: Ψ′ ← ∅; ▹ initialization
2: Ψ′ ← the candidate CISs returned by Algorithm 9 for (X, r, Ω′);
3: for each candidate CIS (X_i) ∈ Ψ′ do
4: if sim(X, X_i) > r then
5: Ψ′ ← Ψ′ − X_i;
6: else
7: if LNode(X_i).EType = ‘F’ then
8: LNode(X_i).EType ← ‘T’; ▹ for eIndex
9: LNode(X_i).CType ← ‘F’; ▹ for cIndex
10: update the information of X_i (e.g., access frequencies and access time) in the log file;
11: end if
12: end if
13: end for
14: return Ψ′

Similarly, Algorithm 7 summarizes the index-supported partial similarity range retrieval of the CISs at the cloud node level. It is worth mentioning that LNode(X).CType = ‘T’ means X is not buffered in N_E.
Algorithm 7 (X, r, Ω − Ω′)
input: Θ(X, r): the retrieval CIS and radius, Ω − Ω′: the CISs whose CTypes are ‘T’ in N_C
output: Ψ″: the partial answer CISs from the cloud node
1: Ψ″ ← ∅; ▹ initialization
2: Ψ″ ← the candidate CISs returned by Algorithm 9 for (X, r, Ω − Ω′);
3: for each candidate CIS (X_i) ∈ Ψ″ do
4: if sim(X, X_i) > r then
5: Ψ″ ← Ψ″ − X_i;
6: else
7: if LNode(X_i).EType = ‘F’ then
8: LNode(X_i).EType ← ‘T’; ▹ for eIndex
9: LNode(X_i).CType ← ‘F’; ▹ for cIndex
10: end if
11: end if
12: end for
13: return Ψ″

Finally, obtaining the complete answer CISs from the cloud node is detailed in Algorithm 8.

Algorithm 8 (X, r, Ω)
input: Θ(X, r): the retrieval CIS and radius, Ω: the CISs in N_C
output: Ψ: the complete answer CISs from N_C
1: Ψ ← ∅; ▹ initialization
2: Ψ ← the candidate CISs returned by Algorithm 9 for (X, r, Ω);
3: for each candidate CIS (X_i) ∈ Ψ do
4: if sim(X, X_i) > r then
5: Ψ ← Ψ − X_i;
6: end if
7: end for
8: return Ψ

Algorithm 9 (X, r, Ω)
input: Θ(X, r): the retrieval CIS and radius, Ω: the CISs
output: Φ′: the candidate CISs
1: Φ′ ← NULL;
2: for the CISs in each cluster C_j do
3: if Case 1 holds then
4: set [left, right] to the search range of Case 1;
5: else if Case 2 holds then
6: set [left, right] to the search range of Case 2;
7: else if Case 3 holds then
8: set [left, right] to the search range of Case 3;
9: else if Case 4 holds then
10: set [left, right] to the search range of Case 4;
11: else
12: continue; ▹ Case 5: C_j is skipped
13: end if
14: Φ′ ← Φ′ ∪ {the CISs whose keys fall in [left, right]};
15: end for
16: return Φ′

• When a user submits a retrieval request, the eIndex needs to be updated by adding the CISs that have been accessed in this retrieval. Since the number of CISs buffered in N_E is limited, optimally choosing the buffered CISs is challenging. For example, assume that there are six CISs in N_E; Tables 2 and 3 illustrate the ranking of the access time (AT) and the access frequencies (AF) of the six CISs, respectively. In Table 2, the ATs of the six CISs are ranked from the most recent to the least recent, and the ranks are quantitatively represented by the AT_IDs. A weighted AT (WAT) is then derived by dividing the AT_ID by the number n of buffered CISs (n = 6 in Table 2): WAT = AT_ID / n (10).

Table 2

Ranking of the AT.

	CIS ₁	CIS ₂	CIS ₃	CIS ₄	CIS ₅	CIS ₆
AT	2:50	3:40	1:48	4:21	2:32	3:11
AT_ID	4	2	6	1	5	3
WAT	4/6	2/6	6/6	1/6	5/6	3/6

Table 3

Ranking of the AF.

	CIS ₁	CIS ₂	CIS ₃	CIS ₄	CIS ₅	CIS ₆
AF	1	3	1	4	2	3
WAF	4/5	2/5	4/5	1/5	3/5	2/5

Similarly, for the ranking of the access frequencies (AF), a weighted AF (WAF) is defined in Eq (11); its values are listed in Table 3. Based on Eqs (10) and (11), the uniform ranking score (URS) of a CIS is the sum of its two weights: URS = WAT + WAF (12). Table 4 illustrates the uniform ranking scores of the six CISs. Based on Eq (12), the smaller the URS, the more important the CIS. If MaxN is 4, then CIS3 and CIS1, which have the two largest URSs, can be removed from the edge buffer.

Table 4

Uniform ranking score.

	CIS ₁	CIS ₂	CIS ₃	CIS ₄	CIS ₅	CIS ₆
WAT	4/6	2/6	6/6	1/6	5/6	3/6
WAF	4/5	2/5	4/5	1/5	3/5	2/5
URS	44/30	22/30	54/30	11/30	43/30	27/30

Algorithm 10 (Ω′)
input: Ω′: the CISs buffered at N_E
output: the updated Ω′
1: for i = 1 to |Ω′| − MaxN do
2: select the CIS whose URS is the largest in Ω′;
3: Ω′ ← Ω′ − CIS;
4: LNode(CIS).CType ← ‘T’;
5: LNode(CIS).EType ← ‘F’;
6: end for
7: return the updated Ω′
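The ranking example of Tables 2 to 4 and the eviction of Algorithm 10 can be reproduced with the short sketch below. It assumes that AT_ID ranks the access times from the most recent (rank 1) to the oldest, which matches Table 2, and it takes the WAF values directly from Table 3 rather than recomputing them. The function names are hypothetical.

```python
from fractions import Fraction


def to_seconds(stamp):
    """Turn an 'a:b' access time into a sortable number."""
    a, b = stamp.split(":")
    return int(a) * 60 + int(b)


def urs_scores(access_times, waf):
    """URS = WAT + WAF, with WAT = AT_ID / n and AT_ID = 1 for the latest access."""
    n = len(access_times)
    order = sorted(access_times, key=lambda k: to_seconds(access_times[k]), reverse=True)
    at_id = {name: rank + 1 for rank, name in enumerate(order)}
    return {name: Fraction(at_id[name], n) + waf[name] for name in access_times}


def evict(scores, max_n):
    """Names to drop from the edge buffer: largest URS first, until max_n remain."""
    order = sorted(scores, key=scores.get, reverse=True)
    return order[: max(0, len(scores) - max_n)]


AT = {"CIS1": "2:50", "CIS2": "3:40", "CIS3": "1:48",
      "CIS4": "4:21", "CIS5": "2:32", "CIS6": "3:11"}
WAF = {"CIS1": Fraction(4, 5), "CIS2": Fraction(2, 5), "CIS3": Fraction(4, 5),
       "CIS4": Fraction(1, 5), "CIS5": Fraction(3, 5), "CIS6": Fraction(2, 5)}
```

Calling urs_scores(AT, WAF) yields the URS row of Table 4, and evict(..., 4) selects CIS3 and then CIS1, in line with the example with MaxN = 4.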

Edge buffering

Traditional image retrieval methods obtain data directly from the remote cloud. If the answer CISs can instead be obtained from the edge without accessing the cloud, the long-distance transmission delay is greatly shortened and the retrieval efficiency improves. Based on this motivation, we propose an edge buffering scheme built on the analysis of the user historical retrieval (HR) log file. With the help of the buffering scheme, the refinement cost of the candidate CISs can be significantly decreased, since a portion of the answer CISs can be retrieved directly without any refinement processing.

Specifically, assume that n HRs have been successfully completed with accurate results. Because the answer CISs provided by each HR have been verified, when a user submits a new retrieval CIS (i.e., X), it is highly possible that X is similar to, or even the same as, a HR one (i.e., X_R′^i). As a result, the retrieval efficiency and accuracy can be greatly improved if the HR results in N_E are carefully reused as a part of the current results.

Definition 7 (CRS). Given a retrieval CIS X and a retrieval radius r, their corresponding CIS retrieval sphere (CRS) is a high-dimensional sphere with a centre X and a radius r, denoted as Θ(X, r).

Definition 8 (HCRS). Given a HR CIS X_R′^i and a retrieval radius r_R′^i, their corresponding historical CIS retrieval sphere (HCRS) is a high-dimensional sphere with a centre X_R′^i and a radius r_R′^i, denoted as Θ(X_R′^i, r_R′^i).

Definition 9 (AA). Given a CRS Θ(X, r) and a HCRS Θ(X_R′^i, r_R′^i), their corresponding affected area (AA) is the intersection part of the two spheres, formally denoted as: AA = Θ(X, r) ∩ Θ(X_R′^i, r_R′^i).

For example, as shown in Fig 6, there are three HCRSs, i.e., Θ(X_R′^1, r_R′^1), Θ(X_R′^2, r_R′^2) and Θ(X_R′^3, r_R′^3), and the current CRS is represented as Θ(X, r). The three HR CISs are the 1st, 2nd and 3rd nearest neighbor CISs of X. The HR whose HCRS does not intersect with Θ(X, r) can be safely discarded, whereas the CISs falling in the AAs of the intersecting HCRSs can be a part of the answer CISs of Θ(X, r).

Fig 6

An example of the edge buffering scheme.

Next, given two CIS retrieval spheres, Θ(X, r) and Θ(X_R′^i, r_R′^i), there exist two cases depending on whether the two retrieval CISs (i.e., X and X_R′^i) are identical or not, which are shown in Figs 7 and 8, respectively.

Fig 7

Fig 8

In Fig 7, if X = X_R′^i, then there exist two cases in terms of the retrieval radii (i.e., r and r_R′^i).
• For case (a), which is formally represented as r ≥ r_R′^i, since the CISs falling in the HCRS have already undergone verification, they can be part of the answer CISs for Θ(X, r);
• For case (b), which is formally represented as r < r_R′^i, the answer CISs for Θ(X, r) can be derived from the CISs in Θ(X_R′^i, r_R′^i).

In Fig 8, if X ≠ X_R′^i, there are four cases according to the placement of the two spheres (i.e., Θ(X, r) and Θ(X_R′^i, r_R′^i)).
• In case (a), the AA of the two spheres does not exist, formally represented as AA = ∅, so the answer CISs need to be calculated sequentially in Θ(X, r);
• In case (b), the AA of the two spheres exists, formally represented as AA ≠ ∅. Since the CISs falling in the AA have been verified previously, they can be regarded as a part of the candidate CIS set of Θ(X, r);
• In case (c), the AA of the two spheres is Θ(X_R′^i, r_R′^i), formally represented as AA = Θ(X_R′^i, r_R′^i). As the CISs that fall in Θ(X_R′^i, r_R′^i) have been verified previously, they can be regarded as a part of the answer CIS set of Θ(X, r);
• In case (d), the AA of the two spheres is Θ(X, r), formally represented as AA = Θ(X, r), so the answer CISs are contained by the CISs falling in Θ(X_R′^i, r_R′^i).
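The six placements of the current and historical retrieval spheres can be told apart from three numbers only, as sketched below. The sketch assumes that d is the distance between the two retrieval CISs, treats tangent spheres as non-intersecting, and uses a hypothetical function name; the returned labels refer to the cases of Figs 7 and 8.

```python
def classify_spheres(d, r, r_hist):
    """Label the placement of the CRS (radius r) and an HCRS (radius r_hist).

    d is the distance between the current and the historical retrieval CIS.
    """
    if d == 0:
        # Same retrieval CIS: only the radii differ (Fig 7).
        return "7a" if r >= r_hist else "7b"
    if d >= r + r_hist:
        return "8a"  # no affected area
    if d + r_hist <= r:
        return "8c"  # the HCRS lies inside the CRS
    if d + r <= r_hist:
        return "8d"  # the CRS lies inside the HCRS
    return "8b"      # partial overlap
```

Cases 7a and 8c allow the buffered results of the HR to be reused directly as answers, cases 7b and 8d restrict the search to the buffered results, and case 8a means the HR is of no help for the current retrieval.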

The BRS algorithm

Before introducing the algorithm, a pre-processing step is first conducted. Algorithm 11 summarizes the detailed steps of our proposed BRS method, in which the retrieval routines invoked in lines 2, 4 and 7 correspond to Algorithms 6-8, respectively. As illustrated in Fig 9, when a retrieval lung CIS (X) is submitted from the user node N_U to the edge node level N_E, the eIndex scheme in N_E is first adopted to quickly judge whether there are some answer CISs similar to X. If so, the high-dimensional similarity retrieval is carried out with the support of the cIndex scheme at the cloud to obtain the remaining partial answer CISs; otherwise, the similarity retrieval of all CISs supported by the cIndex is performed directly through the cloud. Finally, the answer CISs are returned to the receiver node. Note that, in line 9, before the answer CISs are transmitted to the receiver, the CIBs in the CIs need to be decrypted to ensure the accurate reconstruction and display of the answer CISs. Compared to the NIBs, the CIBs have higher transmission priorities. The IBs are transmitted in descending order of priority, which not only assures the security of data transmission but also ensures that the critical information is transmitted first.

Fig 9

An example of the BRS processing.

Algorithm 11 (X, r)
input: X: a retrieval CIS, r: a retrieval radius
output: Ψ: the answer CISs
1: a retrieval CIS (X) is submitted from N_U;
2: Ψ′ ← the result of Algorithm 6 for (X, r); ▹ obtain answer CISs (Ψ′) based on the eIndex at the edge node
3: if Ψ′ ≠ NULL then
4: Ψ″ ← the result of Algorithm 7 for (X, r); ▹ obtain the partial answer CISs (Ψ″) based on the cIndex at the cloud
5: Ψ ← Ψ′ ∪ Ψ″;
6: else
7: Ψ ← the result of Algorithm 8 for (X, r); ▹ obtain the complete answer CISs based on the cIndex at the cloud
8: end if
9: transmit the CISs in Ψ to the receiver node level with different transmission priorities
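The control flow of Algorithm 11 can be sketched as below. The three search routines are passed in as callables because their internals (Algorithms 6-8) depend on the indexes; every name here is hypothetical, and block priorities are assumed to be plain numbers where a larger value means earlier transmission.

```python
def brs_retrieve(query, radius, edge_search, cloud_partial_search, cloud_full_search):
    """Combine edge and cloud results as in lines 2-8 of Algorithm 11."""
    partial_edge = set(edge_search(query, radius))
    if partial_edge:
        # Some answers were found in the edge buffer; fetch only the rest.
        return partial_edge | set(cloud_partial_search(query, radius))
    # Nothing usable at the edge: fall back to a full cloud retrieval.
    return set(cloud_full_search(query, radius))


def transmission_order(blocks):
    """Order (block_id, priority) pairs so that higher priorities are sent first."""
    return [block_id for block_id, _ in sorted(blocks, key=lambda bp: bp[1], reverse=True)]
```

A call such as brs_retrieve(x, r, edge, cloud_rest, cloud_all) touches the full cloud search only when the edge buffer yields nothing, which is where the saving in response time is expected to come from.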

Experiments

To verify the efficiency of the proposed BRS method, extensive simulation experiments are conducted to demonstrate the retrieval performance.

Experimental setup

In the experiments, the image receiver client is equipped with a 5.9-inch, full HD 1080p screen and a Qualcomm® Snapdragon™ 650 processor running at 1.7 GHz. The client system is developed in Java and runs on the Android operating system [37]. The edge node and the cloud node are connected via 1 Gbps network links. In the cloud node, the IBs (i.e., CIBs and NIBs) with various transmission priorities are kept in a file system, and some structured data is recorded in MySQL [38]. Each node contains a 2.7 GHz quad-core Xeon processor, 2.0 GB of memory, and a 1.0 TB hard disk. The maximum data communication rate is 150 Mbps in the wireless network. We selected the LUNA16 dataset [39], which contains 239,232 lung CIs, as our experimental dataset. There are 888 lung CISs in this database, with an average of 336 lung CIs per sequence. The number of CIs in each sequence ranges from 200 to 600.

A prototype retrieval system

Fig 10 depicts a demonstration of the prototype system. An example of the CIS pre-processing backend interface is shown in Fig 10(a), in which a POA has been marked by a blue rectangle. In Fig 10(b), a CIS with the category ‘lung’ has been input as a retrieval sequence. Four result CISs were quickly retrieved, and their matching IBs are restored and shown.

Fig 10

A prototype system of the BRS.

Effectiveness of the BRS method

The first experiment tests the effectiveness of our BRS method by using randomly selected lung CISs as experimental data. The recall and precision achieved by this retrieval method are defined as: recall = |rel ∩ ret| / |rel| and precision = |rel ∩ ret| / |ret|, where rel means the set of ground-truth CISs, and ret refers to the set of results returned by a similarity range search. Fig 11 compares the retrieval effectiveness for 10 CISs of the same organ (i.e., lung) that are randomly selected from the database. As can be observed from the figure, precision steadily declines as the recall ratio rises. The reason is that when the recall rate is low, the result CISs are very likely to be correct, whereas a high recall rate cannot guarantee that all the retrieval results are correct CISs.

Fig 11

Retrieval accuracy.

Effect of data size

In this experiment, we investigate the effect of data size (i.e., the number of lung CISs) on retrieval efficiency using two methods: 1) our proposed BRS method; 2) the MIRC method in [25]. The network bandwidth is 100 Mbps, the number of edge nodes is 15, and the UECI framework is used. As Fig 12 shows, as the number of CISs increases, the BRS method outperforms MIRC because the edge buffer significantly reduces the retrieval computation cost and transmission delay. Meanwhile, it is interesting to observe that as the data size increases, the overall response time first grows rapidly and then more gradually. This is because the index performs better when there is more data.

Fig 12

Effect of data size.

Effect of ε

This experiment evaluates the effect of ε on the retrieval performance. As in the previous experiment, the network bandwidth and the number of cloud (edge) nodes are fixed, and the edge buffering scheme and the indexing mechanism are adopted. As demonstrated in Fig 13(a), as ε increases, the CPU cost of the similarity computation decreases because the number of RCIs in each CIS decreases. Meanwhile, it is interesting to note in Fig 13(b) that the precision ratio first increases rapidly and then decreases gradually. The reason is that too many or too few RCIs make it difficult to measure the similarity of the CISs accurately and completely. Therefore, the optimal ε is set to 0.65.

Fig 13

Effect of ε.

Effect of edge buffering scheme

In this experiment, we study the effect of the edge buffering scheme on the retrieval performance. Method 1 adopts the edge buffering scheme and method 2 does not. Fig 14 demonstrates that the overall response time of method 1 is shorter than that of method 2 when the bandwidth is stable and the retrieval radius (r) is fixed. Meanwhile, the performance gap widens as r steadily grows while the bandwidth remains constant. This is because as r increases, the probability of finding the result CISs in the edge buffer also increases.
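The benefit of buffering at the edge can be sketched as follows. This is a minimal illustration only: the class `EdgeBuffer`, the least-recently-used eviction policy, and the `fake_cloud_fetch` function are all assumptions made for this example, since the buffer replacement policy is not described in this section.

```python
from collections import OrderedDict


class EdgeBuffer:
    """Hypothetical buffer on an edge node (LRU eviction is an assumption)."""

    def __init__(self, capacity, fetch_from_cloud):
        self.capacity = capacity
        self.fetch_from_cloud = fetch_from_cloud  # callable: cis_id -> data
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, cis_id):
        if cis_id in self.store:
            # Buffer hit: answer locally, no cloud round trip.
            self.store.move_to_end(cis_id)
            self.hits += 1
            return self.store[cis_id]
        # Buffer miss: fetch from the cloud and keep a copy at the edge.
        self.misses += 1
        data = self.fetch_from_cloud(cis_id)
        self.store[cis_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return data


cloud_calls = []


def fake_cloud_fetch(cis_id):
    """Stand-in for a cloud request; records every call it receives."""
    cloud_calls.append(cis_id)
    return f"blocks-of-{cis_id}"


buffer = EdgeBuffer(capacity=2, fetch_from_cloud=fake_cloud_fetch)
for cis_id in ["cis_01", "cis_02", "cis_01", "cis_03", "cis_02"]:
    buffer.get(cis_id)

print(buffer.hits, buffer.misses, cloud_calls)
```

Each hit avoids one transfer between the edge and the cloud, so the more often the result CISs are already buffered, the larger the saving in response time.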

Fig 14

Effect of edge buffering scheme.

Effect of indexing scheme

The final experiment examines how the index framework (i.e., eIndex and cIndex) affects retrieval efficiency. Here, method 1 uses these two indexes, whereas method 2 does not (i.e., it sequentially searches each cloud node to find the answer CISs). In Fig 15, the data size and the network bandwidth are fixed, and the number of cloud nodes varies from 10 to 50. The response time of method 1 (i.e., index-based retrieval) grows as the number of cloud nodes increases. Meanwhile, the performance gap between the two approaches narrows, since the response time of method 2 is relatively stable, while locating the corresponding candidate CISs through the index is faster than searching without an index. It is interesting to notice that the retrieval response time is smallest when the number of cloud nodes is 10. The more cloud nodes are involved in retrieval, the more data exchange and transmission occur, which delays retrieval.

Fig 15

Effect of indexing scheme.

Conclusion

In this paper, we introduced the BRS method, a privacy-preserving batch retrieval scheme for lung CISs in an edge-cloud collaborative computing environment. The goal of the proposed BRS method is to provide safe and efficient retrieval of lung CISs in resource-constrained networks with low and unstable bandwidth. To enable efficient BRS processing, four supporting techniques are proposed, namely, 1) batch similarity measure for CISs, 2) CIB-based privacy preserving scheme, 3) uniform edge-cloud index framework, and 4) edge buffering scheme. The experimental results reveal that, with the aid of these supporting techniques, the efficiency of the BRS method is more than 200% higher than that of sequential retrieval, especially when the number of cloud nodes is small.
