
Figure 1:

Flowchart of the proposed system.

Figure 2:

Workflow diagram of the proposed system.

Figure 3:

Semantically closer words of কিডনি (kidney).

Figure 4:

Semantically closer words of পাথর (stone).

Figure 5:

Semantically closer words of চিকিৎসা (treatment).

Figure 6:

Semantically closer words of হাসপাতাল (hospital).

Figure 7:

Response of user submitted query.

Figure 8:

The skip-gram model’s architecture.

Figure 9:

Execution time with respect to number of tokens.

Figure 10:

Graphical representation of the confusion matrix.

Comparative study of comparable systems and the proposed work.

Author(s) | Methodology applied in the comparable system | Methodology applied in the proposed system
Arefin et al. [28]

Comparable system:

(1) In [28], the pre-processing stage uses lowercase conversion, removal of escaped words, tokenization, and PoS tagging.

(2) Word similarity is used to select closely related words.

(3) The system in [28] is built on the Jaro-Winkler matching algorithm and the Naive Bayes method.

Proposed system:

(1) The proposed system employs tokenization, stop-word removal, and stemming in the pre-processing stage.

(2) The system uses semantic similarity to select the entities and attributes that help to frame the SQL.

(3) The skip-gram model is used to detect closely related words, whereas Noise Contrastive Estimation is used to remove irrelevant words. Stochastic Gradient Descent is employed to optimize the proposed model.

Liu et al. [29]

Comparable system:

(1) The model in [29] contains three layers: an encoder layer, a parse layer, and an output layer.

(2) The encoder encodes the input information, whereas the parse layer is responsible for connecting the vectors of the various parts.

(3) The output layer handles the logit output of the model.

Proposed system:

(1) The machine learning model is divided into three layers: an input layer, a hidden layer, and an output layer.

(2) Each word is assigned a unique value to encode the information.

(3) Cosine similarity captures semantically close words.

Sanyal et al. [30]

Comparable system:

(1) This system converts queries from the English language to SQL.

(2) The work [30] focuses on high-performance SQL generation.

(3) The work [30] is implemented with NLP mechanisms: tokenization, stop-word removal, parsing, lexical analysis, synonym detection and formation, and word filtering.

Proposed system:

(1) The proposed system converts Bengali queries to SQL.

(2) An unsupervised machine learning model is used to generate SQL from NL queries.

(3) NLP-based data pre-processing and the skip-gram model are applied to implement the proposed system.

Sugandhika and Ahangama [31]

Comparable system:

(1) An XML file contains the metadata of a particular database, and an XML extractor reads this file. The metadata is used to generate SQL because it contains information about table names, column names, operator details, etc.

(2) The BASIC clause generator extracts the elements that are useful to form the BASIC clause of the SQL.

(3) The column names, column values, relational operators, and concatenating operators are used to frame the SQL.

Proposed system:

(1) Unlabelled Bengali text is used to train the skip-gram model to find related words.

(2) A set of predefined rules is applied to form SQL with SELECT, WHERE, IN, AND, OR, etc.

(3) Semantically close words are used to predict the entity and attribute of a particular table.

Pal et al. [32]

Comparable system:

(1) The model in [32] is a deep learning-based model that handles natural language questions and generates valid SQL.

(2) Because some databases are sensitive, a data privacy approach is included in the model [32].

(3) The model [32] uses RoBERTa embeddings and data-agnostic knowledge vectors. These vectors are passed through LSTM-based sub-models to predict the final SQL query.

Proposed system:

(1) The proposed system uses the unsupervised skip-gram model to convert Bengali NL queries to SQL.

(2) Noise Contrastive Estimation (NCE) is used to discriminate between the actual data and the noise in the data.

(3) A unique value is assigned to each unique word to form a vector. The vectors are used as input to the skip-gram model to identify close words.

Huo et al. [33]

Comparable system:

(1) In SyntaxSQLNet, a bi-directional LSTM (BiLSTM) is used to encode a natural language phrase, as in [33].

(2) This approach incorporates both the global table information and the local column information, which are used as input to the BiLSTM.

(3) An SQL-specific tree-based decoder is used in the model [33] to capture the SQL structure. The COL module predicts the column, and this module is improved in [33].

Proposed system:

(1) The proposed system encodes Bengali phrases using a word embedding method.

(2) The created dictionary is employed to train the skip-gram model.

(3) The output of the skip-gram model is used to identify the entity and attributes in a predefined healthcare database.
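The comparison above states that cosine similarity over word vectors is used to capture semantically close words. A minimal sketch of that selection step follows. The three-dimensional vectors in `toy_embeddings` are hypothetical values chosen for illustration (they are not the embeddings learned by the skip-gram model), and the function names are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


def closest_words(target, embeddings, top_k=2):
    """Rank every other word by cosine similarity to the target word."""
    target_vec = embeddings[target]
    scored = [
        (word, cosine_similarity(target_vec, vec))
        for word, vec in embeddings.items()
        if word != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors, for illustration only.
toy_embeddings = {
    "হাসপাতাল": [0.9, 0.1, 0.0],  # hospital
    "চিকিৎসা": [0.8, 0.3, 0.1],   # treatment
    "পাথর": [0.1, 0.9, 0.2],      # stone
    "কিডনি": [0.2, 0.8, 0.3],     # kidney
}

for word, score in closest_words("কিডনি", toy_embeddings):
    print(word, round(score, 3))
```

With these toy vectors, পাথর (stone) ranks closest to কিডনি (kidney). In the proposed system the vectors would instead come from the trained skip-gram model.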

Confusion matrix.

Expected output vs. select output | Select output: Positive | Select output: Negative
Expected output: Positive | True positives (TP): 65 | False positives (FP): 17
Expected output: Negative | False negatives (FN): 25 | True negatives (TN): 49

Structure of department table.

id_dept | name_dept | id_hos
70 | নবজাতক (neonate) | 7
170 | মনোরোগ (psychiatry) | 12

Structure of hospital table.

id_hos | name_hos | add_hos | district_hos | State_hos
1 | মুর্শিদাবাদ জেলা হাসপাতাল (murʃid̪abad̪ ɟela haʃpat̪al) (Murshidabad District Hospital) | লালগোলা (lalgola) (Lalgola) (proper noun) | মুর্শিদাবাদ (murʃid̪abad̪) (Murshidabad) (proper noun) | পশ্চিমবঙ্গ (poʃcimbɔŋgo) (West Bengal) (proper noun)
8 | হাওড়া জেলা হাসপাতাল (ha͡o̯ɽa ɟela haʃpat̪al) (Howrah District Hospital) | আমতা (amot̪a) (Amta) (proper noun) | হাওড়া (ha͡o̯ɽa) (Howrah) (proper noun) | পশ্চিমবঙ্গ (poʃcimbɔŋgo) (West Bengal) (proper noun)

Structure of doctor table.

id_doc | name_doc | qualification_doc | specialist_doc | id_hos | id_dept
1070 | সুজন দাশগুপ্ত (ʃuɟon d̪aʃgupt̪o) (Sujon Dasgupta) | এম.বি.বি.এস. (emo.bi.bi.es.) (M.B.B.S.) | নেফ্রোলজিষ্ট (nepʰrolɟiʃto) (nephrologist) | 8 | 80
1080 | সোমেন দে (ʃomen d̪e) (Somen De) | ডি.এম. (di.em.) (D.M.) | নিউরোলজিস্ট (ni͡u̯rolɟist) (neurologist) | 9 | 90
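To make the three table structures concrete, the sketch below builds them in an in-memory SQLite database and runs one SELECT ... WHERE query of the kind the system is meant to produce. The column names and sample rows come from the tables above. The column types, the choice of SQLite, and the query text itself are assumptions made for illustration; the query is not output of the proposed system. Foreign keys are left undeclared because the sample rows reference hospitals and departments that are not listed.

```python
import sqlite3

# Values taken from the sample rows above; stored once so that the
# inserted data and the query parameters are guaranteed to match.
HOWRAH = "হাওড়া"
HOWRAH_HOSPITAL = "হাওড়া জেলা হাসপাতাল"
NEPHROLOGIST = "নেফ্রোলজিষ্ট"
DOCTOR_NAME = "সুজন দাশগুপ্ত"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hospital (
        id_hos INTEGER PRIMARY KEY, name_hos TEXT, add_hos TEXT,
        district_hos TEXT, State_hos TEXT
    );
    CREATE TABLE department (
        id_dept INTEGER PRIMARY KEY, name_dept TEXT, id_hos INTEGER
    );
    CREATE TABLE doctor (
        id_doc INTEGER PRIMARY KEY, name_doc TEXT, qualification_doc TEXT,
        specialist_doc TEXT, id_hos INTEGER, id_dept INTEGER
    );
    """
)

conn.executemany(
    "INSERT INTO hospital VALUES (?, ?, ?, ?, ?)",
    [
        (1, "মুর্শিদাবাদ জেলা হাসপাতাল", "লালগোলা", "মুর্শিদাবাদ", "পশ্চিমবঙ্গ"),
        (8, HOWRAH_HOSPITAL, "আমতা", HOWRAH, "পশ্চিমবঙ্গ"),
    ],
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [(70, "নবজাতক", 7), (170, "মনোরোগ", 12)],
)
conn.executemany(
    "INSERT INTO doctor VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1070, DOCTOR_NAME, "এম.বি.বি.এস.", NEPHROLOGIST, 8, 80),
        (1080, "সোমেন দে", "ডি.এম.", "নিউরোলজিস্ট", 9, 90),
    ],
)

# Illustrative query: nephrologists working in a hospital in Howrah district.
rows = conn.execute(
    """
    SELECT d.name_doc, h.name_hos
    FROM doctor AS d
    JOIN hospital AS h ON d.id_hos = h.id_hos
    WHERE h.district_hos = ? AND d.specialist_doc = ?
    """,
    (HOWRAH, NEPHROLOGIST),
).fetchall()
print(rows)
```

The entity (doctor, hospital) and the attributes (district_hos, specialist_doc) in this query correspond to what the semantic-similarity step is described as identifying from the Bengali input.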

Execution time with respect to number of tokens.

Query | Number of tokens | Execution time (units)
Q1 | 4 | 4
Q2 | 5 | 5
Q3 | 7 | 7
Q4 | 9 | 9
Q5 | 12 | 12

Performance statistics of the proposed system.

Precision | Recall | Accuracy | F1 score
79% | 72% | 73% | 75%
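The reported statistics can be checked against the counts in the confusion matrix (TP = 65, FP = 17, FN = 25, TN = 49) using the standard definitions of precision, recall, accuracy, and F1 score. The function name below is an assumption; the formulas are the usual ones, not code from the proposed system.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the standard metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Counts taken from the confusion matrix table.
metrics = classification_metrics(tp=65, fp=17, fn=25, tn=49)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The computed values are about 79.3% precision, 72.2% recall, 73.1% accuracy, and 75.6% F1 score, which agree with the table to within rounding.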
eISSN: 1178-5608
Language: English
Publication timeframe: Volume Open
Journal subjects: Engineering, Introductions and Overviews, other