Team Members: Dr. P. Subashini, CMLI Coordinator, Professor of Computer Science.
Dr. S. Meenakshi, Associate Professor of Computer Science, Gobi Arts and Science College.
Dr. R. Janani, Research Assistant, CMLI.
Ms. Sathya Sree S, II MCA, Gobi Arts and Science College.
Project Summary:
The project, entitled “Prediction of Toxicity in Drug Discovery using Quantum Machine Learning,” aims to improve the identification of potentially toxic compounds by combining quantum computing with machine learning techniques. The project is carried out in four steps: input data, feature extraction and selection, quantum encoding, and model training and evaluation.
Input Data: The raw dataset contains 1800 drugs represented as SMILES strings. The target variable indicates whether a drug exhibits toxicity (1) or not (0).
Feature Extraction and Selection: RDKit is used to extract molecular descriptors from the SMILES strings. These descriptors serve as features that represent the chemical properties of the compounds. After extraction, the 10 features most strongly correlated with the toxicity label are selected. This reduces the dimensionality of the data while retaining its predictive power.
Quantum Encoding: This step converts the classical features (molecular descriptors) into quantum states using quantum gates and circuits, so that quantum algorithms can process the information. Once the data is encoded, the fidelity between each pair of encoded quantum states is calculated. These fidelity values form a fidelity-based quantum kernel, which can be used in quantum machine learning models.
Model Training and Evaluation: The fidelity-based quantum kernel is applied to a Support Vector Machine (SVM), which is trained on the dataset. The performance of the resulting quantum SVM (QSVM) is compared with that of classical machine learning models using accuracy, F1 score, recall, precision, the ROC curve, and the confusion matrix.
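The fidelity-based kernel described above can be illustrated with a small classical simulation. This is a minimal sketch, not the project's implementation: the summary does not name the encoding circuit, so the sketch assumes angle encoding (one qubit per feature, each prepared with an RY rotation), and the function names and sample values are hypothetical. For this encoding, the fidelity between two states reduces to the product of cos^2((x_i - y_i)/2) over the features.

```python
import math


def angle_encode(features):
    """Build the state vector of a product state with one qubit per feature.

    Assumed encoding: each qubit is RY(x)|0> = cos(x/2)|0> + sin(x/2)|1>.
    """
    state = [1.0]
    for x in features:
        qubit = (math.cos(x / 2.0), math.sin(x / 2.0))
        # Kronecker product of the running state with the new qubit.
        state = [amp * q for amp in state for q in qubit]
    return state


def fidelity(state_a, state_b):
    """Fidelity |<a|b>|^2 between two pure states with real amplitudes."""
    overlap = sum(a * b for a, b in zip(state_a, state_b))
    return overlap * overlap


def fidelity_kernel(samples):
    """Kernel matrix whose entry (i, j) is the fidelity between samples i and j."""
    states = [angle_encode(s) for s in samples]
    return [[fidelity(si, sj) for sj in states] for si in states]


# Hypothetical descriptor values, assumed to be already scaled to [0, pi].
samples = [
    [0.1, 1.2, 2.0],
    [0.2, 1.0, 2.1],
    [3.0, 0.3, 0.5],
]

kernel = fidelity_kernel(samples)
for row in kernel:
    print(" ".join(f"{value:.3f}" for value in row))
```

Each diagonal entry is 1 because a state has unit fidelity with itself, and similar feature vectors produce entries close to 1. A matrix of this kind could then be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's SVC with kernel="precomputed"; whether the project uses that route is not stated in the summary.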