diff --git a/NLP/Reuseable_embeddings.ipynb b/NLP/Reuseable_embeddings.ipynb
new file mode 100644
index 0000000..55dfb46
--- /dev/null
+++ b/NLP/Reuseable_embeddings.ipynb
@@ -0,0 +1,454 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "gpuType": "T4",
+ "authorship_tag": "ABX9TyPnNmisyRFTsY4axXqU9vCR",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ },
+ "accelerator": "GPU"
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Reusable Embeddings:\n",
+ "Embeddings in NLP are are very important concept and using them in your model also makes your model out perform. So what are embeddings. These are actually numeric representation of text because our model can not digest text directly it requires numeric data. We also use other techniques like \"one hot encoding\",\n",
+ "\"TF-IDF\" etc. But embeddings carry semantic meanings which helps model to generalize well.\n",
+ "\n",
+ "But what if I tell you that you can give text directly to your model instead of numbers wait for section 2 ✈"
+ ],
+ "metadata": {
+ "id": "taLaqCYFOCxn"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Contents:\n",
+ "\n",
+ "First I will explain embeddings and their formation in some traditional way if you are familiar with embeddings go to section 2 or if you do not wnat to learn this and directly want to learn easy method:\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "* Packages\n",
+ "* 1:Traditional way\n",
+ "* 2:No numbers only text\n",
+ "* Conclusion\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "8ZvV24uigEqg"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Packages\n",
+ "Let's import the packages Here in this tutorial I will teach you a beginner method of using embeddings."
+ ],
+ "metadata": {
+ "id": "IVuQUO3EPRr3"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "id": "Yk9uU0DyuMmP"
+ },
+ "outputs": [],
+ "source": [
+ "import tensorflow as tf,keras\n",
+ "import tensorflow_hub as hub\n",
+ "from tensorflow.keras.models import Sequential\n",
+ "from tensorflow.keras.layers import Dense\n",
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 1:Traditional way\n",
+ "First I will teach you the basic concept of embeddings so you have better idea. To form embeddings we have to first do this:\n",
+ "\n",
+ "**Tokenize:**\n",
+ "\n",
+ "Tokenization means we break our sentence into small pieces like character wiese or word wise.\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "Like if we have a sentence \"My name is Aurthor\". So we tokanize it as:\n",
+ "\n",
+ "My , name , is, Aurthor.\n",
+ "\n",
+ "**Integenr Encoding:**\n",
+ "\n",
+ "Then we give each word a number this number is given based on it's position in the vocabulary. so the list for this sentence will be suppose:\n",
+ "\n",
+ "[9,1,3,7]\n",
+ "\n",
+ "**Padding sequence:**\n",
+ "\n",
+ "Now we will padd the sequences means we will make the length of each tokenized sentence which is integer encoded equal because we have to give it to our model. Than we will train it using Neural network.\n",
+ "\n",
+ "**Keras Embedding Layer:**\n",
+ "\n",
+ "We can make task specific embeddings using keras built=in embedding layer which takes these arguments:\n",
+ "\n",
+ "\n",
+ "1. Number of unique words in vocubalory i.e vocuablary size\n",
+ "2. Length of sequence\n",
+ "\n",
+ "1. Number of units in which you want to represent a word.\n",
+ "\n",
+ "I am not going in extra detail of embedding we will discuss it in other tutorial here we have to teach easy way of using Reusable embeddings. But i will share the architecture.\n",
+ "Suppose we have 17 vocuablary size and we want to represent each word with two units than our model will conver each words interger encoding into one-hot vector of dimension 17 and fed into model in this way we will get a embedding of our sequence:\n",
+ "\n",
+ "[ [x1,x2],\n",
+ " [x3,x4],\n",
+ " [x5,x6],\n",
+ " [x7,x8] ]\n",
+ "\n",
+ "So each word is represented by an embedding of size 2 and this is our embedding matrix.\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "AOxhN1saQ0DN"
+ }
+ },
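+ {
+ "cell_type": "markdown",
+ "source": [
+ "Below is a minimal, illustrative sketch of the traditional pipeline described above. The sentences, the vocabulary size of 17, the embedding size of 2 and the padded length of 4 are only example values, not values used anywhere else in this notebook:\n",
+ "\n",
+ "    import tensorflow as tf\n",
+ "\n",
+ "    sentences = [\"My name is Arthur\", \"My name is long\"]\n",
+ "\n",
+ "    # 1. Tokenize and integer-encode\n",
+ "    tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=17)\n",
+ "    tokenizer.fit_on_texts(sentences)\n",
+ "    sequences = tokenizer.texts_to_sequences(sentences)\n",
+ "\n",
+ "    # 2. Pad so every sequence has the same length\n",
+ "    padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=4)\n",
+ "\n",
+ "    # 3. Embedding layer: vocabulary size 17, 2 units per word\n",
+ "    embedding = tf.keras.layers.Embedding(input_dim=17, output_dim=2)\n",
+ "    print(embedding(tf.constant(padded)).shape)  # (2, 4, 2): one 2-d vector per word"
+ ],
+ "metadata": {}
+ },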
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 2:No numbers only Text:\n",
+ "\n",
+ "Tensorflow hub is a library provided by tensorflow which has large variety of pretrained models. Embeddings itself are kind of transfer learning. With keras and tensorflow-hub using pretrained embeddings is now piece of cake we can do it in only 2 lines of code.\n",
+ "\n",
+ "\n",
+ "\n",
+ "1. One important advantage of using reusable embeddings is that we did not have to worry about preprocessing of our model means we can directly give raw text to our model. The embedding model will take care of preprocessing.\n",
+ "2. Second advantage is this reusable embeddings are trained on a vary large corpus. So if we give our model a word which it has never seen in it's training process it can generalizez well it is because that embeddings carry semantic meanings which makes the model to generalize well overall.\n",
+ "\n",
+ "**Process:**\n",
+ "\n",
+ "To use reusable embeddings we first get the link of model from tensorflow hub and store it in a model. The reusable embeddings we are using here is \"universal sentence encoder\". We store it in a variable.\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "71MHuB4ra-5C"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "hub_url = \"https://tfhub.dev/google/universal-sentence-encoder/4\""
+ ],
+ "metadata": {
+ "id": "Ju9Z2yQ-uX3v"
+ },
+ "execution_count": 1,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Making Embedding Layer:\n",
+ "\n",
+ "Next we have a built-in class in tensorflow-hub known as \"KerasLayer\". This will take the following arguments:\n",
+ "\n",
+ "\n",
+ "\n",
+ "1. Link of model\n",
+ "2. Input shape: In our case we don't have features we have text so input shape will be empty list.\n",
+ "1. Data type: In our case it will be string as we are giving text\n",
+ "2. Last one is trainable parameter depends upon you if you want to fine tune the model.\n",
+ "\n",
+ "That's it our embedding layer is ready we can now use it as it is in our model throught Sequential or Functional API.\n",
+ "Check the syntax:\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "yRliMLv8dMjN"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "embedding_layer = hub.KerasLayer(\n",
+ " hub_url,\n",
+ " input_shape=[],\n",
+ " dtype=tf.string,\n",
+ " trainable=False\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "gvWqBG-iUgEz"
+ },
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "embeddings = embedding_layer([\"How are you\"])"
+ ],
+ "metadata": {
+ "id": "TexbX_DAubvY"
+ },
+ "execution_count": 5,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Explanation:\n",
+ "\n",
+ "The pretrained model will output a 512 dimension embedding vector which is fed to our model this means we do not have to take care of preprocessing this means we can make embeddings in just two lines of code:\n",
+ "\n",
+ " hub_url = \"https://tfhub.dev/google/universal-sentence-encoder/4\"\n",
+ "\n",
+ " embedding_layer = hub.KerasLayer(\n",
+ " hub_url,\n",
+ " input_shape=[],\n",
+ " dtype=tf.string,\n",
+ " trainable=False\n",
+ " )"
+ ],
+ "metadata": {
+ "id": "7XQEWzPve17m"
+ }
+ },
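+ {
+ "cell_type": "markdown",
+ "source": [
+ "As a quick, optional sanity check of the claim that these vectors carry semantic meaning (a sketch, not part of the original workflow), you can compare sentence embeddings with cosine similarity, reusing the `embedding_layer` defined above:\n",
+ "\n",
+ "    import numpy as np\n",
+ "\n",
+ "    # Three example sentences: the first two are related, the third is not\n",
+ "    vecs = embedding_layer([\"The movie was great\",\n",
+ "                            \"I loved this film\",\n",
+ "                            \"The report is due on Monday\"]).numpy()\n",
+ "\n",
+ "    def cosine(a, b):\n",
+ "        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
+ "\n",
+ "    print(cosine(vecs[0], vecs[1]))  # related pair: expected to score higher\n",
+ "    print(cosine(vecs[0], vecs[2]))  # unrelated pair: expected to score lower"
+ ],
+ "metadata": {}
+ },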
+ {
+ "cell_type": "code",
+ "source": [
+ "print(embeddings.shape)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "AFKxs_RKunFp",
+ "outputId": "9c355886-1090-4919-998d-7adaffba42b6"
+ },
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "(1, 512)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Model:\n",
+ "\n",
+ "I have trained a model for sentiment analysis on a small dataset to check performance"
+ ],
+ "metadata": {
+ "id": "fEi-rCG6hW_i"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "data=pd.read_csv('/content/TF-HUB.xlsx')\n",
+ "X_train=data['Titles']\n",
+ "y_train=data['Labels']"
+ ],
+ "metadata": {
+ "id": "tWwyEgQI2_b3"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "model=Sequential([\n",
+ " embedding_layer,\n",
+ " tf.keras.layers.Reshape((1, 512)),\n",
+ " tf.keras.layers.SimpleRNN(128,return_sequences=True,activation='tanh'),\n",
+ " tf.keras.layers.SimpleRNN(64,activation='relu'),\n",
+ " tf.keras.layers.Dense(1,activation='sigmoid')\n",
+ "],name='TF-HUB')"
+ ],
+ "metadata": {
+ "id": "_D83c8l6upUh"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ " model.compile(optimizer='adam',loss= tf.keras.losses.BinaryCrossentropy(),metrics=['accuracy'])"
+ ],
+ "metadata": {
+ "id": "diDIPx5G0rE_"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "model.fit(X_train,y_train,epochs=10)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "sRdYPRTc1X2c",
+ "outputId": "30e348fc-22de-47e6-c152-34fc5a624265"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Epoch 1/10\n",
+ "1/1 [==============================] - 5s 5s/step - loss: 0.6899 - accuracy: 0.4500\n",
+ "Epoch 2/10\n",
+ "1/1 [==============================] - 0s 21ms/step - loss: 0.6581 - accuracy: 0.9000\n",
+ "Epoch 3/10\n",
+ "1/1 [==============================] - 0s 19ms/step - loss: 0.6302 - accuracy: 0.9500\n",
+ "Epoch 4/10\n",
+ "1/1 [==============================] - 0s 19ms/step - loss: 0.6039 - accuracy: 0.9500\n",
+ "Epoch 5/10\n",
+ "1/1 [==============================] - 0s 18ms/step - loss: 0.5779 - accuracy: 0.9500\n",
+ "Epoch 6/10\n",
+ "1/1 [==============================] - 0s 19ms/step - loss: 0.5522 - accuracy: 0.9500\n",
+ "Epoch 7/10\n",
+ "1/1 [==============================] - 0s 18ms/step - loss: 0.5259 - accuracy: 0.9500\n",
+ "Epoch 8/10\n",
+ "1/1 [==============================] - 0s 17ms/step - loss: 0.4990 - accuracy: 0.9500\n",
+ "Epoch 9/10\n",
+ "1/1 [==============================] - 0s 19ms/step - loss: 0.4716 - accuracy: 0.9500\n",
+ "Epoch 10/10\n",
+ "1/1 [==============================] - 0s 17ms/step - loss: 0.4443 - accuracy: 0.9500\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "pred=model.predict(['This model lacks good results and good architecture'])\n",
+ "pred"
+ ],
+ "metadata": {
+ "id": "OHkMiwh2330W",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c16885d1-0e78-46a1-f57d-1ff667ea3953"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "1/1 [==============================] - 1s 884ms/step\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([[0.46520936]], dtype=float32)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "if pred>0.5:\n",
+ " print(\"+ve\")\n",
+ "else:\n",
+ " print(\"-ve\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "hjZPBp5LHact",
+ "outputId": "8b41647c-0cb7-4fdb-a7e5-99413bba0d14"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "+ve\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Conclusion:\n",
+ "\n",
+ "We can clearly see our model is generalized well on the words that it has never seen in training set and we have made embeddings very easily and do not use any large packages."
+ ],
+ "metadata": {
+ "id": "8iabBi-jfpwP"
+ }
+ }
+ ]
+}
\ No newline at end of file