inigosarralde committed on
Commit
c4db958
1 Parent(s): 6ec1dec

Initial upload of files

01.TFM_Web_Scraping.ipynb ADDED
@@ -0,0 +1,658 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "toc": true
7
+ },
8
+ "source": [
9
+ "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
10
+ "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Web-Scraping\" data-toc-modified-id=\"Web-Scraping-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Web Scraping</a></span><ul class=\"toc-item\"><li><span><a href=\"#Importing-libraries\" data-toc-modified-id=\"Importing-libraries-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Importing libraries</a></span></li><li><span><a href=\"#Function-definitions\" data-toc-modified-id=\"Function-definitions-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Function definitions</a></span><ul class=\"toc-item\"><li><span><a href=\"#HTML-parser\" data-toc-modified-id=\"HTML-parser-1.2.1\"><span class=\"toc-item-num\">1.2.1&nbsp;&nbsp;</span>HTML parser</a></span></li><li><span><a href=\"#Directory-switcher\" data-toc-modified-id=\"Directory-switcher-1.2.2\"><span class=\"toc-item-num\">1.2.2&nbsp;&nbsp;</span>Directory switcher</a></span></li><li><span><a href=\"#Image-downloader\" data-toc-modified-id=\"Image-downloader-1.2.3\"><span class=\"toc-item-num\">1.2.3&nbsp;&nbsp;</span>Image downloader</a></span></li><li><span><a href=\"#Pager\" data-toc-modified-id=\"Pager-1.2.4\"><span class=\"toc-item-num\">1.2.4&nbsp;&nbsp;</span>Pager</a></span></li></ul></li><li><span><a href=\"#Image-retrieval\" data-toc-modified-id=\"Image-retrieval-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Image retrieval</a></span><ul class=\"toc-item\"><li><span><a href=\"#Mushroom-World\" data-toc-modified-id=\"Mushroom-World-1.3.1\"><span class=\"toc-item-num\">1.3.1&nbsp;&nbsp;</span>Mushroom World</a></span></li><li><span><a href=\"#Wild-UK-Mushrooms\" data-toc-modified-id=\"Wild-UK-Mushrooms-1.3.2\"><span class=\"toc-item-num\">1.3.2&nbsp;&nbsp;</span>Wild UK Mushrooms</a></span></li><li><span><a href=\"#Fungipedia\" data-toc-modified-id=\"Fungipedia-1.3.3\"><span class=\"toc-item-num\">1.3.3&nbsp;&nbsp;</span>Fungipedia</a></span></li></ul></li></ul></li></ul></div>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {},
16
+ "source": [
17
+ "# Web Scraping"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "markdown",
22
+ "metadata": {},
23
+ "source": [
24
+ "The first part of the project consists of obtaining, using web scraping techniques:\n",
25
+ "\n",
26
+ "* A set of mushroom images to train a multiclass Deep Learning classification model.\n",
27
+ "* A dataset with the attributes of different mushroom species, to train a multiclass Machine Learning classification model.\n",
28
+ "\n",
29
+ "All of the information will be obtained from the following websites:\n",
30
+ "\n",
31
+ "\n",
32
+ "* https://www.mushroom.world/\n",
33
+ "* https://www.wildfooduk.com/mushroom-guide/\n",
34
+ "* https://www.fungipedia.org/hongos/\n",
35
+ "\n",
36
+ "\n"
37
+ ]
38
+ },
39
+ {
40
+ "cell_type": "markdown",
41
+ "metadata": {},
42
+ "source": [
43
+ "## Importing libraries"
44
+ ]
45
+ },
46
+ {
47
+ "cell_type": "code",
48
+ "execution_count": 1,
49
+ "metadata": {
50
+ "ExecuteTime": {
51
+ "end_time": "2022-02-05T11:38:51.241675Z",
52
+ "start_time": "2022-02-05T11:38:50.637941Z"
53
+ }
54
+ },
55
+ "outputs": [],
56
+ "source": [
57
+ "# Import libraries\n",
58
+ "import requests # HTTP library\n",
59
+ "from bs4 import BeautifulSoup # Extract data from HTML and XML files\n",
60
+ "import re # Regular Expressions\n",
61
+ "import os # Paths and directories\n",
62
+ "import pathlib # Paths and directories\n",
63
+ "import pandas as pd # DataFrame handling\n",
64
+ "import numpy as np # Mathematical, algebraic and other functions\n",
65
+ "from PIL import Image # Image editing\n",
66
+ "from resizeimage import resizeimage # Image resizing"
67
+ ]
68
+ },
69
+ {
70
+ "cell_type": "markdown",
71
+ "metadata": {},
72
+ "source": [
73
+ "## Function definitions"
74
+ ]
75
+ },
76
+ {
77
+ "cell_type": "markdown",
78
+ "metadata": {},
79
+ "source": [
80
+ "For this exercise, we define a set of functions that will streamline both its execution and its structure:"
81
+ ]
82
+ },
83
+ {
84
+ "cell_type": "markdown",
85
+ "metadata": {},
86
+ "source": [
87
+ "### HTML parser\n"
88
+ ]
89
+ },
90
+ {
91
+ "cell_type": "markdown",
92
+ "metadata": {},
93
+ "source": [
94
+ "We first define a general function to **parse the content of the supplied URL**:"
95
+ ]
96
+ },
97
+ {
98
+ "cell_type": "code",
99
+ "execution_count": 2,
100
+ "metadata": {
101
+ "ExecuteTime": {
102
+ "end_time": "2022-02-05T11:38:51.257201Z",
103
+ "start_time": "2022-02-05T11:38:51.242692Z"
104
+ }
105
+ },
106
+ "outputs": [],
107
+ "source": [
108
+ "def getdata(url, parser='html.parser'):\n",
109
+ " # Define the headers for the HTTP request so that the server does not block the response:\n",
110
+ " headers = {\n",
111
+ " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',\n",
112
+ " 'Accept-Language': 'es-ES,es;q=0.9', \n",
113
+ " 'Cache-Control': 'max-age=0',\n",
114
+ " 'Referer': 'https://google.com',\n",
115
+ " 'DNT': '1',\n",
116
+ " }\n",
117
+ " r = requests.get(url, headers=headers)\n",
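+ " # (Not in the original) a timeout, e.g. requests.get(url, headers=headers, timeout=10), and r.raise_for_status() would make this more robust\n",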
119
+ " soup = BeautifulSoup(r.text, parser)\n",
120
+ " return soup"
121
+ ]
122
+ },
123
+ {
124
+ "cell_type": "markdown",
125
+ "metadata": {},
126
+ "source": [
127
+ "### Directory switcher"
128
+ ]
129
+ },
130
+ {
131
+ "cell_type": "markdown",
132
+ "metadata": {},
133
+ "source": [
134
+ "We also create a function to create and switch directories under the root folder for our images:"
135
+ ]
136
+ },
137
+ {
138
+ "cell_type": "code",
139
+ "execution_count": 3,
140
+ "metadata": {
141
+ "ExecuteTime": {
142
+ "end_time": "2022-02-05T11:38:51.273252Z",
143
+ "start_time": "2022-02-05T11:38:51.258172Z"
144
+ }
145
+ },
146
+ "outputs": [],
147
+ "source": [
148
+ "directory = os.getcwd()\n",
149
+ "image_directory = os.path.join(directory, 'Images')\n",
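+ "# Note (not in the original): os.makedirs(image_directory, exist_ok=True) would replace the try/except below\n",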
150
+ "try:\n",
151
+ " os.mkdir(image_directory)\n",
152
+ "except FileExistsError:\n",
153
+ " pass"
154
+ ]
155
+ },
156
+ {
157
+ "cell_type": "code",
158
+ "execution_count": 4,
159
+ "metadata": {
160
+ "ExecuteTime": {
161
+ "end_time": "2022-02-05T11:38:51.289268Z",
162
+ "start_time": "2022-02-05T11:38:51.274220Z"
163
+ }
164
+ },
165
+ "outputs": [],
166
+ "source": [
167
+ "def getdirectory(folder):\n",
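+ " # Create Images/<folder> if it does not exist and switch the working directory into it (note the os.chdir side effect)\n",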
168
+ " os.chdir(image_directory)\n",
169
+ " try:\n",
170
+ " os.mkdir(os.getcwd() + \"/\" + str(folder))\n",
171
+ " except FileExistsError:\n",
172
+ " pass\n",
173
+ " os.chdir(os.getcwd() + \"/\" + str(folder))"
174
+ ]
175
+ },
176
+ {
177
+ "cell_type": "markdown",
178
+ "metadata": {},
179
+ "source": [
180
+ "### Image downloader"
181
+ ]
182
+ },
183
+ {
184
+ "cell_type": "markdown",
185
+ "metadata": {},
186
+ "source": [
187
+ "The following functions **download all the images from the different websites** mentioned above:"
188
+ ]
189
+ },
190
+ {
191
+ "cell_type": "markdown",
192
+ "metadata": {
193
+ "ExecuteTime": {
194
+ "end_time": "2022-01-21T19:56:14.547984Z",
195
+ "start_time": "2022-01-21T19:56:14.529008Z"
196
+ }
197
+ },
198
+ "source": [
199
+ "1. The **first download function** is for https://www.mushroom.world/.\n",
200
+ "\n",
201
+ " In it, we use *regular expressions* to get the \"href\" attribute of each image and build its URL. In this case they all have the form *https://www.mushroom.world/data/...*. We then iterate over this list and write each image to the directory."
202
+ ]
203
+ },
204
+ {
205
+ "cell_type": "code",
206
+ "execution_count": 5,
207
+ "metadata": {
208
+ "ExecuteTime": {
209
+ "end_time": "2022-02-05T11:38:51.304866Z",
210
+ "start_time": "2022-02-05T11:38:51.290267Z"
211
+ }
212
+ },
213
+ "outputs": [],
214
+ "source": [
215
+ "def imagedown_1(soup):\n",
216
+ " images = soup.find_all(href=re.compile(\"data\"))\n",
217
+ " for image in images:\n",
218
+ " href = image['href']\n",
219
+ " link = 'https://www.mushroom.world' + image['href'][3:]\n",
220
+ " name = href[15:-4] + ' mw' + '.jpg'\n",
221
+ " name = name.replace('/','-')\n",
222
+ " if not os.path.exists('./' + name): # Only download images we do not already have\n",
223
+ " with open(name, 'wb') as f:\n",
224
+ " im = requests.get(link)\n",
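+ " # (Optional, not in the original) im.raise_for_status() here would avoid writing error pages to disk\n",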
225
+ " f.write(im.content)\n",
226
+ " # Rescale the image to 512x512 pixels\n",
227
+ " with open(name, 'r+b') as f:\n",
228
+ " try:\n",
229
+ " with Image.open(f) as image:\n",
230
+ " cover = resizeimage.resize_cover(image, [512, 512])\n",
231
+ " cover.save(name, image.format)\n",
232
+ " except Exception: # Skip files that cannot be opened or resized as images\n",
233
+ " pass"
234
+ ]
235
+ },
236
+ {
237
+ "cell_type": "markdown",
238
+ "metadata": {},
239
+ "source": [
240
+ "2. We now create a **function for the second website** used: https://www.wildfooduk.com/mushroom-guide/\n",
241
+ "\n",
242
+ " In this case, we use the BS4 and HTML utilities to get the images from each page. However, we will first need a list with all the individual links."
243
+ ]
244
+ },
245
+ {
246
+ "cell_type": "code",
247
+ "execution_count": 6,
248
+ "metadata": {
249
+ "ExecuteTime": {
250
+ "end_time": "2022-02-05T11:38:51.319987Z",
251
+ "start_time": "2022-02-05T11:38:51.305866Z"
252
+ }
253
+ },
254
+ "outputs": [],
255
+ "source": [
256
+ "def imagedown_2(soup):\n",
257
+ " image_set = soup.find('ul', {'class': 'mush-thumbs'})\n",
258
+ " images = image_set.find_all('a', {'id': re.compile(\"image-\")})\n",
259
+ " contador = 0\n",
260
+ " for image in images:\n",
261
+ " contador += 1\n",
262
+ " link = image['href']\n",
263
+ " name = soup.find('table').find_all('td')[5].string.strip() + \" \" + str(contador) + ' wf' + '.jpg'\n",
264
+ " name = name.replace('/','-')\n",
265
+ " if not os.path.exists('./' + name): # Only download images we do not already have\n",
266
+ " with open(name, 'wb') as f:\n",
267
+ " im = requests.get(link)\n",
268
+ " f.write(im.content)\n",
269
+ " # Rescale the image to 512x512 pixels\n",
270
+ " with open(name, 'r+b') as f:\n",
271
+ " try:\n",
272
+ " with Image.open(f) as image:\n",
273
+ " cover = resizeimage.resize_cover(image, [512, 512])\n",
274
+ " cover.save(name, image.format)\n",
275
+ " except Exception: # Skip files that cannot be opened or resized as images\n",
276
+ " pass\n"
277
+ ]
278
+ },
279
+ {
280
+ "cell_type": "markdown",
281
+ "metadata": {},
282
+ "source": [
283
+ "3. We now create a function for the **third and last website used**: https://www.fungipedia.org/hongos.html\n",
284
+ "\n",
285
+ " In this case we will use CSS selectors to get the images, since they sit inside a \"Simple Image Gallery Pro\" plugin. As in the previous case, we will first need a list with all the individual links:"
286
+ ]
287
+ },
288
+ {
289
+ "cell_type": "code",
290
+ "execution_count": 7,
291
+ "metadata": {
292
+ "ExecuteTime": {
293
+ "end_time": "2022-02-05T11:38:51.335814Z",
294
+ "start_time": "2022-02-05T11:38:51.320673Z"
295
+ }
296
+ },
297
+ "outputs": [],
298
+ "source": [
299
+ "def imagedown_3(soup):\n",
300
+ " # Use headers again; otherwise the server returns a 403 Forbidden error.\n",
301
+ " headers = {\n",
302
+ " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',\n",
303
+ " 'Accept-Language': 'es-ES,es;q=0.9', \n",
304
+ " 'Cache-Control': 'max-age=0',\n",
305
+ " 'Referer': 'https://google.com',\n",
306
+ " 'DNT': '1',\n",
307
+ " }\n",
308
+ " images = soup.select('.sigProLinkWrapper a[href]:not([href=\"\"])')\n",
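+ " # The Simple Image Gallery Pro plugin wraps each thumbnail in .sigProLinkWrapper; keep only anchors with a non-empty href\n",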
309
+ " domain = 'https://www.fungipedia.org'\n",
310
+ " contador = 0\n",
311
+ " for image in images:\n",
312
+ " contador += 1\n",
313
+ " link = image.attrs.get('href')\n",
314
+ " name = soup.find('h1', {'class': 'itemTitle'}).string.strip() + \" \" + str(contador) + ' fp' + '.jpg'\n",
315
+ " name = name.replace('/','-')\n",
316
+ " if not os.path.exists('./' + name): # Only download images we do not already have\n",
317
+ " with open(name, 'wb') as f: \n",
318
+ " im = requests.get(domain + link, allow_redirects = True, headers = headers)\n",
319
+ " f.write(im.content)\n",
320
+ " # Rescale the image to 512x512 pixels\n",
321
+ " with open(name, 'r+b') as f:\n",
322
+ " try:\n",
323
+ " with Image.open(f) as image:\n",
324
+ " cover = resizeimage.resize_cover(image, [512, 512])\n",
325
+ " cover.save(name, image.format)\n",
326
+ " except Exception: # Skip files that cannot be opened or resized as images\n",
327
+ " pass"
328
+ ]
329
+ },
330
+ {
331
+ "cell_type": "markdown",
332
+ "metadata": {},
333
+ "source": [
334
+ "### Pager"
335
+ ]
336
+ },
337
+ {
338
+ "cell_type": "markdown",
339
+ "metadata": {},
340
+ "source": [
341
+ "Finally, we define the functions to **move to the next page** within the different site structures:"
342
+ ]
343
+ },
344
+ {
345
+ "cell_type": "markdown",
346
+ "metadata": {
347
+ "ExecuteTime": {
348
+ "end_time": "2022-01-21T20:00:04.349396Z",
349
+ "start_time": "2022-01-21T20:00:04.339424Z"
350
+ }
351
+ },
352
+ "source": [
353
+ "1. For https://www.mushroom.world:"
354
+ ]
355
+ },
356
+ {
357
+ "cell_type": "code",
358
+ "execution_count": 8,
359
+ "metadata": {
360
+ "ExecuteTime": {
361
+ "end_time": "2022-02-05T11:38:51.351406Z",
362
+ "start_time": "2022-02-05T11:38:51.336529Z"
363
+ }
364
+ },
365
+ "outputs": [],
366
+ "source": [
367
+ "def getnextpage_1(soup):\n",
368
+ " page = soup.find('div', {'id': 'pager'}) # Starting from the pager div (id=pager), we can move through the pages by iterating in a loop:\n",
369
+ " if page.find(string=re.compile(\"Next Page\")):\n",
370
+ " url = 'https://www.mushroom.world' + str(page.find('a', string=re.compile(\"Next Page\"))['href'])\n",
371
+ " return url\n",
372
+ " else:\n",
373
+ " return"
374
+ ]
375
+ },
376
+ {
377
+ "cell_type": "markdown",
378
+ "metadata": {},
379
+ "source": [
380
+ "2. For https://www.wildfooduk.com/mushroom-guide/: in this case all the links are found on a single URL, so no pager function is needed."
381
+ ]
382
+ },
383
+ {
384
+ "cell_type": "markdown",
385
+ "metadata": {},
386
+ "source": [
387
+ "3. For https://www.fungipedia.org/:"
388
+ ]
389
+ },
390
+ {
391
+ "cell_type": "code",
392
+ "execution_count": 9,
393
+ "metadata": {
394
+ "ExecuteTime": {
395
+ "end_time": "2022-02-05T11:38:51.366585Z",
396
+ "start_time": "2022-02-05T11:38:51.352439Z"
397
+ }
398
+ },
399
+ "outputs": [],
400
+ "source": [
401
+ "def getnextpage_2(soup):\n",
402
+ " pager = soup.find('div', {'class': 'pagination'})\n",
403
+ " if pager.find('a', {'class': 'next'}):\n",
404
+ " url = 'https://www.fungipedia.org/' + str(pager.find('a', {'class': 'next'})['href'])\n",
405
+ " return url\n",
406
+ " else:\n",
407
+ " return"
408
+ ]
409
+ },
410
+ {
411
+ "cell_type": "markdown",
412
+ "metadata": {},
413
+ "source": [
414
+ "## Image retrieval"
415
+ ]
416
+ },
417
+ {
418
+ "cell_type": "markdown",
419
+ "metadata": {},
420
+ "source": [
421
+ "### Mushroom World "
422
+ ]
423
+ },
424
+ {
425
+ "cell_type": "markdown",
426
+ "metadata": {},
427
+ "source": [
428
+ "The URLs in the following dictionary correspond to the classes into which we will split the mushroom images:\n",
429
+ "\n",
430
+ "* **Edible** *(Comestibles)*\n",
431
+ "* **Inedible** *(No Comestibles)*\n",
432
+ "* **Poisonous** *(Venenosas)*\n",
433
+ "\n",
434
+ "Each class will be stored in its own folder inside our directory."
435
+ ]
436
+ },
437
+ {
438
+ "cell_type": "code",
439
+ "execution_count": 10,
440
+ "metadata": {
441
+ "ExecuteTime": {
442
+ "end_time": "2022-02-05T11:38:51.381814Z",
443
+ "start_time": "2022-02-05T11:38:51.368582Z"
444
+ }
445
+ },
446
+ "outputs": [],
447
+ "source": [
448
+ "edibility = {\"Edible\" : 'https://www.mushroom.world/mushrooms/edible', \n",
449
+ " \"Inedible\": 'https://www.mushroom.world/mushrooms/inedible', \n",
450
+ " \"Poisonous\": 'https://www.mushroom.world/mushrooms/poisonous'}"
451
+ ]
452
+ },
453
+ {
454
+ "cell_type": "markdown",
455
+ "metadata": {},
456
+ "source": [
457
+ "With the functions defined above, we simply set up the following loop to download all the images:"
458
+ ]
459
+ },
460
+ {
461
+ "cell_type": "code",
462
+ "execution_count": 11,
463
+ "metadata": {
464
+ "ExecuteTime": {
465
+ "end_time": "2022-02-05T11:48:33.300282Z",
466
+ "start_time": "2022-02-05T11:38:51.382815Z"
467
+ }
468
+ },
469
+ "outputs": [],
470
+ "source": [
471
+ "for i in edibility:\n",
472
+ " url = edibility[i]\n",
473
+ " getdirectory(i)\n",
474
+ " while isinstance(url, str):\n",
475
+ " soup = getdata(url,'html.parser')\n",
476
+ " imagedown_1(soup)\n",
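+ " # (Not in the original) a short pause between pages, e.g. time.sleep(1), would be gentler on the server\n",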
477
+ " url = getnextpage_1(soup)"
478
+ ]
479
+ },
480
+ {
481
+ "cell_type": "markdown",
482
+ "metadata": {},
483
+ "source": [
484
+ "### Wild UK Mushrooms"
485
+ ]
486
+ },
487
+ {
488
+ "cell_type": "markdown",
489
+ "metadata": {},
490
+ "source": [
491
+ "We set the URLs again:"
492
+ ]
493
+ },
494
+ {
495
+ "cell_type": "code",
496
+ "execution_count": 12,
497
+ "metadata": {
498
+ "ExecuteTime": {
499
+ "end_time": "2022-02-05T11:48:33.316268Z",
500
+ "start_time": "2022-02-05T11:48:33.301281Z"
501
+ }
502
+ },
503
+ "outputs": [],
504
+ "source": [
505
+ "edibility = {\"Edible\" : 'https://www.wildfooduk.com/mushroom-guide/?mushroom_type=edible', \n",
506
+ " \"Inedible\": 'https://www.wildfooduk.com/mushroom-guide/?mushroom_type=inedible', \n",
507
+ " \"Poisonous\": 'https://www.wildfooduk.com/mushroom-guide/?mushroom_type=poisonous'} "
508
+ ]
509
+ },
510
+ {
511
+ "cell_type": "markdown",
512
+ "metadata": {},
513
+ "source": [
514
+ "We set up the loop to walk through the site structure and download the images:"
515
+ ]
516
+ },
517
+ {
518
+ "cell_type": "code",
519
+ "execution_count": 13,
520
+ "metadata": {
521
+ "ExecuteTime": {
522
+ "end_time": "2022-02-05T12:13:26.101999Z",
523
+ "start_time": "2022-02-05T11:48:33.317238Z"
524
+ }
525
+ },
526
+ "outputs": [],
527
+ "source": [
528
+ "for i in edibility:\n",
529
+ " url = edibility[i]\n",
530
+ " getdirectory(i)\n",
531
+ " soup = getdata(url)\n",
532
+ " mushroom_table = soup.find_all('td', {'class': 'mushroom-image'})\n",
533
+ " mushroom_links = []\n",
534
+ " for mushroom in mushroom_table:\n",
535
+ " mushroom_links.append(mushroom.find('a')['href'])\n",
536
+ " for link in mushroom_links:\n",
537
+ " soup = getdata(link,'html.parser')\n",
538
+ " imagedown_2(soup)"
539
+ ]
540
+ },
541
+ {
542
+ "cell_type": "markdown",
543
+ "metadata": {},
544
+ "source": [
545
+ "### Fungipedia "
546
+ ]
547
+ },
548
+ {
549
+ "cell_type": "markdown",
550
+ "metadata": {},
551
+ "source": [
552
+ "We set the URLs again; in this case, when the corresponding filters are applied on the website, they are reflected directly in the URLs."
553
+ ]
554
+ },
555
+ {
556
+ "cell_type": "code",
557
+ "execution_count": 14,
558
+ "metadata": {
559
+ "ExecuteTime": {
560
+ "end_time": "2022-02-05T12:13:26.117978Z",
561
+ "start_time": "2022-02-05T12:13:26.102991Z"
562
+ }
563
+ },
564
+ "outputs": [],
565
+ "source": [
566
+ "edibility = {\"Edible\" : 'https://www.fungipedia.org/hongos/itemlist/filter.html?array12%5B%5D=buen-comestible&array12%5B%5D=buen-comestible-precaucion&array12%5B%5D=comestible&array12%5B%5D=comestible-precaucion&array12%5B%5D=excelente-comestible&array12%5B%5D=excelente-comestible-precaucion&moduleId=95&Itemid=337', \n",
567
+ " \"Inedible\": 'https://www.fungipedia.org/hongos/itemlist/filter.html?array12%5B%5D=no-comestible&array12%5B%5D=sin-valor&moduleId=95&Itemid=337', \n",
568
+ " \"Poisonous\": 'https://www.fungipedia.org/hongos/itemlist/filter.html?array12%5B%5D=mortal&array12%5B%5D=toxica&moduleId=95&Itemid=337'} "
569
+ ]
570
+ },
571
+ {
572
+ "cell_type": "markdown",
573
+ "metadata": {},
574
+ "source": [
575
+ "We set up the loop to walk through the site structure and download the images:"
576
+ ]
577
+ },
578
+ {
579
+ "cell_type": "code",
580
+ "execution_count": 15,
581
+ "metadata": {
582
+ "ExecuteTime": {
583
+ "end_time": "2022-02-05T12:36:46.899216Z",
584
+ "start_time": "2022-02-05T12:13:26.118948Z"
585
+ }
586
+ },
587
+ "outputs": [],
588
+ "source": [
589
+ "for i in edibility:\n",
590
+ " url_main = edibility[i]\n",
591
+ " getdirectory(i)\n",
592
+ " while True:\n",
593
+ " soup_main = getdata(url_main,'html.parser')\n",
594
+ " mushroom_elements = soup_main.find_all('a', {'class': 'gris'})\n",
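+ " # Each species card on the results page links to its detail page through an anchor with class 'gris'\n",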
595
+ " mushroom_links = []\n",
596
+ " for element in mushroom_elements:\n",
597
+ " mushroom_links.append('https://www.fungipedia.org' + element['href'])\n",
598
+ " for link in mushroom_links:\n",
599
+ " soup_link = getdata(link,'html.parser')\n",
600
+ " imagedown_3(soup_link)\n",
601
+ " url_main = getnextpage_2(soup_main)\n",
602
+ " if not url_main:\n",
603
+ " break"
604
+ ]
605
+ }
606
+ ],
607
+ "metadata": {
608
+ "accelerator": "GPU",
609
+ "colab": {
610
+ "collapsed_sections": [
611
+ "eRTY-COfOwwD",
612
+ "ip1P14xN-uSX",
613
+ "VaNHXO2N_Hv-",
614
+ "mhPpAK2bMyQZ",
615
+ "rEI-mXrkU4ku"
616
+ ],
617
+ "name": "PrevioTFM3.ipynb",
618
+ "provenance": []
619
+ },
620
+ "kernelspec": {
621
+ "display_name": "Python 3 (ipykernel)",
622
+ "language": "python",
623
+ "name": "python3"
624
+ },
625
+ "language_info": {
626
+ "codemirror_mode": {
627
+ "name": "ipython",
628
+ "version": 3
629
+ },
630
+ "file_extension": ".py",
631
+ "mimetype": "text/x-python",
632
+ "name": "python",
633
+ "nbconvert_exporter": "python",
634
+ "pygments_lexer": "ipython3",
635
+ "version": "3.8.12"
636
+ },
637
+ "toc": {
638
+ "base_numbering": 1,
639
+ "nav_menu": {},
640
+ "number_sections": true,
641
+ "sideBar": true,
642
+ "skip_h1_title": false,
643
+ "title_cell": "Table of Contents",
644
+ "title_sidebar": "Contents",
645
+ "toc_cell": true,
646
+ "toc_position": {
647
+ "height": "877px",
648
+ "left": "70px",
649
+ "top": "111.125px",
650
+ "width": "165px"
651
+ },
652
+ "toc_section_display": true,
653
+ "toc_window_display": true
654
+ }
655
+ },
656
+ "nbformat": 4,
657
+ "nbformat_minor": 1
658
+ }
02.TFM_CNN_Model.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
03.TFM_Web_App.ipynb ADDED
@@ -0,0 +1,325 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "toc": true
7
+ },
8
+ "source": [
9
+ "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
10
+ "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Implementing-the-model-as-a-Web-App\" data-toc-modified-id=\"Implementing-the-model-as-a-Web-App-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Implementing the model as a Web App</a></span></li></ul></div>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {},
16
+ "source": [
17
+ "# Implementing the model as a Web App"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "code",
22
+ "execution_count": 1,
23
+ "metadata": {
24
+ "ExecuteTime": {
25
+ "end_time": "2022-02-06T22:40:31.231371Z",
26
+ "start_time": "2022-02-06T22:40:25.856446Z"
27
+ }
28
+ },
29
+ "outputs": [],
30
+ "source": [
31
+ "import gradio as gr\n",
32
+ "import matplotlib.pyplot as plt\n",
33
+ "import numpy as np\n",
34
+ "import PIL\n",
35
+ "import tensorflow as tf"
36
+ ]
37
+ },
38
+ {
39
+ "cell_type": "markdown",
40
+ "metadata": {
41
+ "ExecuteTime": {
42
+ "end_time": "2022-01-24T14:01:13.298437Z",
43
+ "start_time": "2022-01-24T14:01:13.293451Z"
44
+ }
45
+ },
46
+ "source": [
47
+ "First, we create the function that the Gradio interface will wrap. To do that, we load the model:"
48
+ ]
49
+ },
50
+ {
51
+ "cell_type": "code",
52
+ "execution_count": 2,
53
+ "metadata": {
54
+ "ExecuteTime": {
55
+ "end_time": "2022-02-06T22:40:32.997510Z",
56
+ "start_time": "2022-02-06T22:40:31.232369Z"
57
+ }
58
+ },
59
+ "outputs": [],
60
+ "source": [
61
+ "model = tf.keras.models.load_model('model.h5')"
62
+ ]
63
+ },
64
+ {
65
+ "cell_type": "code",
66
+ "execution_count": 3,
67
+ "metadata": {
68
+ "ExecuteTime": {
69
+ "end_time": "2022-02-06T22:40:33.013563Z",
70
+ "start_time": "2022-02-06T22:40:32.998484Z"
71
+ }
72
+ },
73
+ "outputs": [],
74
+ "source": [
75
+ "class_name_list = ['Edible', 'Inedible', 'Poisonous']"
76
+ ]
77
+ },
78
+ {
79
+ "cell_type": "code",
80
+ "execution_count": 4,
81
+ "metadata": {
82
+ "ExecuteTime": {
83
+ "end_time": "2022-02-06T22:40:33.028866Z",
84
+ "start_time": "2022-02-06T22:40:33.014537Z"
85
+ }
86
+ },
87
+ "outputs": [],
88
+ "source": [
89
+ "def predict_image(img):\n",
90
+ " # Reescalamos la imagen en 4 dimensiones\n",
91
+ " img_4d = img.reshape(-1,224,224,3)\n",
92
+ " # Predicción del modelo\n",
93
+ " prediction = model.predict(img_4d)[0]\n",
94
+ " # Diccionario con todas las clases y las probabilidades correspondientes\n",
95
+ " return {class_name_list[i]: float(prediction[i]) for i in range(3)}"
96
+ ]
97
+ },
98
+ {
99
+ "cell_type": "code",
100
+ "execution_count": 5,
101
+ "metadata": {
102
+ "ExecuteTime": {
103
+ "end_time": "2022-02-06T22:40:36.372549Z",
104
+ "start_time": "2022-02-06T22:40:33.029834Z"
105
+ },
106
+ "scrolled": true
107
+ },
108
+ "outputs": [
109
+ {
110
+ "name": "stderr",
111
+ "output_type": "stream",
112
+ "text": [
113
+ "C:\\Users\\Usuario\\anaconda3\\envs\\python38gpu\\lib\\site-packages\\gradio\\interface.py:272: UserWarning: 'darkpeach' theme name is deprecated, using dark-peach instead.\n",
114
+ " warnings.warn(\n",
115
+ "C:\\Users\\Usuario\\anaconda3\\envs\\python38gpu\\lib\\site-packages\\gradio\\interface.py:338: UserWarning: The `allow_flagging` parameter in `Interface` nowtakes a string value ('auto', 'manual', or 'never'), not a boolean. Setting parameter to: 'never'.\n",
116
+ " warnings.warn(\n"
117
+ ]
118
+ },
119
+ {
120
+ "name": "stdout",
121
+ "output_type": "stream",
122
+ "text": [
123
+ "Running on local URL: http://127.0.0.1:7860/\n",
124
+ "\n",
125
+ "To create a public link, set `share=True` in `launch()`.\n"
126
+ ]
127
+ },
128
+ {
129
+ "data": {
130
+ "text/html": [
131
+ "\n",
132
+ " <iframe\n",
133
+ " width=\"900\"\n",
134
+ " height=\"500\"\n",
135
+ " src=\"http://127.0.0.1:7860/\"\n",
136
+ " frameborder=\"0\"\n",
137
+ " allowfullscreen\n",
138
+ " \n",
139
+ " ></iframe>\n",
140
+ " "
141
+ ],
142
+ "text/plain": [
143
+ "<IPython.lib.display.IFrame at 0x1875520e790>"
144
+ ]
145
+ },
146
+ "metadata": {},
147
+ "output_type": "display_data"
148
+ },
149
+ {
150
+ "data": {
151
+ "text/plain": [
152
+ "(<fastapi.applications.FastAPI at 0x1873512db50>,\n",
153
+ " 'http://127.0.0.1:7860/',\n",
154
+ " None)"
155
+ ]
156
+ },
157
+ "execution_count": 5,
158
+ "metadata": {},
159
+ "output_type": "execute_result"
160
+ }
161
+ ],
162
+ "source": [
163
+ "image = gr.inputs.Image(shape=(224,224))\n",
164
+ "label = gr.outputs.Label(num_top_classes=3)\n",
165
+ "title = 'Mushroom Edibility Classifier'\n",
166
+ "description = 'Get the edibility classification for the input mushroom image'\n",
167
+ "examples=[['app_interface/Boletus edulis 15 wf.jpg'],\n",
168
+ " ['app_interface/Cantharelluscibarius5 mw.jpg'],\n",
169
+ " ['app_interface/Agaricus augustus 2 wf.jpg'],\n",
170
+ " ['app_interface/Coprinellus micaceus 8 wf.jpg'],\n",
171
+ " ['app_interface/Clavulinopsis fusiformis 2 fp.jpg'],\n",
172
+ " ['app_interface/Amanita torrendii 8 fp.jpg'],\n",
173
+ " ['app_interface/Russula sanguinea 5 fp.jpg'],\n",
174
+ " ['app_interface/Caloceraviscosa1 mw.jpg'],\n",
175
+ " ['app_interface/Amanita muscaria 1 wf.jpg'],\n",
176
+ " ['app_interface/Amanita pantherina 11 wf.jpg'],\n",
177
+ " ['app_interface/Lactarius torminosus 6 fp.jpg'],\n",
178
+ " ['app_interface/Amanitaphalloides1 mw.jpg']]\n",
179
+ "thumbnail = 'app_interface/thumbnail.png'\n",
180
+ "article = '''\n",
181
+ "<!DOCTYPE html>\n",
182
+ "<html>\n",
183
+ "<body>\n",
184
+ "<p>The Mushroom Edibility Classifier is an MVP of a CNN multiclass classification model.<br>\n",
185
+ "It has been trained after gathering <b>5500 mushroom images</b> through Web Scraping techniques from the following web sites:</p>\n",
186
+ "<br>\n",
187
+ "<p>\n",
188
+ "<a href=\"https://www.mushroom.world/\">- Mushroom World</a><br>\n",
189
+ "<a href=\"https://www.wildfooduk.com/mushroom-guide/\">- Wild Food UK</a> <br>\n",
190
+ "<a href=\"https://www.fungipedia.org/hongos\">- Fungipedia</a>\n",
191
+ "</p>\n",
192
+ "<br>\n",
193
+ "<p style=\"color:Orange;\">Note: <i>model created solely and exclusively for academic purposes. The results provided by the model should never be considered definitive as the accuracy of the model is not guaranteed.</i></p>\n",
194
+ "\n",
195
+ "<br>\n",
196
+ "<p><b>MODEL METRICS:</b></p> \n",
197
+ "<table>\n",
198
+ " <tr>\n",
199
+ " <th> </th>\n",
200
+ " <th>precision</th>\n",
201
+ " <th>recall</th>\n",
202
+ " <th>f1-score</th>\n",
203
+ " <th>support</th>\n",
204
+ " </tr>\n",
205
+ " <tr>\n",
206
+ " <th>Edible</th>\n",
207
+ " <th>0.61</th>\n",
208
+ " <th>0.70</th>\n",
209
+ " <th>0.65</th>\n",
210
+ " <th>481</th>\n",
211
+ " </tr>\n",
212
+ " <tr>\n",
213
+ " <th>Inedible</th>\n",
214
+ " <th>0.67</th>\n",
215
+ " <th>0.69</th>\n",
216
+ " <th>0.68</th>\n",
217
+ " <th>439</th>\n",
218
+ " </tr>\n",
219
+ " <tr>\n",
220
+ " <th>Poisonous</th>\n",
221
+ " <th>0.52</th>\n",
222
+ " <th>0.28</th>\n",
223
+ " <th>0.36</th>\n",
224
+ " <th>192</th>\n",
225
+ " </tr>\n",
226
+ " <tr>\n",
227
+ " <th></th>\n",
228
+ " </tr>\n",
229
+ " <tr>\n",
230
+ " <th>Global Accuracy</th>\n",
231
+ " <th></th>\n",
232
+ " <th></th>\n",
233
+ " <th>0.63</th>\n",
234
+ " <th>1112</th>\n",
235
+ " </tr>\n",
236
+ " <tr>\n",
237
+ " <th>Macro Average</th>\n",
238
+ " <th>0.60</th>\n",
239
+ " <th>0.56</th>\n",
240
+ " <th>0.57</th>\n",
241
+ " <th>1112</th>\n",
242
+ " </tr>\n",
243
+ " <tr>\n",
244
+ " <th>Weighted Average</th>\n",
245
+ " <th>0.62</th>\n",
246
+ " <th>0.63</th>\n",
247
+ " <th>0.61</th>\n",
248
+ " <th>1112</th>\n",
249
+ " </tr>\n",
250
+ "</table>\n",
251
+ "<br>\n",
252
+ "<p><i>Author: Íñigo Sarralde Alzórriz</i></p> \n",
253
+ "</body>\n",
254
+ "</html>\n",
255
+ "'''\n",
256
+ "\n",
257
+ "iface = gr.Interface(fn=predict_image, \n",
258
+ " inputs=image, \n",
259
+ " outputs=label,\n",
260
+ " interpretation='default',\n",
261
+ " title = title,\n",
262
+ " description = description,\n",
263
+ " theme = 'darkpeach',\n",
264
+ " examples = examples,\n",
265
+ " thumbnail = thumbnail,\n",
266
+ " article = article,\n",
267
+ " allow_flagging = False,\n",
268
+ " allow_screenshot = False, \n",
269
+ " )\n",
270
+ "iface.launch()"
271
+ ]
272
+ }
273
+ ],
274
+ "metadata": {
275
+ "accelerator": "GPU",
276
+ "colab": {
277
+ "collapsed_sections": [
278
+ "eRTY-COfOwwD",
279
+ "ip1P14xN-uSX",
280
+ "VaNHXO2N_Hv-",
281
+ "mhPpAK2bMyQZ",
282
+ "rEI-mXrkU4ku"
283
+ ],
284
+ "name": "PrevioTFM3.ipynb",
285
+ "provenance": []
286
+ },
287
+ "kernelspec": {
288
+ "display_name": "Python 3 (ipykernel)",
289
+ "language": "python",
290
+ "name": "python3"
291
+ },
292
+ "language_info": {
293
+ "codemirror_mode": {
294
+ "name": "ipython",
295
+ "version": 3
296
+ },
297
+ "file_extension": ".py",
298
+ "mimetype": "text/x-python",
299
+ "name": "python",
300
+ "nbconvert_exporter": "python",
301
+ "pygments_lexer": "ipython3",
302
+ "version": "3.8.12"
303
+ },
304
+ "toc": {
305
+ "base_numbering": 1,
306
+ "nav_menu": {},
307
+ "number_sections": true,
308
+ "sideBar": true,
309
+ "skip_h1_title": false,
310
+ "title_cell": "Table of Contents",
311
+ "title_sidebar": "Contents",
312
+ "toc_cell": true,
313
+ "toc_position": {
314
+ "height": "877px",
315
+ "left": "70px",
316
+ "top": "111.125px",
317
+ "width": "316.771px"
318
+ },
319
+ "toc_section_display": true,
320
+ "toc_window_display": false
321
+ }
322
+ },
323
+ "nbformat": 4,
324
+ "nbformat_minor": 1
325
+ }
Images.rar ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:be1b81d1226d506d35e4f0e4f3ef903a797b3b46266af3b2b2f6ab11430c8cf5
3
+ size 277969255
app.py ADDED
@@ -0,0 +1,125 @@
1
+ import gradio as gr
2
+ import matplotlib.pyplot as plt
3
+ import numpy as np
4
+ import PIL
5
+ import tensorflow as tf
6
+
7
+
8
+ model = tf.keras.models.load_model('model.h5')
9
+ class_name_list = ['Edible', 'Inedible', 'Poisonous']
10
+
11
+ def predict_image(img):
12
+ # Reshape the image into a 4-D batch of one
13
+ img_4d = img.reshape(-1,224,224,3)
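+ # (Assumption) any pixel-value rescaling is expected to happen inside the saved model; otherwise it would be applied here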
14
+ # Model prediction
15
+ prediction = model.predict(img_4d)[0]
16
+ # Dictionary with every class and its corresponding probability
17
+ return {class_name_list[i]: float(prediction[i]) for i in range(3)}
18
+ image = gr.inputs.Image(shape=(224,224))
19
+ label = gr.outputs.Label(num_top_classes=3)
20
+ title = 'Mushroom Edibility Classifier'
21
+ description = 'Get the edibility classification for the input mushroom image'
22
+ examples=[['app_interface/Boletus edulis 15 wf.jpg'],
23
+ ['app_interface/Cantharelluscibarius5 mw.jpg'],
24
+ ['app_interface/Agaricus augustus 2 wf.jpg'],
25
+ ['app_interface/Coprinellus micaceus 8 wf.jpg'],
26
+ ['app_interface/Clavulinopsis fusiformis 2 fp.jpg'],
27
+ ['app_interface/Amanita torrendii 8 fp.jpg'],
28
+ ['app_interface/Russula sanguinea 5 fp.jpg'],
29
+ ['app_interface/Caloceraviscosa1 mw.jpg'],
30
+ ['app_interface/Amanita muscaria 1 wf.jpg'],
31
+ ['app_interface/Amanita pantherina 11 wf.jpg'],
32
+ ['app_interface/Lactarius torminosus 6 fp.jpg'],
33
+ ['app_interface/Amanitaphalloides1 mw.jpg']]
34
+ thumbnail = 'app_interface/thumbnail.png'
35
+ article = '''
36
+ <!DOCTYPE html>
37
+ <html>
38
+ <body>
39
+ <p>The Mushroom Edibility Classifier is an MVP of a CNN multiclass classification model.<br>
40
+ It has been trained after gathering <b>5500 mushroom images</b> through Web Scraping techniques from the following web sites:</p>
41
+ <br>
42
+ <p>
43
+ <a href="https://www.mushroom.world/">- Mushroom World</a><br>
44
+ <a href="https://www.wildfooduk.com/mushroom-guide/">- Wild Food UK</a> <br>
45
+ <a href="https://www.fungipedia.org/hongos">- Fungipedia</a>
46
+ </p>
47
+ <br>
48
+ <p style="color:Orange;">Note: <i>model created solely and exclusively for academic purposes. The results provided by the model should never be considered definitive as the accuracy of the model is not guaranteed.</i></p>
49
+
50
+ <br>
51
+ <p><b>MODEL METRICS:</b></p>
52
+ <table>
53
+ <tr>
54
+ <th> </th>
55
+ <th>precision</th>
56
+ <th>recall</th>
57
+ <th>f1-score</th>
58
+ <th>support</th>
59
+ </tr>
60
+ <tr>
61
+ <th>Edible</th>
62
+ <th>0.61</th>
63
+ <th>0.70</th>
64
+ <th>0.65</th>
65
+ <th>481</th>
66
+ </tr>
67
+ <tr>
68
+ <th>Inedible</th>
69
+ <th>0.67</th>
70
+ <th>0.69</th>
71
+ <th>0.68</th>
72
+ <th>439</th>
73
+ </tr>
74
+ <tr>
75
+ <th>Poisonous</th>
76
+ <th>0.52</th>
77
+ <th>0.28</th>
78
+ <th>0.36</th>
79
+ <th>192</th>
80
+ </tr>
81
+ <tr>
82
+ <th></th>
83
+ </tr>
84
+ <tr>
85
+ <th>Global Accuracy</th>
86
+ <th></th>
87
+ <th></th>
88
+ <th>0.63</th>
89
+ <th>1112</th>
90
+ </tr>
91
+ <tr>
92
+ <th>Macro Average</th>
93
+ <th>0.60</th>
94
+ <th>0.56</th>
95
+ <th>0.57</th>
96
+ <th>1112</th>
97
+ </tr>
98
+ <tr>
99
+ <th>Weighted Average</th>
100
+ <th>0.62</th>
101
+ <th>0.63</th>
102
+ <th>0.61</th>
103
+ <th>1112</th>
104
+ </tr>
105
+ </table>
106
+ <br>
107
+ <p><i>Author: Íñigo Sarralde Alzórriz</i></p>
108
+ </body>
109
+ </html>
110
+ '''
111
+
112
+ iface = gr.Interface(fn=predict_image,
113
+ inputs=image,
114
+ outputs=label,
115
+ interpretation='default',
116
+ title = title,
117
+ description = description,
118
+ theme = 'dark-peach',
119
+ examples = examples,
120
+ thumbnail = thumbnail,
121
+ article = article,
122
+ allow_flagging = 'never',
123
+ allow_screenshot = False,
124
+ )
125
+ iface.launch()
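+ 
+ # Note: gr.inputs / gr.outputs and the keyword arguments above follow the gradio==2.7.5.2 API
+ # pinned in requirements.txt; newer Gradio releases expose gr.Image and gr.Label directly,
+ # so upgrading that dependency would require adjusting this Interface definition.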
app_interface/Agaricus augustus 2 wf.jpg ADDED
app_interface/Amanita muscaria 1 wf.jpg ADDED
app_interface/Amanita pantherina 11 wf.jpg ADDED
app_interface/Amanita torrendii 8 fp.jpg ADDED
app_interface/Amanitaphalloides1 mw.jpg ADDED
app_interface/Boletus edulis 15 wf.jpg ADDED
app_interface/Caloceraviscosa1 mw.jpg ADDED
app_interface/Cantharelluscibarius5 mw.jpg ADDED
app_interface/Clavulinopsis fusiformis 2 fp.jpg ADDED
app_interface/Coprinellus micaceus 8 wf.jpg ADDED
app_interface/Lactarius torminosus 6 fp.jpg ADDED
app_interface/Russula sanguinea 5 fp.jpg ADDED
app_interface/confusion_matrix.png ADDED
app_interface/thumbnail.png ADDED
model.h5 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9082514a060a21889095037be1a417104443e5227e06f361e121e81fd8113d2c
3
+ size 126050640
requirements.txt ADDED
@@ -0,0 +1,5 @@
1
+ gradio==2.7.5.2
2
+ matplotlib==3.5.1
3
+ numpy==1.22.1
4
+ Pillow==9.0.0
5
+ tensorflow==2.7.0