dc.contributor.advisor	Menacho Chiok, César Higinio
dc.contributor.author	Ancajima Bohórquez, Edgar Fernando
dc.date.accessioned	2023-03-17T15:29:40Z
dc.date.available	2023-03-17T15:29:40Z
dc.date.issued	2022
dc.description	Universidad Nacional Agraria La Molina. Facultad de Economía y Planificación. Departamento Académico de Estadística e Informática
dc.description.abstract	Data mining techniques (DMT) for supervised learning usually have to handle a large number of attributes in the databases being analyzed. Many of these attributes are irrelevant or redundant, which can degrade the performance and behavior of these techniques and therefore their predictive capacity. Research on attribute (feature) selection indicates that selecting a smaller subset from the full set of attributes offers several advantages: it reduces redundancy, removes noise, maximizes attribute relevance, lowers computational cost, improves interpretability, and increases the accuracy of the supervised classifier. The objective is to present filter and wrapper attribute selection methods that can be applied with supervised data mining techniques for classification, identifying the subsets of relevant attributes that achieve the highest accuracy. Four filter metrics (Chi-square, information gain, gain ratio and Relief) and four wrapper search methods (Best-First, Greedy forward, Greedy backward and Hill climbing) are applied to the National Survey of Health User Satisfaction 2015 (Encuesta Nacional de Satisfacción de Usuarios de Salud–2015).
Four DMT were applied to each of the attribute subsets selected by the filter and wrapper methods to predict users' satisfaction with the care received from health services. The highest predictive capacity was obtained with: binary logistic regression using the Best-First wrapper (5 attributes, 88.7% accuracy); the C5.0 classification tree using the Greedy forward wrapper (6 attributes, 89.1% accuracy); the naive Bayesian network using the Greedy backward wrapper (16 attributes, 88.3% accuracy); and the random forest ensemble classifier using the Greedy backward wrapper (16 attributes, 93.0% accuracy). The highest AUC values were 0.932 for binary logistic regression with Greedy forward, 0.891 for the C5.0 classification tree with Greedy forward, 0.9221 for the naive Bayesian network with the Greedy forward wrapper, and 0.941 for the random forest with Greedy backward.
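As an illustration only, the following Python sketch shows how two of the filter metrics named in the abstract (information gain and gain ratio) and a Greedy forward wrapper search can work on discrete attributes. It is not the author's implementation. The helper names, the toy data, and the stand-in classifier are hypothetical assumptions. The classifier is a majority-vote lookup table scored by leave-one-out accuracy, used here in place of the logistic regression, C5.0, naive Bayes and random forest models from the study. Chi-square, Relief, Best-First, Greedy backward and Hill climbing are not shown.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence

Subset = tuple[int, ...]

def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Filter metric: H(Y) - H(Y | X) for one discrete attribute X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        group = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(group)
    return entropy(labels) - conditional

def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Filter metric: information gain normalized by the split information H(X)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

def loo_lookup_accuracy(rows, labels, subset: Subset) -> float:
    """Hypothetical stand-in classifier: majority vote among the other rows that
    match on the selected attributes, scored by leave-one-out accuracy."""
    correct = 0
    for i, row in enumerate(rows):
        key = tuple(row[j] for j in subset)
        others = [k for k in range(len(rows)) if k != i]
        votes = Counter(labels[k] for k in others
                        if tuple(rows[k][j] for j in subset) == key)
        if not votes:  # unseen value combination: fall back to the overall majority
            votes = Counter(labels[k] for k in others)
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(rows)

def greedy_forward(n_features: int,
                   evaluate: Callable[[Subset], float]) -> tuple[Subset, float]:
    """Wrapper search: repeatedly add the attribute that most improves `evaluate`,
    stopping as soon as no single addition improves the score (assumed rule)."""
    selected: Subset = ()
    best = float("-inf")
    while len(selected) < n_features:
        candidates = [selected + (j,) for j in range(n_features) if j not in selected]
        score, subset = max(((evaluate(c), c) for c in candidates), key=lambda t: t[0])
        if score <= best:
            break
        selected, best = subset, score
    return selected, best

# Hypothetical toy data: attribute 0 determines the class, 1 is noise, 2 is constant.
rows = [("a", "x", "k"), ("a", "y", "k"), ("b", "x", "k"), ("b", "y", "k")]
labels = ["yes", "yes", "no", "no"]

for j in range(3):
    column = [r[j] for r in rows]
    print(j, information_gain(column, labels), gain_ratio(column, labels))

print(greedy_forward(3, lambda s: loo_lookup_accuracy(rows, labels, s)))
```

In this sketch the filter metrics score each attribute independently of any classifier, whereas the wrapper retrains and evaluates the classifier for every candidate subset. That makes wrapper search more expensive, but it tailors the selected subset to the model actually being used.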
dc.format	application/pdf
dc.identifier.uri	https://hdl.handle.net/20.500.12996/5703
dc.language.iso	spa
dc.publisher	Universidad Nacional Agraria La Molina
dc.publisher.country	PE
dc.rights	http://purl.org/coar/access_right/c_abf2
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	filter methods
dc.subject.ocde	https://purl.org/pe-repo/ocde/ford#4.05.00
dc.title	Selección de atributos por métodos de filtrado y wrapper para predecir la satisfacción de usuarios de salud
dc.type	http://purl.org/coar/resource_type/c_7a1f
dc.type.version	http://purl.org/coar/version/c_970fb48d4fbd8a85
renati.advisor.dni	07108718
renati.advisor.orcid	https://orcid.org/0000-0003-1310-2551
renati.author.dni	45095702
renati.discipline	542026
renati.juror	Miranda Villagómez, Clodomiro Fernando
renati.juror	Coaquira Nina, Frida Rosa
renati.juror	Vargas Paredes, Ana Cecilia
renati.level	https://purl.org/pe-repo/renati/level#tituloProfesional
renati.type	https://purl.org/pe-repo/renati/type#tesis
thesis.degree.discipline	Estadística e Informática
thesis.degree.grantor	Universidad Nacional Agraria La Molina. Facultad de Economía y Planificación
thesis.degree.name	Ingeniero Estadístico Informático
