Senior Software Engineer at Datia, with a double Master's degree in Data Science from the Technical University of Eindhoven and Aalto University.

Awards

  • 2019 F8 Facebook Hackathon Finalist: Selected as a top-8 finalist out of 55 teams in the annual Facebook F8 Hackathon in San Jose, California. Pitched to Mark Zuckerberg in person.

Finalists pitch
OpenCurriculum Project

Portfolio

Music Embedding Clustering

Music Embedding Clustering Using a Pretrained Speaker Verification Model. Can a model trained for speaker verification separate songs from different bands?


Speech Emotion Recognition SE&R 2022

This task aims to motivate research on Speech Emotion Recognition (SER) in our community, with a focus on theoretical and practical aspects of SER, pre-processing and feature extraction, and machine learning models for Portuguese.

Research

Dropping Incomplete Records is (not so) Straightforward

Published in International Symposium on Intelligent Data Analysis, 2023

A straightforward approach to handling missing values is dropping incomplete records from the dataset. However, for many forms of missingness, this method is known to affect the center and spread of the data distribution. In this paper, we perform an extensive empirical evaluation of the effect of the drop method on the data distribution. In particular, we analyze two scenarios that are likely to occur in practice but are not often considered in simulation studies: 1) when features are skewed rather than symmetrically distributed and 2) when multiple forms of missingness occur simultaneously in one feature. Furthermore, we investigate implications of the drop method for classification accuracy and demonstrate that dropping incomplete records is doubtful, even when test cases are dropped as well.
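As a rough illustration of the first scenario (a skewed feature), the sketch below simulates dropping incomplete records. It is not the experimental setup from the paper: the log-normal feature, the missingness rule, the 0.6 drop probability, and the function name `simulate_drop` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_drop(n=20_000, seed=0):
    """Compare a skewed feature before and after dropping incomplete records."""
    rng = random.Random(seed)

    # Right-skewed feature (log-normal); an assumed stand-in for real data.
    values = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]

    # Hypothetical missingness rule: values above the median are missing
    # with probability 0.6, so missingness depends on the value itself.
    cutoff = statistics.median(values)
    observed = [v for v in values if not (v > cutoff and rng.random() < 0.6)]

    return {
        "full_mean": statistics.fmean(values),
        "dropped_mean": statistics.fmean(observed),
        "full_stdev": statistics.stdev(values),
        "dropped_stdev": statistics.stdev(observed),
        "kept_fraction": len(observed) / n,
    }


result = simulate_drop()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under this assumed rule, the mean of the remaining records falls below the mean of the full data, which is the kind of shift in center that the abstract refers to.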

CoDePPI - Context-Dependent Probabilistic Prior Information Strategy for MRI Reconstruction

Published in Universidade de Brasília Software Engineering Bachelor Thesis, 2021

Context-Dependent Probabilistic Prior Information (CoDePPI) is an improved prior information extraction algorithm for Magnetic Resonance Imaging (MRI) reconstruction based on Compressed Sensing (CS) theory. CoDePPI takes advantage of motion information across frames in a dynamic MRI to weigh the confidence that the extracted positions are effectively part of a support structure, thereby reducing the noise introduced by applying prior information.

Document classification using a Bi-LSTM to unclog Brazil’s supreme court

Published in Machine Learning for Developing World: Achieving Sustainable Impact Workshop Proceedings NIPS, 2018

The Brazilian court system is currently the most clogged up judiciary system in the world. Thousands of lawsuit cases reach the supreme court every day. These cases need to be analyzed in order to be associated with relevant tags and allocated to the right team. Most of the cases reach the court as raster scanned documents with widely variable levels of quality. One of the first steps of the analysis is to classify these documents. In this paper we present a Bidirectional Long Short-Term Memory network (Bi-LSTM) to classify these legal documents.

Download here

Document type classification for Brazil’s supreme court using a Convolutional Neural Network

Published in The Tenth International Conference on Forensic Computer Science and Cyber Law, 2018

The Brazilian court system is currently the biggest judiciary system in the world, and receives an extremely high number of lawsuit cases every day. These cases need to be analyzed in order to be associated with relevant tags and allocated to the right team. Most of the cases reach the court as single PDF files containing multiple documents. One of the first steps of the analysis is to classify these documents. In this paper we present results on identifying these documents using a simple convolutional neural network.

A Boosted Review for Decision Tree Boosting

Published in Unofficial Publication, 2018

Shallow decision trees are weak classifiers that can be combined to create a robust predictive model. Compared with a single decision tree, ensemble methods offer benefits such as reduced bias, reduced over-fitting, and improved accuracy. This study reviews the state of the art in tree boosting techniques and their applications, and qualitatively analyses what has been published on boosting methods. We searched the IEEE, Scopus, ACM and Elsevier repositories with a search string derived using the PICO technique. We found a total of 102 papers, of which 47 remained after applying our criteria. We summarize gradient boosting decision tree methods for classification and regression problems, and analyse the XGBoost, LightGBM, CatBoost and LambdaMART algorithms in different published scenarios. We classified the papers into the following subcategories: business, new method, civil engineering, network security, health, and model improvement. We conclude that boosted tree-based algorithms are an active field of research, with new techniques still being explored and developed.
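The idea of combining shallow trees can be sketched with a toy example. The code below is a minimal gradient boosting loop for regression with squared loss, using depth-1 trees (stumps) on a single feature. It is not the implementation of XGBoost, LightGBM, CatBoost or LambdaMART; the function names, the toy data, and the learning rate of 0.3 are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return t, lmean, rmean


def predict(base, stumps, lr, x):
    """Base prediction plus the shrunken sum of all stump outputs."""
    return base + lr * sum(l if x <= t else r for t, l, r in stumps)


def boost(xs, ys, rounds=50, lr=0.3):
    """Each round fits a stump to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - predict(base, stumps, lr, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps


def mse(base, stumps, lr, xs, ys):
    """Mean squared error of the ensemble on the given data."""
    return sum((y - predict(base, stumps, lr, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed): a smooth curve that no single stump can fit well.
xs = [i / 10 for i in range(40)]
ys = [x ** 2 for x in xs]

base, stumps = boost(xs, ys, rounds=50, lr=0.3)
print("training MSE, mean only:", mse(base, [], 0.3, xs, ys))
print("training MSE, 1 stump:  ", mse(base, stumps[:1], 0.3, xs, ys))
print("training MSE, 50 stumps:", mse(base, stumps, 0.3, xs, ys))
```

On this toy data the training error decreases as stumps are added, which shows how weak learners accumulate into a stronger model. The libraries covered in the review add regularization, second-order information, and efficient split finding on top of this basic loop.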

Contact

Email is the best way to reach me. If you do not get a reply within two days, please do not hesitate to follow up. Thank you very much for your interest.

  • Social media accounts can be found on the left (on PC) or via the “Follow” button at the top (on smartphone).