Feature engineering is the process of creating and selecting the features used to train your ML models. Raw data must be transformed into features and attributes before it is useful for predictive modeling. Good features drive your model's predictive accuracy, resilience, and output scale. Building the features needed to train and operate your ML models can take 70-80% of the overall time and effort. Exploratory data analysis (EDA) is one of the most challenging and important steps in this process. If you don't have a deep understanding of your data, you can waste resources creating sub-optimal features, which are among the biggest causes of low-quality models.
This session will dig into the key activities in EDA and discuss new ideas and tools for improving the process and its outcomes. We will also hear from SymphonyAI, an industry leader and global enterprise AI solutions company, about its strategy for EDA and automated feature generation.
Attendees will learn why an extensive EDA matters for creating meaningful features. We will explain in detail how to perform an extensive data assessment by looking at the following:
• Descriptive Statistics
  • Measures of Counts (fill rate, missing count, nonzero count)
  • Measures of Central Tendency (mean, median, mode, mode rows, mode percentage)
  • Measures of Cardinality (unique value count and IDness)
  • Measures of Percentiles
  • Measures of Dispersion (variance, std-dev, CoV, IQR, range)
  • Measures of Shape
• Data Quality Check
  • Null detection, IDness detection, biasedness detection, invalid entries detection, outlier detection, etc.
• Attribute Associations
  • Correlation analysis, Information Value and Information Gain analysis, and variable clustering
• Data Drift and Data Stability Analysis
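
As a rough illustration of the descriptive statistics listed above, the following is a minimal sketch of a per-column profile using pandas. It is not the method presented in the session. The function name `profile_column` is hypothetical, and several definitions are assumptions made for the example: IDness is taken as unique values divided by total rows, mode percentage as mode rows divided by total rows, and outliers are flagged with the common 1.5 x IQR rule.

```python
import pandas as pd


def profile_column(s: pd.Series) -> dict:
    """Compute a small EDA profile for one numeric column (illustrative only)."""
    n = len(s)
    non_null = s.dropna()

    # Central tendency: take the smallest mode if several values tie.
    modes = non_null.mode()
    mode_value = float(modes.iloc[0]) if not modes.empty else None
    mode_rows = int((non_null == mode_value).sum()) if mode_value is not None else 0

    # Percentiles and dispersion.
    q1, q3 = non_null.quantile(0.25), non_null.quantile(0.75)
    iqr = q3 - q1
    mean = non_null.mean()
    std = non_null.std()  # sample standard deviation (ddof=1)

    # Assumed outlier rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    is_outlier = (non_null < q1 - 1.5 * iqr) | (non_null > q3 + 1.5 * iqr)

    unique_count = int(non_null.nunique())
    return {
        # Measures of counts
        "fill_rate": len(non_null) / n if n else float("nan"),
        "missing_count": int(s.isna().sum()),
        "nonzero_count": int((non_null != 0).sum()),
        # Measures of central tendency
        "mean": float(mean),
        "median": float(non_null.median()),
        "mode": mode_value,
        "mode_rows": mode_rows,
        "mode_pct": mode_rows / n if n else float("nan"),  # assumed definition
        # Measures of cardinality
        "unique_count": unique_count,
        "idness": unique_count / n if n else float("nan"),  # assumed definition
        # Measures of dispersion
        "variance": float(non_null.var()),
        "std": float(std),
        "cov": float(std / mean) if mean != 0 else float("nan"),
        "iqr": float(iqr),
        "range": float(non_null.max() - non_null.min()),
        # Measures of shape
        "skewness": float(non_null.skew()),
        "kurtosis": float(non_null.kurt()),
        # Simple data quality flag
        "outlier_count": int(is_outlier.sum()),
    }


# Made-up example data with one missing value and one extreme value.
example = pd.Series([1.0, 2.0, 2.0, 3.0, None, 0.0, 100.0])
profile = profile_column(example)
for name, value in profile.items():
    print(f"{name}: {value}")
```

Running the same function over every numeric column of a DataFrame (for example with `df.apply`) gives a quick table for spotting columns with low fill rates, ID-like cardinality, or heavy skew before any features are built.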