See Summer Code Sprint 2021









The class has ended with the following publications:










Summer Code Sprint 2020: Solar Flare Prediction from Heliophysics Big Data

classification of multi-class, multivariate time series

[Jun 8 - Jul 28 2020]

Eruption of a solar flare on July 19, 2012 (Credit: NASA Goddard)

Research Motivation





"Solar flares are a sudden explosion of energy caused by tangling, crossing or reorganizing of magnetic field lines near sunspots. Solar flares release a lot of radiation into space. If a solar flare is very intense, the radiation it releases can interfere with our radio communications here on Earth." [NASA]

In recent years there have been multiple attempts to use DNNs for the classification of solar flares [A1-A8], and some of the most powerful ML models had been tried before that [B1-B13]. Yet the question of whether we really need "deep" NNs, or whether "shallow" and "not-very-shallow" classical ML models would suffice, remains without a proper answer. On certain tasks, such as object detection and speech recognition, DNNs have undoubtedly outperformed classical ML models, but the black-box nature of deep models is a limiting factor for understanding the 'why' and the 'how' of a research topic. So before we favor DNNs over classical ML models, at least for the specific task of flare classification, we would like to compare the two realms.

We are aware that a truly fair comparison of deep and shallow models' performance, even on a specific task, is not feasible, because feature extraction is automated in one and engineered in the other. This difference always leaves room for the suspicion that the engineered features were not chosen properly and could be further optimized or replaced with other features. That said, the question is a valid one, and here we would like to present, to a practical extent, a semi-fair analysis that addresses this concern.

To this end, this Summer Code Sprint explores some Machine Learning models that, despite their general success in other domains, have perhaps not yet been applied to multivariate time series of solar flares. In particular, we are interested in imaging time series, that is, converting time series objects into images for the purpose of classification. The main motivation for such a transformation is the tremendous success of DNNs on image data. Two such transformations are the Gramian Angular Field (GAF) and the Markov Transition Field (MTF) [Xc-Xe]. Once the flare time series are transformed into image-like objects, CNNs can be used to classify the time series in terms of their peak flux.
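To make the idea of imaging a time series concrete, here is a minimal sketch of the summation variant of the Gramian Angular Field for a single univariate series. It is an illustration under stated assumptions, not the implementation used in the sprint: the function names and the toy `flux` values are hypothetical, and the series is assumed to be min-max rescaled to [-1, 1] before the angular encoding.

```python
import math


def rescale(series):
    """Min-max rescale a sequence of numbers to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]


def gramian_angular_summation_field(series):
    """Return the summation GAF of a univariate series as a list of rows.

    Each rescaled value x_i is read as the cosine of an angle phi_i,
    and entry (i, j) of the image is cos(phi_i + phi_j).
    """
    scaled = rescale(series)
    # Clamping guards against floating-point overshoot outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical toy values standing in for one parameter of a flare time series.
flux = [0.1, 0.4, 0.9, 0.3, 0.2]
image = gramian_angular_summation_field(flux)
```

A series of length n yields an n-by-n symmetric image. For a multivariate series, one plausible design (again an assumption) is to compute one such image per parameter and stack the images as input channels for a CNN.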

We would like to compare the performance of this approach with that of a more classical model, namely a time-series-specific Support Vector Classifier (SVC). We hope that such a comparison will shed some light on the necessity of DNNs versus classical ML models, and potentially pave the way for the ambitious task of flare forecasting for the weather forecast community.

Description

This Summer Code Sprint is organized by DMLab at Georgia State University to provide practical training in Machine Learning on Big Data while offering some insight into the concerns mentioned above. The sprint is intended for graduate students currently enrolled in the M.S. in Computer Science or the M.S. in Data Science and Analytics. This 7-week program is a project-based course in which students are closely guided through different avenues toward a shared objective: the classification of solar flares using Machine Learning models. Students will be exposed to the complexity of multi-class and high-dimensional data, implement different analytical tools, and build upon their theoretical knowledge of Machine Learning and Data Mining.

Each student's final grade will be calculated as the sum of the following four components: Active Participation (10%), Project Implementation (40%), Project Maintenance (20%), and Final Report (30%).

During this sprint ...

Students will obtain:

  • hands-on experience in pre-processing a real-world benchmark dataset,

  • skills needed for knowledge discovery from multivariate time series,

  • a new and practical perspective in the field of Machine Learning/Data Science.

At the end ...

  • All participants will present their work at DMLab to an audience of computer scientists and solar physicists.

  • Upon successful completion of the course, participants will earn a grade for 4 credit hours.

  • We will help passionate students turn their quality work into a scientific publication and submit it to the IEEE BigData 2020 conference.

> DMLab may sponsor students' conference registration fees upon acceptance of their papers.

Details

  • Event: BDML Summer Code Sprint 2020: Solar Flare Predictions from Heliophysics Big Data

  • Duration: 7 weeks (June 8 - July 28, 2020)

  • Place: Online

  • Course: Directed Readings (by Dr. Rafal Angryk) -- CSC 6999 -- 4 Credit Hours

  • Schedule: Wednesdays, 14:00 - 17:00

  • Prerequisite: Machine Learning (CSC 6850 / CSC 8850) and/or Deep Learning (CSC 8851).

How to Apply




We highly encourage all students who are passionate about Machine Learning and enjoy teamwork to apply.

Eligible graduate students must:

  • be currently enrolled in the M.S. in Computer Science or the M.S. in Data Science and Analytics,

  • have passed at least one of the prerequisites (6850 / 8850 / 8851) with a grade of B+ or higher, and

  • be fluent in Python and familiar with the necessary technologies, such as Git and Docker.


Please email your (unofficial) transcript of records and CV to us at:

> rangryk[at]gsu.edu

> aahmadzadeh1[at]cs.gsu.edu

with the subject line "Code Sprint Application".