Dear participants of the competition!
We have summed up the results of the first stage and selected three winners, who competed for the $5,000 prize.
We remind you that the results of the second stage, which carries a $10,000 prize, will be announced after March 1, 2015, once the companies' annual financial results have been published, so that the models can be tested on real data about which companies were acquired.
Mergers and acquisitions (M&A) are a class of economic processes in which businesses and capital are consolidated at the macro and micro levels, resulting in one larger company in place of several smaller ones.
An acquisition is a transaction carried out to establish control over a company by acquiring more than 30% of its share capital (stocks, shares, etc.), while the acquired company retains its legal independence.
Solvers are invited to forecast the probability that a company will be acquired in the coming year.
Timeline of the competition
- 09.06.2014 start of the competition
- 22.08.2014 first-stage results and awards to the winners ($5,000 prize fund). Evaluation is based on two criteria: the score given by the evaluation functional (see below) and expert judgment (simplicity of the solution, reproducibility of the algorithm, expert opinion). Both criteria carry equal weight in choosing the winners.
- 01.03.2015 final results of the competition and awards to the winners ($10,000 prize fund). By March 2015 the companies' financial year will be over, so the models will be tested on real data about which companies have been acquired. This objective data is the criterion for awarding the main prize.
The data for the task is provided in tables containing the following parameters for each company:
|1. Cash and cash equivalents (columns 1–21)||13. CAPEX (columns 253–273)|
|2. Inventories (columns 22–42)||14. Net Sales (columns 274–294)|
|3. Total Current Assets (columns 43–63)||15. Gross Margin (columns 295–315)|
|4. Total Current Liabilities (columns 64–84)||16. EBITDA (columns 316–336)|
|5. Total Assets (columns 85–105)||17. Dividend yield (columns 337–357)|
|6. Property, Plant and Equipment, Net (columns 106–126)||18. Market Capitalization (columns 358–378)|
|7. Goodwill (columns 127–147)||19. Gross Income (columns 379–399)|
|8. Short-Term Debt (columns 148–168)||20. Financial Costs (columns 400–420)|
|9. Long-Term Debt (columns 169–189)||21. Net Income (columns 421–441)|
|10. Net Debt (columns 190–210)||22. Book Value (columns 442–462)|
|11. Total Liabilities (columns 211–231)||23. Free Cash Flow (columns 463–483)|
|12. Depreciation and amortization (columns 232–252)||24. Sector (column 484)|
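The column ranges in the table follow a regular pattern: each of the 23 financial parameters occupies 21 consecutive columns, and the sector occupies column 484. A minimal Python sketch of that mapping is given below. It assumes that columns are numbered from 1 and that the 21 columns of each parameter hold the years 1994 to 2014 in chronological order; the ordering is an assumption, and the name `column_for` is hypothetical.

```python
FIRST_YEAR = 1994          # first year covered by the data
YEARS_PER_PARAMETER = 21   # 1994..2014 inclusive
SECTOR_COLUMN = 484        # parameter 24 has a single column


def column_for(parameter: int, year: int) -> int:
    """Return the 1-based column of a financial parameter (1..23) for a year.

    Assumption: within each parameter the columns run chronologically.
    """
    if not 1 <= parameter <= 23:
        raise ValueError("parameter must be between 1 and 23")
    offset = year - FIRST_YEAR
    if not 0 <= offset < YEARS_PER_PARAMETER:
        raise ValueError("year must be between 1994 and 2014")
    return (parameter - 1) * YEARS_PER_PARAMETER + offset + 1
```

For example, `column_for(13, 1994)` gives 253 and `column_for(23, 2014)` gives 483, matching the ranges listed for CAPEX and Free Cash Flow.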
All data is divided into three files:
File Train_contest.csv - training sample with data for all of the above parameters; every parameter except the sector covers the years 1994-2014 (21 values per attribute). The file also contains two columns with the answers:
- a column with a binary value showing whether the company was acquired;
- a column that gives the date of the news about the acquisition if the company was acquired, and is blank otherwise.
File Valid_contest.csv - validation sample in which all parameters from some year X onward are unknown (replaced by NaN). For this sample, the task is to predict the probability that news of an acquisition appears during year X.
File FinalTest_contest.csv - final test sample, structured like the validation sample but containing different companies. The result obtained by the participant's algorithm on this sample is the participant's result in the competition.
*Note that if a NaN value occurs "inside" a parameter (e.g., some row has a numeric value in column 236, NaN in column 237, and numeric values in columns 238–252), it must be treated as a gap in the data and cannot be interpreted as year X, the year to be predicted. In addition, to give a clearer picture of market conditions at any given moment, participants are given weekly quotations of the S&P500 index for 1994-2014 as supplementary information.
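The distinction between an interior gap and the start of year X can be sketched in Python as follows. This is only an illustration: it assumes that the values of one parameter for one company are given as a list ordered by year, and that the unknown years form the trailing run of NaN values. The function name is hypothetical.

```python
import math


def split_gaps_and_horizon(values):
    """Return (horizon, gaps) for one parameter's yearly values.

    horizon: index where the trailing run of NaN begins
             (len(values) if the series does not end in NaN).
    gaps:    indices of NaN values before the horizon, i.e. missing
             data that must not be read as the year to predict.
    """
    horizon = len(values)
    # Walk back over the trailing NaN values.
    while horizon > 0 and math.isnan(values[horizon - 1]):
        horizon -= 1
    gaps = [i for i in range(horizon) if math.isnan(values[i])]
    return horizon, gaps
```

For the series `[1.0, nan, 3.0, 4.0, nan, nan]` this returns `(4, [1])`: position 1 is a gap, and the unknown years start at position 4. A gap that sits directly before year X cannot be told apart from it within a single parameter, so comparing several parameters of the same company may be needed.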
The most popular economic indicators
Based on the available data, participants can compute and use the following economic indicators:
Evaluation functional for solutions
Submitted solutions are evaluated with the function getQualityOfSolution(Answer, Ideal), written in MATLAB. The functional is based on a normalized ranking-quality measure, the normalized Discounted Cumulative Gain (nDCG), which can be expressed by the following formula:
where SortedIdeal is the vector of correct answers sorted in descending order of the probabilities in Answer, and OnesZerosIdeal is the vector of correct answers sorted in descending order (all ones first, followed by all zeros).
Important note: if the predicted probabilities coincide for some set of companies, then after sorting these companies keep the same relative order as in the vector Answer.
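A rough Python sketch of such a functional is given below. It is not the organizers' MATLAB code: it assumes the standard nDCG discount of 1/log2(position + 1), which may differ from the one actually used, and the function names are hypothetical. The tie-breaking rule is reproduced by relying on the stability of Python's `sorted`.

```python
import math


def dcg(relevances):
    # Assumed discount: 1 / log2(position + 1), positions counted from 1.
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances))


def quality_of_solution(answer, ideal):
    """nDCG-style score of predicted probabilities against 0/1 answers."""
    # sorted() is stable, so companies with equal probabilities keep
    # the order they have in `answer`.
    order = sorted(range(len(answer)), key=lambda i: -answer[i])
    sorted_ideal = [ideal[i] for i in order]
    ones_zeros_ideal = sorted(ideal, reverse=True)
    best = dcg(ones_zeros_ideal)
    # If no company was acquired, the ratio is undefined; return 0.0 here.
    return dcg(sorted_ideal) / best if best > 0 else 0.0
```

Under these assumptions a perfect ranking such as `quality_of_solution([0.9, 0.1], [1, 0])` scores 1.0, while tied predictions such as `[0.5, 0.5]` are scored in their original order.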
Example calculation of solution quality
If the vector of predicted probabilities is (0.5, 0.8, 0, 0) and the ideal vector of acquisition answers is (1, 0, 1, 0), then the quality of the solution can be calculated using the formula:
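For this example, the two vectors that enter the formula can be derived as sketched below in Python; the variable names are illustrative only. The prediction ranks the second company first although it was not acquired, so the sorted vector differs from the ideal ordering.

```python
answer = [0.5, 0.8, 0.0, 0.0]
ideal = [1, 0, 1, 0]

# Stable sort by descending probability: the two companies tied at 0
# keep the order they have in the answer vector.
order = sorted(range(len(answer)), key=lambda i: -answer[i])
sorted_ideal = [ideal[i] for i in order]

# Ideal ordering: all ones first, followed by all zeros.
ones_zeros_ideal = sorted(ideal, reverse=True)

print(sorted_ideal)      # [0, 1, 1, 0]
print(ones_zeros_ideal)  # [1, 1, 0, 0]
```

Assuming the standard nDCG discount of 1/log2(position + 1), which may differ from the one used in getQualityOfSolution, these vectors give (1/log2(3) + 1/2) / (1 + 1/log2(3)) ≈ 0.693.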