Subject:


The problem we try to tackle is forecasting the initial return for a set of IPOs using a set of independent variables identified by means of a literature review. Most of them are related to the structure of the IPO.

Forecasting the initial return of IPOs is very difficult because the set of relevant variables is yet to be identified. Additionally, the researcher faces the further challenge of dealing with the rather random nature of the target variable, which is common in empirical finance.

## Instances and best known solutions for those instances:

Most of the research deals with the identification of explanatory variables using linear regression models whose R^{2} is generally in the range 0.15-0.20.


Little effort has been made to forecast the initial return.

The provided data cover 1,007 companies taken public between 1996 and 1999 in the US. This includes AMEX, NASDAQ and NYSE IPOs and excludes ADRs, closed-end funds, financial institutions and unit offerings.


The sample consists of the following fields:

Infr_aj: Dependent variable. It measures the percentage difference between the offer price and the first trading day close. The figure is adjusted for the market return.

Lsize: Natural log of the proceeds raised in the IPO (dollar).

Retained: Number of shares sold divided by the pre-offering number of shares.

Price: Final offering price.

LowP: Lower end of the price range offered to potential investors during the roadshow.

HighP: Higher end of the price range offered to potential investors during the roadshow.

RanW: Difference between the lower and higher ends of the price range as a percentage of the lower end.

RanHan: This variable, suggested by Hanley [Han93], represents the absolute value of the difference between the final offer price and the mid point of the price range, as a percentage of that mid point.

Employees: Number of employees at the time of the flotation.

SIC: Primary four digit Standard Industrial Classification code.

Techdummy: Binary variable whose value equals one if the primary SIC code falls under the definition of a technology company and zero otherwise.

Prestige: Binary variable whose value equals one if the main financial advisor was prestigious and zero otherwise.

The financial advisors were classified according to the methodology suggested by Balvers et al. [BMM88]. Financial institutions are labeled as prestigious if they were consistently considered top 25 in the annual lead-manager rankings published by Institutional Investor Magazine for the years of study.
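The derived price variables in the list above can be recomputed from Price, LowP and HighP. The following is a minimal sketch, assuming the ratios are kept as fractions (multiply by 100 for percentages); the function names and the example figures are hypothetical and are not taken from the sample.

```python
import math


def derived_price_variables(price, low_p, high_p):
    """Return RanW and RanHan as fractions (multiply by 100 for percent)."""
    mid_point = (low_p + high_p) / 2.0
    # RanW: width of the price range relative to its lower end.
    ran_w = (high_p - low_p) / low_p
    # RanHan: absolute deviation of the final price from the mid point,
    # relative to the mid point.
    ran_han = abs(price - mid_point) / mid_point
    return {"RanW": ran_w, "RanHan": ran_han}


def log_size(proceeds_dollars):
    """Lsize: natural log of the proceeds raised, in dollars."""
    return math.log(proceeds_dollars)


# Hypothetical IPO: range of 10 to 14 dollars, final offer price of 15 dollars.
example = derived_price_variables(price=15.0, low_p=10.0, high_p=14.0)
example_lsize = log_size(50_000_000)
```

Whether the distributed sample stores these ratios as fractions or as percentages should be checked against the data before mixing recomputed and provided columns.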


In [QLI05] we use a subset of the variables: Infr_aj, Lsize, Retained, Price, Techdummy, Prestige, and Range (defined as (HighP-LowP)/LowP). With these variables we compare standard regression methods with an evolutionary rule-based system. The error measure used to assess the performance of the models is the Normalized Mean Square Error (NMSE). The training sample used in this analysis consisted of 840 patterns picked at random, leaving the rest as a validation set.
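As an illustration of this evaluation setup, the sketch below computes an NMSE and draws a random 840-pattern training set. It assumes the common normalization in which the mean squared error is divided by the variance of the observed values (the exact normalization used in [QLI05] is not stated here), and it assumes all 1,007 patterns are usable. The function names and the seed are hypothetical.

```python
import random


def nmse(actual, predicted):
    """Mean squared error divided by the variance of the actual values.

    Assumed normalization: a value of 1.0 matches a model that always
    predicts the mean of the actual values.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    variance = sum((a - mean_actual) ** 2 for a in actual) / n
    return mse / variance


def random_split(n_patterns, n_train, seed=0):
    """Shuffle the pattern indices and split them into training and validation."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = random_split(n_patterns=1007, n_train=840)
```

Under this normalization, an NMSE close to 1 means the model does little better than predicting the sample mean, which is a useful reference point when reading the results.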

The regressions used are OLS (Ordinary Least Squares) and LTS (Least Trimmed Squares). LTS discards noisy data, and the trimming constant represents how many points out of the initial 840-pattern training data set are used to fit the regression. The rule-based system offers predictions for 90% of the validation set.
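To make the difference between the two regressions concrete, here is a minimal sketch using NumPy. The LTS fit is approximated with random starts followed by concentration steps (refit on the `h` points with the smallest squared residuals until the subset stops changing). This is a simplified heuristic and not necessarily the estimator used in [QLI05]; the function names, the parameter `h` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_lts(X, y, h, n_starts=50, max_steps=100, seed=0):
    """Approximate LTS: minimize the sum of the h smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        # Start from a small random subset and refine it with concentration steps.
        subset = np.sort(rng.choice(n, size=p + 1, replace=False))
        for _ in range(max_steps):
            beta = fit_ols(X[subset], y[subset])
            sq_res = (y - X @ beta) ** 2
            new_subset = np.sort(np.argsort(sq_res)[:h])
            if np.array_equal(new_subset, subset):
                break  # the retained subset no longer changes
            subset = new_subset
        obj = np.sort(sq_res)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta


# Synthetic data: y = 2 + 3x plus noise, with the 40 largest-x points contaminated.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)
y[-40:] += 50.0
beta_ols = fit_ols(X, y)
beta_lts = fit_lts(X, y, h=150)
```

In this synthetic example the OLS slope is pulled upward by the contaminated points, while the trimmed fit stays close to the slope of the clean data. A smaller `h` trims more aggressively at the cost of using fewer patterns.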

Given that the prediction error of a regression can be estimated, we use this information to discard the predictions that are likely to be the worst. It is therefore fair to compare the rule-based system (RBS) with the regression predicting the best 90% of the validation set.

The results reported in the table below show that the RBS approach offers better results than the linear models.
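One way to carry out this selection is sketched below. It assumes the estimated prediction error is the classical OLS standard error of prediction, s·sqrt(1 + x₀ᵀ(XᵀX)⁻¹x₀), and keeps the 90% of validation patterns with the smallest value. The exact estimate used in [QLI05] is not specified here, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def ols_prediction_se(X_train, y_train, X_new):
    """Predictions and their standard errors under classical OLS assumptions."""
    n, p = X_train.shape
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    beta = xtx_inv @ X_train.T @ y_train
    residuals = y_train - X_train @ beta
    sigma2 = residuals @ residuals / (n - p)  # unbiased residual variance
    # Row-wise quadratic form x0' (X'X)^-1 x0 for every new pattern.
    leverage = np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)
    return X_new @ beta, np.sqrt(sigma2 * (1.0 + leverage))


def keep_most_reliable(pred_se, fraction=0.9):
    """Indices of the patterns with the smallest estimated prediction error."""
    n_keep = int(round(fraction * len(pred_se)))
    return np.sort(np.argsort(pred_se)[:n_keep])


# Synthetic stand-in for the 840 training and 167 validation patterns.
rng = np.random.default_rng(0)
X_train = np.column_stack([np.ones(840), rng.normal(size=(840, 3))])
y_train = X_train @ np.array([0.1, 0.5, -0.2, 0.3]) + rng.normal(size=840)
X_valid = np.column_stack([np.ones(167), rng.normal(size=(167, 3))])
predictions, pred_se = ols_prediction_se(X_train, y_train, X_valid)
kept = keep_most_reliable(pred_se, fraction=0.9)
```

The NMSE on the retained patterns, `predictions[kept]`, is then the figure comparable to a system that abstains on 10% of the validation set.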

Model | Trimming Constant | NMSE (validation set) | NMSE (best 90% of validation set) |
---|---|---|---|
OLS | - | 0.92302 | 0.77238 |