Reject inference applied to large data sets

Introduction

One of the most common uses of reject inference techniques is in negotiation and application scoring. When a prospective customer approaches a bank for a loan, it is important to evaluate their creditworthiness, that is, whether they are likely to default on the loan. Appropriate models are therefore applied, which are based on the bank’s previous performance and on identifying the fundamental characteristics that could be useful in assessing the prospects of new customers. Apparently,
The extent to which the basic statistical assumptions of reject inference are fulfilled is also an important determinant of its benefit. Mortgages are an example of a portfolio in which few applications are rejected. In that case reject inference may have little significance because the sub-population of rejected applications is very small compared with the entire population, and hence the bias resulting from the missing data on the rejected applicants is inconsequential. Loans to small businesses, by contrast, carry very high risk and may have a reject rate of over 50%, so the bias due to screening is too high to be ignored. It is not known, however, under which circumstances systematic screening should not be ignored for the purpose of parameter estimation. In addition, since the bias is data contingent, establishing the basic principle is very …
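The link between the reject rate and the size of the screening bias can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions: the risk model, the noise levels, the default threshold and the function name `default_rate_gap` are all hypothetical and are not taken from the essay. It compares the default rate of the whole applicant population with the default rate seen among accepted applicants only, which is a simpler quantity than the bias in estimated model parameters.

```python
import random


def default_rate_gap(reject_rate, n=50_000, seed=0):
    """Return (population default rate, default rate among accepted applicants).

    Hypothetical setup: each applicant has a latent risk; the lender sees a
    noisy score of that risk and rejects the worst-scoring fraction.
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        risk = rng.gauss(0.0, 1.0)                     # true latent risk
        score = risk + rng.gauss(0.0, 1.0)             # noisy score used for screening
        defaulted = risk + rng.gauss(0.0, 1.0) > 1.0   # default outcome (assumed threshold)
        applicants.append((score, defaulted))

    # Lowest score = safest applicant; accept the best (1 - reject_rate) share.
    applicants.sort(key=lambda a: a[0])
    n_accept = int(round(n * (1.0 - reject_rate)))
    accepted = applicants[:n_accept]

    population_rate = sum(d for _, d in applicants) / n
    accepted_rate = sum(d for _, d in accepted) / n_accept
    return population_rate, accepted_rate


for r in (0.05, 0.50):
    pop, acc = default_rate_gap(r)
    print(f"reject rate {r:.0%}: population {pop:.3f}, "
          f"accepted only {acc:.3f}, gap {pop - acc:.3f}")
```

Under these assumptions the gap between the two rates is small when few applicants are rejected and grows markedly when half are rejected, which mirrors the contrast drawn between mortgage and small-business portfolios.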
… (4)

The extent to which lending officials use observable applicant characteristics is represented by the coefficients of y. The extent to which lending officials systematically select applicants on unobservable variables, on the other hand, is represented by the correlation ρ. Since the selection equation is fully observed, it can always be estimated separately. According to Meng and Schmidt (1985), however, this will not be efficient unless ρ = 0. Likewise, when ρ is not equal to zero, a logit or standard probit method applied to the default equation yields biased coefficients. As such, ρ serves to correct for the systematic sample selection and the possible unobserved bias that is likely to arise when the default equation is estimated separately (Boyes et al. 1989). Meng and Schmidt (1985) showed the cost of partial observability in the model to be fairly high, which also suggests that, where possible, it can be essential to collect additional information. In the field of credit scoring, this means that it is not safe to assume, in the first instance, that ρ = 0. Alternatively, a better approach can be sought in an early development phase to judge the cost of incomplete observability. However, without reference to a particular data set, it is not possible to quantify the efficiency loss (Poirier, 1980). As such, a careful way of doing it is through the application of the bivariate probit model in the first
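The role of the error correlation can be made concrete with a small simulation. The sketch below does not implement the bivariate probit estimator; it only simulates a selection equation and a default equation whose error terms have correlation `rho`, for applicants who all share the same observable characteristic `x`. The coefficients, the function name `default_rates_at_x` and the sign convention for `rho` are hypothetical choices made for illustration.

```python
import math
import random


def default_rates_at_x(rho, x=0.0, n=100_000, seed=1):
    """Return (default rate of all applicants, default rate of accepted ones)
    for applicants with identical observable characteristic x.

    rho is the correlation between the unobserved error terms of the
    selection equation and the default equation (assumed coefficients).
    """
    rng = random.Random(seed)
    n_accepted = 0
    defaults_all = 0
    defaults_accepted = 0
    for _ in range(n):
        e_sel = rng.gauss(0.0, 1.0)
        # e_def has unit variance and correlation rho with e_sel.
        e_def = rho * e_sel + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)

        accepted = 0.5 - 1.0 * x + e_sel > 0      # selection equation (hypothetical)
        defaulted = -1.0 + 0.8 * x + e_def > 0    # default equation (hypothetical)

        defaults_all += defaulted
        if accepted:
            n_accepted += 1
            defaults_accepted += defaulted
    return defaults_all / n, defaults_accepted / n_accepted


for rho in (0.0, -0.6):
    rate_all, rate_acc = default_rates_at_x(rho)
    print(f"rho = {rho:+.1f}: all applicants {rate_all:.3f}, "
          f"accepted only {rate_acc:.3f}")
```

When `rho` is zero, the accepted applicants default at essentially the same rate as all applicants with the same `x`, so a model fitted on accepted loans alone is not distorted by selection on unobservables. When `rho` is nonzero, the two rates diverge even though the observable characteristic is identical, which is the situation in which a single-equation logit or probit on accepted loans gives biased coefficients.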

