5/5/11

Credit Scoring Using Data Mining Techniques

Credit scoring involves evaluating the likelihood that a borrower will repay a loan. This evaluation assumes that a lender or credit bureau has detailed information about an applicant's history. Data mining involves collecting large amounts of this data and then aggregating, sorting and classifying it in a useful way. Data mining therefore helps lenders recognize patterns they can use in developing credit scoring models.
    A Brief History of Credit Scoring

    • Credit scoring was developed into a formal body of practice in the United States in the 1940s, when a shortage of credit analysts during WWII forced many companies to ask departing staffers to write down how they made credit determinations. Businesses coupled those rules with academic research developed around the same time to create statistical models of which loans would likely turn "bad," according to an article in the Singapore Management Review. Credit scoring took off in the 1960s as a rising number of consumers sought credit cards, forcing companies to look for automated processes for evaluating applicants. Credit scoring gained further acceptance in the 1970s when, in response to federal anti-discrimination laws, financial institutions embraced it for its ability to generate lending decisions on a race-neutral basis.

    Data Mining

    • According to the Singapore Management Review, "data mining has been viewed as the offspring of three different disciplines, namely database management, statistics and computer science." It uses tools from each discipline, relying on large databases to supply and analyze huge amounts of information. In the case of credit scoring, such information includes how much a person earns, how many credit cards they have and whether they have defaulted on loans recently. The goal of data mining is to find patterns of behavior. For credit scoring, that behavior will tend to be financial in nature. Whereas early statistical models sought only to predict whether a loan would be "good" or "bad," data mining allows companies to forecast whether a borrower will pay a loan back early or make only minimum payments. These forecasts are far more valuable to lenders, because a borrower who makes the minimum payments or incurs late fees is far more profitable than one who borrows and repays ahead of schedule.

    Techniques

    • Predictive techniques that assess how a borrower will pay back money include regression analysis. In regression analysis, the values of variables are analyzed for their impact on other variables: when "x" goes up, "y" also rises, or when "z" goes down, "r" rises. Another technique is the decision tree, in which data is analyzed according to a series of questions that help sort people or information into categories. Neural networks are said to be modeled on the human brain. They rely on a series of "nodes" that are arranged in layers. A piece of data may affect one or many nodes in a layer, with each node performing a calculation and passing the results to one or many nodes in the next layer. In these very complex systems -- some use the term artificial intelligence to describe them -- the output is a credit score.
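The three techniques described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the field names (`income`, `num_cards`, `recent_default`), the thresholds and the weights are all hypothetical and are not taken from any real scoring model.

```python
import math


# Regression: how strongly does y move when x moves?
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (one explanatory variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var  # positive: y rises with x; negative: y falls as x rises


# Decision tree: a series of questions that sorts applicants into categories.
# The questions and cutoffs below are made up for illustration.
def tree_category(applicant):
    if applicant["recent_default"]:
        return "high risk"
    if applicant["income"] >= 40_000:
        return "low risk"
    if applicant["num_cards"] <= 2:
        return "medium risk"
    return "high risk"


# Neural network: layers of nodes, each node computing a weighted sum
# of the previous layer's values and passing the result onward.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, layers):
    """Run one forward pass. Each layer is a list of (weights, bias) nodes."""
    values = inputs
    for layer in layers:
        values = [
            sigmoid(sum(w * v for w, v in zip(weights, values)) + bias)
            for weights, bias in layer
        ]
    return values


if __name__ == "__main__":
    # Toy data: y doubles whenever x increases by one.
    print(ols_slope([1, 2, 3, 4], [2, 4, 6, 8]))

    print(tree_category({"recent_default": False, "income": 30_000, "num_cards": 1}))

    # Two inputs -> hidden layer of two nodes -> one output node.
    hypothetical_layers = [
        [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0)],
        [([1.0, -1.0], 0.0)],
    ]
    print(forward([0.6, 0.4], hypothetical_layers))
```

In practice the weights of a network and the questions in a tree are learned from historical data rather than written by hand; the sketch only shows the shape of each calculation.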

    Uses

    • The most common credit scoring model is the one developed by the Fair Isaac Corporation, usually referred to as the FICO score. FICO relies on data mining and neural networks to establish a credit score. The firm sponsors an annual data mining competition that "provides undergraduate and graduate students an opportunity to test out their data mining skills on a real-world data set."

    Dangers

    • Despite its widespread use, the development of credit scoring models using data mining has its limitations. Most importantly, data sets used for data mining often contain errors and omissions, and they may require significant preparation before they are usable. Credit scoring models may also be built "using a biased sample of consumers and customers who have been granted credit." According to the Singapore Management Review, "the credit scoring model built using this sample will generally not perform well on the entire population since the data used to build the model is different from the data that the model will be applied to."
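The sample-bias problem described above can be seen with a small worked example. The sketch below uses an entirely made-up population of eight applicants and a hypothetical income cutoff for granting credit; it simply compares the default rate a lender would observe among accepted customers with the rate in the whole applicant population.

```python
# Each record is (annual_income, defaulted). All values are invented.
POPULATION = [
    (20_000, True),
    (25_000, True),
    (30_000, False),
    (35_000, True),
    (45_000, False),
    (50_000, False),
    (60_000, True),
    (80_000, False),
]

# Hypothetical lending rule: only applicants at or above this income got credit.
INCOME_CUTOFF = 40_000


def default_rate(records):
    """Fraction of records whose borrower defaulted."""
    return sum(1 for _, defaulted in records if defaulted) / len(records)


# The lender only ever observes repayment behavior for accepted applicants.
accepted = [record for record in POPULATION if record[0] >= INCOME_CUTOFF]

if __name__ == "__main__":
    print("default rate among accepted:", default_rate(accepted))
    print("default rate in population: ", default_rate(POPULATION))
```

In this toy data the accepted group defaults less often than the population as a whole, so a model fitted only to accepted customers would see a different picture from the one it is later applied to.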
