Table of Contents
Glasgow Haskell comes with a time and space profiling system. Its purpose is to help you improve your understanding of your program's execution behaviour, so you can improve it.
Any comments, suggestions and/or improvements you have are welcome. Recommended “profiling tricks” would be especially cool!
Profiling a program is a three-step process:
 Re-compile your program for profiling with the
      -prof option, and probably one of the
      -auto or -auto-all
      options.  These options are described in more detail in Section 5.2, “Compiler options for profiling” 
 Run your program with one of the profiling options, eg.
      +RTS -p -RTS.  This generates a file of
      profiling information.  Note that multi-processor execution
      (e.g. +RTS -N2) is not supported while
      profiling.
Examine the generated profiling information, using one of GHC's profiling tools. The tool to use will depend on the kind of profiling information generated.
GHC's profiling system assigns costs to cost centres. A cost is simply the time or space required to evaluate an expression. Cost centres are program annotations around expressions; all costs incurred by the annotated expression are assigned to the enclosing cost centre. Furthermore, GHC will remember the stack of enclosing cost centres for any given expression at run-time and generate a call-graph of cost attributions.
Let's take a look at an example:
main = print (nfib 25) nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2)
Compile and run this program as follows:
$ ghc -prof -auto-all -o Main Main.hs $ ./Main +RTS -p 121393 $
When a GHC-compiled program is run with the
    -p RTS option, it generates a file called
    <prog>.prof.  In this case, the file
    will contain something like this:
          Fri May 12 14:06 2000 Time and Allocation Profiling Report  (Final)
           Main +RTS -p -RTS
        total time  =        0.14 secs   (7 ticks @ 20 ms)
        total alloc =   8,741,204 bytes  (excludes profiling overheads)
COST CENTRE          MODULE     %time %alloc
nfib                 Main       100.0  100.0
                                              individual     inherited
COST CENTRE              MODULE      entries %time %alloc   %time %alloc
MAIN                     MAIN             0    0.0   0.0    100.0 100.0
 main                    Main             0    0.0   0.0      0.0   0.0
 CAF                     PrelHandle       3    0.0   0.0      0.0   0.0
 CAF                     PrelAddr         1    0.0   0.0      0.0   0.0
 CAF                     Main             6    0.0   0.0    100.0 100.0
  main                   Main             1    0.0   0.0    100.0 100.0
   nfib                  Main        242785  100.0 100.0    100.0 100.0
The first part of the file gives the program name and options, and the total time and total memory allocation measured during the run of the program (note that the total memory allocation figure isn't the same as the amount of live memory needed by the program at any one time; the latter can be determined using heap profiling, which we will describe shortly).
The second part of the file is a break-down by cost centre
    of the most costly functions in the program.  In this case, there
    was only one significant function in the program, namely
    nfib, and it was responsible for 100%
    of both the time and allocation costs of the program.
The third and final section of the file gives a profile
    break-down by cost-centre stack.  This is roughly a call-graph
    profile of the program.  In the example above, it is clear that
    the costly call to nfib came from
    main.
The time and allocation incurred by a given part of the program is displayed in two ways: “individual”, which are the costs incurred by the code covered by this cost centre stack alone, and “inherited”, which includes the costs incurred by all the children of this node.
The usefulness of cost-centre stacks is better demonstrated by modifying the example slightly:
main = print (f 25 + g 25) f n = nfib n g n = nfib (n `div` 2) nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2)
Compile and run this program as before, and take a look at the new profiling results:
COST CENTRE              MODULE         scc  %time %alloc   %time %alloc
MAIN                     MAIN             0    0.0   0.0    100.0 100.0
 main                    Main             0    0.0   0.0      0.0   0.0
 CAF                     PrelHandle       3    0.0   0.0      0.0   0.0
 CAF                     PrelAddr         1    0.0   0.0      0.0   0.0
 CAF                     Main             9    0.0   0.0    100.0 100.0
  main                   Main             1    0.0   0.0    100.0 100.0
   g                     Main             1    0.0   0.0      0.0   0.2
    nfib                 Main           465    0.0   0.2      0.0   0.2
   f                     Main             1    0.0   0.0    100.0  99.8
    nfib                 Main        242785  100.0  99.8    100.0  99.8
Now although we had two calls to nfib
    in the program, it is immediately clear that it was the call from
    f which took all the time.
The actual meaning of the various columns in the output is:
The number of times this particular point in the call graph was entered.
The percentage of the total run time of the program spent at this point in the call graph.
The percentage of the total memory allocations (excluding profiling overheads) of the program made by this call.
The percentage of the total run time of the program spent below this point in the call graph.
The percentage of the total memory allocations (excluding profiling overheads) of the program made by this call and all of its sub-calls.
In addition you can use the -P RTS option
     to
    get the following additional information:
ticksThe raw number of time “ticks” which were
          attributed to this cost-centre; from this, we get the
          %time figure mentioned
          above.
bytesNumber of bytes allocated in the heap while in this
          cost-centre; again, this is the raw number from which we get
          the %alloc figure mentioned
          above.
What about recursive functions, and mutually recursive groups of functions? Where are the costs attributed? Well, although GHC does keep information about which groups of functions called each other recursively, this information isn't displayed in the basic time and allocation profile, instead the call-graph is flattened into a tree.
Cost centres are just program annotations.  When you say
      -auto-all to the compiler, it automatically
      inserts a cost centre annotation around every top-level function
      not marked INLINE in your program, but you are entirely free to
      add the cost centre annotations yourself.
The syntax of a cost centre annotation is
     {-# SCC "name" #-} <expression>
where "name" is an arbitrary string,
      that will become the name of your cost centre as it appears
      in the profiling output, and
      <expression> is any Haskell
      expression.  An SCC annotation extends as
      far to the right as possible when parsing. (SCC stands for "Set
      Cost Centre").
Here is an example of a program with a couple of SCCs:
main :: IO ()
main = do let xs = {-# SCC "X" #-} [1..1000000]
          let ys = {-# SCC "Y" #-} [1..2000000]
          print $ last xs
          print $ last $ init xs
          print $ last ys
          print $ last $ init ys
which gives this heap profile when run:
The cost of evaluating any expression in your program is attributed to a cost-centre stack using the following rules:
If the expression is part of the
	  one-off costs of evaluating the
	  enclosing top-level definition, then costs are attributed to
	  the stack of lexically enclosing SCC
	  annotations on top of the special CAF
	  cost-centre. 
Otherwise, costs are attributed to the stack of
	  lexically-enclosing SCC annotations,
	  appended to the cost-centre stack in effect at the
	  call site of the current top-level
	  definition[10].  Notice that this is a recursive
	  definition.
Time spent in foreign code (see Chapter 8, Foreign function interface (FFI) ) is always attributed to the cost centre in force at the Haskell call-site of the foreign function.
What do we mean by one-off costs? Well, Haskell is a lazy language, and certain expressions are only ever evaluated once. For example, if we write:
x = nfib 25
then x will only be evaluated once (if
      at all), and subsequent demands for x will
      immediately get to see the cached result.  The definition
      x is called a CAF (Constant Applicative
      Form), because it has no arguments.
For the purposes of profiling, we say that the expression
      nfib 25 belongs to the one-off costs of
      evaluating x.
Since one-off costs aren't strictly speaking part of the
      call-graph of the program, they are attributed to a special
      top-level cost centre, CAF.  There may be one
      CAF cost centre for each module (the
      default), or one for each top-level definition with any one-off
      costs (this behaviour can be selected by giving GHC the
      -caf-all flag).
If you think you have a weird profile, or the call-graph
      doesn't look like you expect it to, feel free to send it (and
      your program) to us at
      <glasgow-haskell-bugs@haskell.org>.
[10] The call-site is just the place in the source code which mentions the particular function or variable.