source: golgotha/src/golg/jc_log @ 80

Last change on this file since 80 was 80, checked in by Sam Hocevar, 12 years ago
  • Adding the Golgotha source code. Not sure what's going to be interesting in there, but since it's all public domain, there's certainly stuff to pick up.
File size: 3.3 KB
800 464 7928
800 swb isdn
52.85   54.70 250.00  *125*
52.85


DirectX

  - Using Blt() to do clears is much faster than memset().  On many cards the
    clear happens asynchronously and takes very little time.  Even if you call
    Lock() immediately after Blt(), it is still much faster.

  - Blts to word-aligned boundaries are much faster (about 1.5x).

  - Limit surface Lock() and Unlock() calls.  I went from 204 fps to 232 fps by
    limiting locks to one per frame.

  - A software z-buffer can be accelerated by DirectX: create the z-buffer as a
    DirectX surface and use Blt to clear it.  This is a little tricky, but your clears
    take much less time.  If you are using a floating-point z-buffer, I recommend
    creating a 16-bit surface that is (width * 2, height), because clearing a 32-bit
    surface may not clear the high byte (which the card treats as alpha).  It is safest
    to clear the z-buffer to 0 so that the card does not apply any transformation to
    the color value.  This means you have to negate your z-values and reverse your
    z-compare function.
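
    The float z-buffer layout described above can be sketched in portable C++, assuming
    IEEE 754 32-bit floats.  The sketch shows why clearing a 16-bit (width * 2, height)
    surface to 0 is enough: a float whose bytes are all zero is exactly 0.0f, so no
    per-pixel conversion is needed.  FloatZBuffer, clear(), get() and set() are
    hypothetical names.  A std::vector stands in for the locked surface memory, clear()
    stands in for the Blt, and surface pitch and the negated z-compare are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Hypothetical float z-buffer stored in a 16-bit "surface" that is (width * 2)
// words wide: each float occupies two adjacent 16-bit pixels.  The vector stands
// in for surface memory; row pitch is assumed to be exactly width * 2 words
// (real surfaces report their own pitch).
struct FloatZBuffer {
    int width, height;
    std::vector<std::uint16_t> words;

    FloatZBuffer(int w, int h)
        : width(w), height(h),
          words(static_cast<std::size_t>(w) * 2 * h, 0xFFFF) {}  // garbage until cleared

    // Stand-in for a Blt color fill of 0 over the whole 16-bit surface.
    void clear() { std::fill(words.begin(), words.end(), std::uint16_t{0}); }

    // Word index of pixel (x, y); each float spans two 16-bit words.
    std::size_t index(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return (static_cast<std::size_t>(y) * width + x) * 2;
    }

    float get(int x, int y) const {
        float z;
        std::memcpy(&z, &words[index(x, y)], sizeof z);  // all-zero bits read back as 0.0f
        return z;
    }

    void set(int x, int y, float z) { std::memcpy(&words[index(x, y)], &z, sizeof z); }
};
```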


Threaded Applications

  - For threaded applications, don't use EnterCriticalSection() and LeaveCriticalSection():
    together they take about 60 cycles to execute.  Instead, use Intel's "bts" instruction,
    which costs 6 cycles inlined and 12 cycles if called.  On a multiprocessor machine, bts
    does need the "lock" prefix to be atomic, and a locked bts costs more than these figures.

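    ********* example replacement class *****

    // thread_yield() is assumed to be defined elsewhere and to give up the
    // rest of the current time-slice (for example by calling Sleep(0)).
    void thread_yield();

    class critical_section_lock
    {
    public:
      volatile int flag;   // must remain the first member: the asm below addresses it as [ecx]

      critical_section_lock() { flag=0; }

      void __fastcall lock();
      void __fastcall unlock();
    };

    void __fastcall critical_section_lock::unlock()
    {
      __asm mov dword ptr [ecx], 0     // ecx holds "this"; clearing flag releases the lock
    }

    void __fastcall critical_section_lock::lock()
    {
      __asm
      {
        start:
          lock bts dword ptr [ecx], 0  // atomically copy bit 0 into CF and set it
          jnc success                  // CF was 0: the lock was free and is now ours
          push ecx                     // thread_yield() may clobber ecx ("this")
          call thread_yield            // give up our time-slice
          pop ecx
          jmp start
        success:
      }
    }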
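
    For comparison, here is a portable sketch of the same test-and-set lock in standard
    C++11.  It assumes std::atomic_flag in place of the inline bts.  test_and_set() is
    an atomic read-modify-write, so this version stays correct on multiprocessor machines,
    where a bts without the lock prefix does not.  spin_lock and try_lock() are
    illustrative names that are not part of the original code, and no cycle counts are
    claimed for this version.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Portable test-and-set spin lock (illustrative; not the original class).
class spin_lock
{
public:
    // Spin until test_and_set() reports that the flag was previously clear.
    // test_and_set() is an atomic read-modify-write, so this is safe on SMP.
    void lock()
    {
        while (flag_.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();  // give up our time-slice, like thread_yield()
    }

    // Single attempt: returns true if we acquired the lock.
    bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }

    // Clear the flag; release ordering publishes writes made while holding the lock.
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // starts clear (unlocked)
};
```

    A spin lock like this only pays off when the protected section is very short.  On
    longer waits, the yield keeps the waiting thread from burning its whole time-slice.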


General Suggestions

  - Converting a float to an int with a C cast under Visual C is very slow,
    because Visual C does a "safe" conversion: it calls a function that changes
    the FPU control word to select the rounding mode C requires (truncation), does a
    fistp (the meat of the operation), then restores the FPU state and returns.  Most
    of the time the FPU will already be in an acceptable state, so a simple inlined
    assembly function like this will save a lot of time.  Note that fistp rounds with
    the current rounding mode, which is round-to-nearest by default, so ftoi(2.7f)
    returns 3 where (int)2.7f returns 2:

        inline int ftoi(float f)
        {
          int res;
          __asm
          {
            fld f        // push f onto the FPU stack
            fistp res    // store as an integer using the current rounding mode, then pop
          }
          return res;
        }
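
    Here is a portable sketch of the same idea, assuming a C++11 standard library.
    std::lrint converts using the current rounding mode, just like fld/fistp.  Whether
    it compiles to a single instruction depends on the compiler and its flags.
    ftoi_trunc is a hypothetical helper added only to show the C-cast behaviour for
    comparison.

```cpp
#include <cassert>
#include <cmath>

// Rounds with the current floating-point rounding mode (round-to-nearest-even
// by default), which is what the fld/fistp pair above does.
inline int ftoi(float f)
{
    return static_cast<int>(std::lrint(f));  // lrint(float) overload from <cmath>
}

// Hypothetical comparison helper: a C cast truncates toward zero.
inline int ftoi_trunc(float f)
{
    return static_cast<int>(f);
}
```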

  - When timing functions under MSDEV, turn off incremental linking (/INCREMENTAL:NO).
    Incremental linking often adds an extra "jmp" for every "call", which lets the linker
    patch your code together without re-laying out everything.  This adds 2-3 extra
    clocks to every function call.

  - For small functions, use the __fastcall calling convention, and in C++ use [ecx]
    instead of "this" inside __asm blocks.  Using "this" causes the compiler to generate
    a "push ebp / mov ebp, esp" pair, which may not be needed for simple get/set
    functions.  MSVC does not promise what a register holds when an __asm block starts,
    so check the generated code when relying on ecx.


Profiling & Tuning

  - Start small and work your way up.