DirectX

- Using Blt() to clear a surface is much faster than memset(). On many cards
  the clear happens asynchronously and takes very little time. Even if you
  call Lock() immediately after Blt(), it is still much faster.

- Blts to word-aligned boundaries are much faster (~1.5x).

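One way to act on the alignment note is to widen a rectangle's horizontal extent to the next aligned edges before blitting. This is only a sketch: `Span`, `align_span`, and the choice of alignment unit (pixels or bytes, and how many) are assumptions, not something the note above specifies, and widening is only safe when the extra pixels are harmless (for example a clear or a dirty-rectangle update).

```cpp
#include <cassert>

// Hypothetical helper type: a horizontal extent [left, right).
struct Span { int left; int right; };

// Widen a span outward so both edges fall on a multiple of `align`.
// `align` must be a power of two; coordinates are assumed non-negative.
inline Span align_span(Span s, int align) {
    assert(align > 0 && (align & (align - 1)) == 0);
    Span out;
    out.left = s.left & ~(align - 1);                  // round down
    out.right = (s.right + align - 1) & ~(align - 1);  // round up
    return out;
}
```

A span that is already aligned comes back unchanged, so the helper can be applied unconditionally.
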
- Limit surface Lock() and Unlock() calls. I went from 204 fps to 232 fps by
  limiting locks to one per frame.

- A software z-buffer can be accelerated by DirectX: create the z-buffer as a
  DirectX surface and use Blt() to clear it. This is a little tricky, but the
  clears take much less time. If you are using a floating-point z-buffer, I
  recommend creating a 16-bit surface that is (width * 2, height), because a
  Blt() on a 32-bit surface may not clear the high byte (it is considered
  alpha). It is safest to clear the z-buffer to 0 so that the card does not
  do any transformation on the color value. This means you have to negate
  your z-values and reverse your z-compare function.



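The zero-cleared z-buffer idea can be sketched in portable C++. The note does not spell out the exact depth mapping, so this sketch assumes one that is consistent with a buffer cleared to 0: store `z_far - z`, so that 0 means "at the far plane" and larger values are nearer, and pass a fragment when its value is greater than the stored one. The class name, `z_far`, and the `memset` standing in for the hardware Blt() fill are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical software z-buffer that is valid when every byte is zero.
// Stored value = z_far - z (an assumed mapping, not taken from the notes).
class ZeroClearedZBuffer {
public:
    ZeroClearedZBuffer(int width, int height, float z_far)
        : width_(width), height_(height), z_far_(z_far),
          depth_(static_cast<std::size_t>(width) * height) {
        assert(width > 0 && height > 0 && z_far > 0.0f);
    }

    // Stand-in for the hardware color fill: writes zero bytes only.
    // An all-zero bit pattern is 0.0f in IEEE 754 single precision.
    void clear() {
        std::memset(depth_.data(), 0, depth_.size() * sizeof(float));
    }

    // Reversed compare: a fragment passes when its inverted depth is
    // greater than the stored one. Returns true if the pixel was updated.
    bool test_and_write(int x, int y, float z) {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        const float inverted = z_far_ - z;
        float& stored = depth_[static_cast<std::size_t>(y) * width_ + x];
        if (inverted > stored) {
            stored = inverted;
            return true;
        }
        return false;
    }

private:
    int width_;
    int height_;
    float z_far_;
    std::vector<float> depth_;
};
```

With `z_far` set to 100, a fragment at depth 50 passes on a freshly cleared pixel, a later fragment at depth 60 is rejected, and one at depth 10 replaces it.
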
Threaded Applications

- For threaded applications, don't use EnterCriticalSection and
  LeaveCriticalSection; together these take about 60 cycles to execute.
  Instead use Intel's "bts" instruction, which costs 6 cycles inlined and
  12 cycles if called. (Does anyone know if you need to prefix bts with
  "lock" in a multi-processor environment?)

********* example replacement class *****

// thread_yield() is not defined in these notes; it must be supplied elsewhere.

class critical_section_lock
{
public:
    int flag;   // 0 = free, 1 = held

    critical_section_lock() { flag = 0; }

    void __fastcall lock();
    void __fastcall unlock();
};

void __fastcall critical_section_lock::unlock()
{
    // "this" arrives in ecx; flag is the first member, so [ecx] is flag
    __asm mov dword ptr [ecx], 0
}

void __fastcall critical_section_lock::lock()
{
    __asm
    {
    start:
        // test-and-set bit 0 of flag; the carry flag receives the old value.
        // On a multi-processor machine this needs a "lock" prefix to be atomic.
        bts dword ptr [ecx], 0
        jnc success              // bit was clear: we now own the lock
        push ecx                 // ecx is not preserved across the call
        call thread_yield        // give up our time-slice
        pop ecx
        jmp start
    success:
    }
}


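For comparison, the same test-and-set lock can be sketched in portable C++ with `std::atomic_flag`, which is a swapped-in technique rather than the inline-assembly approach above. Its `test_and_set` is an atomic read-modify-write, so it is also safe across processors; on x86, a read-modify-write on memory is only atomic across processors when it carries a lock prefix. The names `spin_lock` and `count_with_lock` are hypothetical, and the cycle counts quoted above do not necessarily apply to this version.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical portable counterpart of critical_section_lock.
class spin_lock {
public:
    void lock() {
        // test_and_set returns the previous value: true means already held.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // give up our time-slice
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Demo helper: several threads increment a plain int under the lock.
// Returns the final counter value.
inline int count_with_lock(int threads, int iterations) {
    spin_lock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the lock works, `count_with_lock(4, 10000)` returns exactly 40000, because no increment is lost.
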
General Suggestions

- Converting a float to an int with a C-style cast under Visual C is very
  slow, because Visual C does a "safe" conversion: it calls a function that
  changes the FPU control word to set the correct rounding mode, does a
  fistp (the meat of the operation), restores the FPU state, and returns.
  Most of the time the FPU will already be in the correct state, so a simple
  inlined assembly function like this will save a lot of time:

inline int ftoi(float f)
{
    int res;
    __asm
    {
        fld f        // push f onto the FPU stack
        fistp res    // store as an integer using the current rounding mode, then pop
    }
    return res;
}



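Outside Visual C inline assembly, a similar conversion can be sketched with the standard `std::lrint`, which, like fistp, converts using the current rounding mode. The name `ftoi_portable` is hypothetical. Note that the default rounding mode is round-to-nearest (ties to even), so both this function and ftoi() can return a different value than a C-style cast, which truncates toward zero.

```cpp
#include <cmath>

// Hypothetical portable stand-in for ftoi(): converts using the current
// floating-point rounding mode instead of truncating like a cast.
inline long ftoi_portable(float f) {
    return std::lrint(f);
}
```

Under the default rounding mode, `ftoi_portable(2.7f)` gives 3 where `(int)2.7f` gives 2, so this is only a drop-in replacement where rounding instead of truncating is acceptable.
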
- When timing functions under MSDEV, turn off incremental linking.
  Incremental linking often adds an extra "jmp" for every "call", which is
  used to patch your code together without having to re-lay out everything.
  This adds 2-3 extra clocks to every function call.

- For small functions, use the __fastcall calling convention, and for C++
  member functions use [ecx] instead of "this". Using "this" will cause the
  compiler to generate "push ebp / mov ebp, esp" pairs, which may not be
  needed for simple get/set type functions.



Profiling & tuning

- Start small and work your way up.