Which creates very interesting and amazing results.
82
+
83
+
#### Efficient Sampling: Sobol Quasi-Monte Carlo Sequence
84
+
85
+
In path tracing or any other Monte Carlo-based light transport algorithm, apart from optimizing the implementation, we can also improve performance mathematically. Quasi-Monte Carlo (low-discrepancy) sequences are a class of quasi-random sequences widely used in Monte Carlo simulation. For sufficiently smooth integrands, they are proven to converge asymptotically faster than pseudorandom sequences (like those `thrust::default_random_engine` generates).
88
+
89
+
Theoretically, to maximize the benefit of the Sobol sequence, we need to generate a unique sequence for every pixel during each sampling iteration in real time, which is not trivial, not to mention that computing each number takes up to 32 loop iterations (one per bit). A better choice is to precompute one pixel's sequence, then use some sort of perturbation to produce different sequences for different pixels.
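The precompute-then-perturb idea can be sketched as follows. This is a minimal host-side illustration, not the project's actual code: it evaluates only the first two Sobol dimensions, and it assumes XOR scrambling as the per-pixel perturbation (the text above does not say which perturbation is used). `hashPixel`, `precomputeSobol` and `scrambledSample` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Sobol dimension 0: the base-2 radical inverse (van der Corput sequence).
// The loop runs once per set or lower bit of the index, at most 32 times.
inline uint32_t sobolDim0(uint32_t index) {
    uint32_t result = 0;
    for (uint32_t v = 1u << 31; index != 0; index >>= 1, v >>= 1) {
        if (index & 1u) result ^= v;
    }
    return result;
}

// Sobol dimension 1: direction numbers are generated by v ^= v >> 1.
inline uint32_t sobolDim1(uint32_t index) {
    uint32_t result = 0;
    for (uint32_t v = 1u << 31; index != 0; index >>= 1, v ^= v >> 1) {
        if (index & 1u) result ^= v;
    }
    return result;
}

// Map 32 random bits to a float in [0, 1) using the top 24 bits.
inline float toUnitFloat(uint32_t bits) {
    return static_cast<float>(bits >> 8) * (1.0f / 16777216.0f);
}

// Hypothetical per-pixel scramble mask (an illustrative integer mixer).
inline uint32_t hashPixel(uint32_t pixelIndex, uint32_t dimension) {
    uint32_t h = pixelIndex * 0x9E3779B9u + dimension * 0x85EBCA6Bu + 1u;
    h ^= h >> 16; h *= 0x7FEB352Du;
    h ^= h >> 15; h *= 0x846CA68Bu;
    h ^= h >> 16;
    return h;
}

// Precompute the shared, unscrambled sequence once for all pixels.
inline std::vector<uint32_t> precomputeSobol(uint32_t numSamples, int dimension) {
    std::vector<uint32_t> table(numSamples);
    for (uint32_t i = 0; i < numSamples; ++i) {
        table[i] = (dimension == 0) ? sobolDim0(i) : sobolDim1(i);
    }
    return table;
}

// Per-pixel sample: XOR the shared value with the pixel's scramble mask.
inline float scrambledSample(const std::vector<uint32_t>& table,
                             uint32_t sampleIndex, uint32_t scramble) {
    return toUnitFloat(table[sampleIndex] ^ scramble);
}
```

With this layout, the 32-iteration loops run only while building the table. At render time each thread does one table lookup and one XOR, for example `scrambledSample(table, iteration, hashPixel(pixelIndex, 1))`.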
90
+
91
+
Here is the result I get from testing the untextured [PBR texture scene](#representative-outcome). With the same number of samples per pixel, path tracing with Sobol sequence produces much less noise (lower variance).
Implementing gamma correction is trivial, but it is necessary if we want the final image to be displayed correctly on monitors.
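For reference, a minimal sketch of the operation, assuming the common power-law approximation with gamma 2.2 (the exact sRGB transfer function is piecewise and slightly different). `gammaCorrect` is an illustrative name, not necessarily what this project uses.

```cpp
#include <algorithm>
#include <cmath>

// Convert a linear radiance channel to display space.
// Assumes a pure power-law display response with the given gamma.
inline float gammaCorrect(float linear, float gamma = 2.2f) {
    // Clamp to the displayable range before applying the curve.
    float clamped = std::min(std::max(linear, 0.0f), 1.0f);
    return std::pow(clamped, 1.0f / gamma);
}
```

The function is applied per channel, after tone mapping and just before the color is written to the output image.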
86
106
87
-
---
107
+
##### Tone Mapping
88
108
89
109
90
110
91
111
### Performance
92
112
93
-
#### Fast Intersection: Stackless SAH-Constructed BVH
113
+
#### Fast Intersection: Stackless SAH-Based Bounding Volume Hierarchy
94
114
95
-
For ray-scene intersection, I did two levels of optimization.
115
+
Ray-scene intersection is probably the most time-consuming part of
96
116
97
-
First, I wrote a SAH-based BVH. SAH, the Surface Area Heuristic is a method to decide how to split a set of bounding volumes
117
+
I did two levels of optimization.
98
118
99
-
The second level of optimization
119
+
##### Better Tree Structure: Surface Area Heuristic
100
120
101
-
#### Single-Kernel Path Tracing
121
+
First, I implemented a SAH-based BVH. SAH, the Surface Area Heuristic, is a method for deciding how to split a set of bounding volumes into subsets when constructing a BVH, so that the resulting tree is close to optimal for ray traversal.
122
+
123
+
##### Faster Tree Traversal on GPU: Multiple-Threaded BVH
102
124
103
-
There is a paper . It had an interesting opinion: instead of
125
+
The second level of optimization is done on the GPU. A BVH is a tree after all, so we still have to traverse it during ray-scene intersection, even on the GPU.
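A stackless traversal can be sketched with a threaded BVH: nodes are stored in depth-first order, and each node carries a miss link to the node to visit when its box is not hit. The sketch below shows a single traversal order only; the multiple-threaded variant named in the heading would keep several such link sets and choose one by ray direction, which is omitted here. All struct and function names are hypothetical, and leaves only record a primitive ID instead of running a real primitive test.

```cpp
#include <algorithm>
#include <vector>

struct Ray {
    float origin[3];
    float invDir[3];  // 1 / direction, assumed finite in this sketch
};

struct ThreadedNode {
    float boxMin[3];
    float boxMax[3];
    int missIndex;    // node to jump to when this node's box is missed
    int primitiveId;  // >= 0 for leaves, -1 for interior nodes
};

// Slab test between a ray and a node's bounding box.
inline bool hitsBox(const Ray& r, const ThreadedNode& n) {
    float tNear = 0.0f, tFar = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float t0 = (n.boxMin[a] - r.origin[a]) * r.invDir[a];
        float t1 = (n.boxMax[a] - r.origin[a]) * r.invDir[a];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar = std::min(tFar, t1);
    }
    return tNear <= tFar;
}

// Stackless traversal: on a hit, step to the next node in depth-first
// order; on a miss, follow the miss link to skip the whole subtree.
inline std::vector<int> traverse(const std::vector<ThreadedNode>& nodes,
                                 const Ray& ray) {
    std::vector<int> hitPrimitives;
    int i = 0;
    const int count = static_cast<int>(nodes.size());
    while (i < count) {
        const ThreadedNode& node = nodes[i];
        if (hitsBox(ray, node)) {
            if (node.primitiveId >= 0) hitPrimitives.push_back(node.primitiveId);
            i = i + 1;
        } else {
            i = node.missIndex;
        }
    }
    return hitPrimitives;
}
```

Because the loop needs only one integer of state per ray, it avoids the per-thread stack that a recursive or explicit-stack traversal would keep in local memory.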
104
126
105
127
### Other
106
128
107
-
#### Streamed Path Tracing Using Stream Compaction
129
+
#### Single-Kernel Path Tracing
130
+
131
+
To figure out how much stream compaction can improve a GPU path tracer's performance, we need a baseline to compare against. Instead of adding a toggle to the streamed path tracer's kernels to disable stream compaction, we can write a separate kernel that does the entire ray tracing process. That is, we shoot rays, find intersections, shade surfaces and sample new rays in one kernel.
108
132
109
133
#### First Ray Caching (G-Buffer)
110
134
111
-
Since I implemented anti-aliasing and physically based camera at the very beginning, when I noticed that there is still a requirement in the basic part, I found it
135
+
In real-time rendering, a technique called deferred shading stores the scene's geometry information in texture buffers (the G-Buffer) at the beginning of the render pass, so that . It turns out we can do something similar in offline rendering.
112
136
113
137
## Performance Analysis
114
138
115
-
### How Much GPU Improves Path Tracing Efficiency
139
+
### Why My Multi-Kernel Streamed Path Tracer Is Not Always Faster Than the Single-Kernel One
116
140
117
-
I'm able and confident to answer this question because I have one CPU path tracer from undergrad.
141
+
What surprised me is that it wasn't as efficient as expected. In some scenes, it was even slower than the single-kernel path tracer.
118
142
119
-
### Why My Multi-Kernel Streamed Path Tracer Not Faster Than Single-Kernel?
143
+
In general, it's a tradeoff between thread concurrency and time spent accessing global memory.
120
144
121
-
To know how streaming the rays can improve path tracing efficiency, I additionally implemented a single-kernel version of this path tracer.
145
+
There is a paper stressing this point, from which I also got the idea to additionally implement a single-kernel tracer:
122
146
123
-
What got me surprised it wasn't efficient as expected. In some scenes, it was even worse.
147
+
- [[Progressive Light Transport Simulation on the GPU: Survey and Improvements]](https://cgg.mff.cuni.cz/~jaroslav/papers/2014-gpult/2014-gpult-paper.pdf)
124
148
125
-
Using NSight Compute, I inspected
126
149
127
-
In general, it's a tradeoff between thread concurrency and time spent accessing global memory.
128
150
129
151
### Material Sorting: Why Slower
130
152
153
+
After implementing material sorting, I found the path tracer was actually slower, and significantly so. With Nsight Compute, I inspected how much time each kernel takes before and after enabling material sorting.
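For context, material sorting amounts to a key-value sort of the active paths by the material ID of the surface each one hit. The sketch below is a CPU stand-in using `std::stable_sort` on an index array. On the GPU this step would typically be a key-value sort such as `thrust::sort_by_key`, though the exact call used in this project is not shown in the text. `sortPathsByMaterial` is a hypothetical name.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Returns path indices reordered so that paths hitting the same
// material are contiguous. materialIds[i] is the material hit by path i.
inline std::vector<int> sortPathsByMaterial(const std::vector<int>& materialIds) {
    std::vector<int> order(materialIds.size());
    std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return materialIds[a] < materialIds[b]; });
    return order;
}
```

The sort runs on every bounce over all active paths, so its cost is paid once per bounce regardless of how much coherence the shading kernel gains from it.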
154
+
155
+
As the figure below shows, sorting by material does improve memory coalescing for intersection, sampling and stream compaction (I grouped sampling and lighting together because I did direct lighting). However, the gain is nowhere near enough to offset the additional time spent sorting. As the test result below shows, sorting makes up more than 1/3 of ray tracing time.
156
+
157
+

158
+
131
159
Another possibility is that BSDF sampling and evaluation are not as time-consuming as expected, and the bottleneck still lies in traversing the acceleration structure.
132
160
133
161
Therefore, in my opinion, material sorting is best applied when:
134
162
135
163
- There are many different materials in the scene
136
164
- Primitives sharing the same material are randomly distributed in many small clusters over the scene space. The clusters' sizes in solid angle are typically less than what a GPU warp can cover
137
165
166
+
167
+
168
+
### How Much GPU Improves Path Tracing Efficiency Compared to CPU