Google's "Dimension-Raising Strike": realistic 3D lighting preserved from every viewpoint, 2D becomes "4D"

2D becomes 3D without explicit modeling. Earlier this year, Google used NeRF to produce results realistic enough to pass for the real thing:

  The result above also takes only a few 2D photos as input, but it additionally generates changing light and shadow effects. This makes it even more realistic and vivid, giving the impression of time passing as the stars move across the sky.

  This work, which turns 2D photos into "4D", is an important improvement on the earlier NeRF made by the Google team together with MIT and UC Berkeley. It is called NeRV, short for Neural Reflectance and Visibility Fields, and it is used for relighting and view synthesis.

  Below we introduce NeRV as concisely and accessibly as possible. Apart from the most critical derivations, we keep the mathematics to a minimum so that the whole idea fits in one article.

  Google's "upgrade" strike

  Changing light and shadow gives the scene an intuitive sense of time. The research team achieves this effect by adding a light-transport simulation to the original NeRF.

  Whereas NeRF models the scene as a continuous 3D field of particles that absorb and emit light, NeRV represents the scene as a 3D field of oriented particles that absorb and reflect light from external light sources.

  However, naively simulating light transport through such a representation is inefficient and cannot scale to global illumination.

  The researchers introduce a neural visibility field to compensate for this. It allows efficient queries of the visibility between light sources and scene points that simulating light transport requires.

  Specifically, NeRV recovers the "4D" effect in 3 steps, corresponding to 3D scene reconstruction, light and shadow simulation, and rendering.

  Neural reflectance field

  NeRF does not separate the effect of incident light from the material properties of the surface.

  Modifying NeRF to support dynamic lighting is straightforward: represent the scene as a field of particles that reflect incident light.

  Given an arbitrary lighting condition, the transport of light through this field of reflecting particles can be simulated with a standard volume rendering integral.

  Here, the view-dependent emission term Le(x, ωo) of the standard volume rendering integral, equation (1), is replaced by an integral over the sphere S of incident directions: the product of the incident radiance Li from each direction and the reflectance function R, which describes how much of the light arriving from direction ωi is reflected toward direction ωo.
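  For reference, here is a sketch of the standard volume rendering integral and the substitution described above, in common NeRF-style notation. The symbol definitions are our assumptions: σ is volume density, V is the transmittance from the camera center c to the point x(t), and S is taken as the sphere of incident directions. The second line is the reflected radiance Lr that takes the place of Le in (1).

```latex
L(\mathbf{c}, \omega_o) = \int_0^{\infty} V(\mathbf{x}(t), \mathbf{c})\,\sigma(\mathbf{x}(t))\,L_e(\mathbf{x}(t), \omega_o)\,dt,
\qquad
V(\mathbf{x}(t), \mathbf{c}) = \exp\!\Big(-\int_0^{t} \sigma(\mathbf{x}(s))\,ds\Big) \tag{1}

L_r(\mathbf{x}, \omega_o) = \int_{S} L_i(\mathbf{x}, \omega_i)\,R(\mathbf{x}, \omega_i, \omega_o)\,d\omega_i
```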

  Light transport with neural visibility fields

  Although modifying NeRF to support some lighting effects is straightforward, evaluating the volume rendering integral under general lighting is difficult for continuous volumetric representations such as NeRF.

  The figure above illustrates how the cost of simulating lighting scales, which is what makes volumetric light transport particularly difficult to simulate.

  Even if only direct lighting from the light sources to each scene point is considered, brute-force computation becomes impractical for anything beyond a single point light, because it must trace a path from every scene point to every light source and repeatedly query the shape MLP for volume density along each path.
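  To make this scaling concrete, below is a minimal sketch of brute-force direct lighting. It assumes point lights, a plain density function standing in for the shape MLP, and a midpoint rule along each shadow ray; all names are hypothetical and the BRDF and cosine terms are left out. Each light costs n_samples density queries per scene point.

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Light = Tuple[Vec3, float]  # (position, intensity): hypothetical point-light format


class CountingDensity:
    """Stand-in for the shape MLP: wraps sigma(x) and counts every query."""

    def __init__(self, fn: Callable[[Vec3], float]) -> None:
        self.fn = fn
        self.queries = 0

    def __call__(self, x: Vec3) -> float:
        self.queries += 1
        return self.fn(x)


def shadow_transmittance(point: Vec3, light: Vec3,
                         sigma: Callable[[Vec3], float], n_samples: int = 64) -> float:
    """Brute-force visibility of `light` from `point`: midpoint-rule estimate of
    exp(-integral of sigma) along the segment, costing n_samples density queries."""
    d = [light[i] - point[i] for i in range(3)]
    length = math.sqrt(sum(c * c for c in d))
    dt = length / n_samples
    optical_depth = 0.0
    for k in range(n_samples):
        s = (k + 0.5) / n_samples  # midpoint of the k-th segment
        x = (point[0] + s * d[0], point[1] + s * d[1], point[2] + s * d[2])
        optical_depth += sigma(x) * dt
    return math.exp(-optical_depth)


def direct_light_brute_force(point: Vec3, lights: List[Light],
                             sigma: Callable[[Vec3], float], n_samples: int = 64) -> float:
    """Sum of light intensities attenuated by brute-force visibility.
    BRDF, cosine and distance-falloff terms are deliberately omitted."""
    return sum(intensity * shadow_transmittance(point, pos, sigma, n_samples)
               for pos, intensity in lights)
```

  With L lights and M samples per shadow ray, every scene point costs L × M density queries, and each camera ray contains 256 such points.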

  With brute-force sampling, rendering a single ray under indirect lighting would require about a petaflop of computation, and training renders roughly one billion rays.
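  A rough back-of-the-envelope count shows where that cost comes from. This sketch counts only density or visibility queries for one camera ray with one bounce of indirect light. Only the 256 camera-ray samples come from the article; the other parameter values are hypothetical, and this is not the paper's own accounting.

```python
def brute_force_queries(n_cam: int, n_lights: int, n_shadow: int,
                        n_dirs: int, n_sec: int) -> int:
    """Rough density-query count for one camera ray with brute-force direct
    lighting plus one bounce of brute-force indirect lighting."""
    # Direct: every camera-ray sample marches a shadow ray to every light.
    direct = n_cam * n_lights * n_shadow
    # Indirect: every camera-ray sample traces n_dirs secondary rays; each
    # secondary sample needs one density query plus its own direct lighting.
    indirect = n_cam * n_dirs * n_sec * (1 + n_lights * n_shadow)
    return direct + indirect


def visibility_mlp_queries(n_cam: int, n_lights: int, n_dirs: int) -> int:
    """Same count when each shadow ray becomes one visibility query and each
    secondary ray becomes one expected-termination-depth query."""
    direct = n_cam * n_lights
    # One depth query per secondary ray, then n_lights visibility queries at the hit point.
    indirect = n_cam * n_dirs * (1 + n_lights)
    return direct + indirect


if __name__ == "__main__":
    # 256 camera-ray samples come from the article; the other values are hypothetical.
    print(brute_force_queries(256, 1, 128, 128, 128))  # 541097984
    print(visibility_mlp_queries(256, 1, 128))         # 65792
```

  Even with a single light, the nested shadow rays inside the indirect term dominate, which is why replacing those inner integrals with learned lookups pays off.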

  The team therefore replaces several of the volume density integrals with learned approximations. They introduce a "visibility" multilayer perceptron (MLP) that, for any input point and direction, outputs an approximation of the light visibility and of the expected termination depth of the corresponding ray.
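  As a hedged illustration of what the visibility MLP stands in for, the sketch below computes the two quantities named above by brute-force quadrature along a single ray: the visibility (transmittance past the last sample) and the expected termination depth. It assumes NeRF-style discrete weights and takes density samples as plain lists; the MLP itself is not shown.

```python
import math
from typing import Sequence, Tuple


def visibility_and_depth(ts: Sequence[float], sigmas: Sequence[float],
                         t_far: float) -> Tuple[float, float]:
    """Quadrature estimates along one ray (ts sorted ascending).

    Returns (visibility, expected_depth) where
      visibility     = transmittance past the last sample, exp(-sum sigma_i * delta_i)
      expected_depth = sum_i w_i * t_i with w_i = T_i * (1 - exp(-sigma_i * delta_i))
    """
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    expected_depth = 0.0
    for i, (t, sigma) in enumerate(zip(ts, sigmas)):
        t_next = ts[i + 1] if i + 1 < len(ts) else t_far
        alpha = 1.0 - math.exp(-sigma * (t_next - t))  # opacity of interval i
        expected_depth += transmittance * alpha * t
        transmittance *= 1.0 - alpha
    return transmittance, expected_depth
```

  The depth here is unnormalized: if the ray escapes the scene, the weights sum to less than one. A learned network that returns both values in one query avoids marching dozens of samples per shadow or secondary ray.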

  This greatly reduces the cost of computing both direct and indirect lighting, so that direct lighting and one bounce of indirect lighting can be simulated together inside the training loop that optimizes the continuous relightable scene representation.


  Given a camera ray x(t) = c − tωo traced through a NeRV, rendering proceeds as follows:

  1) Draw 256 stratified samples along the ray, and at each point query the shape and reflectance MLPs for the volume density, surface normal, and BRDF (Bidirectional Reflectance Distribution Function) parameters.

  2) Shade each point along the ray with direct illumination.

  3) Shade each point along the ray with indirect illumination.

  4) The total reflected radiance Lr(x(t), ωo) at each point along the camera ray is the sum of the quantities from steps 2 and 3. These values are composited along the ray to compute the pixel color, using the same quadrature rule as NeRF.
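  Steps 1 and 4 can be sketched as follows, assuming grayscale radiance and NeRF-style alpha compositing. The per-sample direct and indirect values would come from steps 2 and 3, which are not implemented here, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def stratified_samples(t_near: float, t_far: float, n: int,
                       rng: random.Random) -> List[float]:
    """Step 1: split [t_near, t_far] into n equal bins, draw one uniform sample per bin."""
    width = (t_far - t_near) / n
    return [t_near + (i + rng.random()) * width for i in range(n)]


def composite(ts: Sequence[float], sigmas: Sequence[float],
              direct: Sequence[float], indirect: Sequence[float],
              t_far: float) -> float:
    """Step 4: L_r = direct + indirect at each sample, then alpha-composite
    along the ray with NeRF-style quadrature weights (grayscale radiance)."""
    color = 0.0
    transmittance = 1.0
    for i, t in enumerate(ts):
        t_next = ts[i + 1] if i + 1 < len(ts) else t_far
        alpha = 1.0 - math.exp(-sigmas[i] * (t_next - t))
        reflected = direct[i] + indirect[i]  # L_r(x(t_i), w_o)
        color += transmittance * alpha * reflected
        transmittance *= 1.0 - alpha
    return color
```

  For example, stratified_samples(2.0, 6.0, 256, random.Random(0)) returns 256 sorted depths, one per bin; stratification keeps samples spread along the whole ray while still jittering them between training iterations.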

  In layman's terms, the light reflected at every point in the scene is computed separately and then integrated along each camera ray.

  Test Results

  In the final tests, NeRV clearly outperformed NeRF and NLT:

  The team trained the models on several datasets and then reconstructed three scenes: a statue, a bulldozer, and a hot dog.

  The results show that NeRV generally scores better than previous methods on both peak signal-to-noise ratio (PSNR) and MS-SSIM (multi-scale structural similarity). Both are image-quality metrics, not loss functions.
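  For readers unfamiliar with PSNR, here is a minimal sketch of the standard definition, 10 · log10(MAX² / MSE), on flattened pixel values in [0, 1]. It is illustrative only and not the evaluation code used in the paper; MS-SSIM is considerably more involved and is not shown.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).
    Pixels are flattened into one sequence; max_val is the largest possible pixel value."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

  Higher is better: an error of 0.1 on every pixel gives an MSE of 0.01 and a PSNR of about 20 dB.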

  For independent video creators, game developers, and animation studios that lack 3D modeling experience, the maturing of this kind of technology is good news.

  AI further simplifies the creation of 3D effects, which is why companies such as Facebook, Adobe, and Microsoft have invested in this area of research.

  Finally, the researchers say the algorithm will soon be open-sourced, and the project homepage is given below.
