Retopologize the Point Cloud

Background

What if we want to build a nice topology around a point cloud, similar to mesh retopology? Let’s think about it.

Method

Tangent field

  • Direction field. We can estimate a 4-RoSy field for each vertex based on a KNN graph (a minimal sketch of the first step appears after this list).
  • Singularity vertex. For each point and its tangent plane, we project the neighboring points, triangulate them locally, and count the RoSy directions (if the count is not 4, the point is a singularity).
  • Singularity region. We can possibly merge nearby singular vertices into regions.
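
As a starting point, here is a minimal sketch of the first step: estimating a per-point tangent frame from the KNN graph via PCA. The full 4-RoSy field would then be obtained by smoothing a direction within each tangent plane over the graph; all names below are illustrative, not from a specific library.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_tangent_frames(points, k=16):
    """Estimate a normal and two tangent axes per point from its KNN neighborhood."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)  # neighbor indices, including the point itself
    normals = np.zeros_like(points)
    tangents_u = np.zeros_like(points)
    tangents_v = np.zeros_like(points)
    for i, nbr in enumerate(idx):
        centered = points[nbr] - points[nbr].mean(axis=0)
        # PCA of the neighborhood: the eigenvector with the smallest eigenvalue
        # approximates the normal; the largest one gives a dominant tangent direction.
        eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)
        n, u = eigvecs[:, 0], eigvecs[:, 2]
        normals[i], tangents_u[i], tangents_v[i] = n, u, np.cross(n, u)
    return normals, tangents_u, tangents_v

# A 4-RoSy field would then pick one of the four rotations {u, v, -u, -v} per point
# and smooth the choices over the KNN graph (e.g., in the spirit of Instant Meshes).
```
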
Read More

Process and Analysis of Establishing a Homeowners' Committee

Background

Our goal is to establish a homeowners' committee for 黄埔雅苑 and to assert and exercise the owners' rights effectively, reasonably, and legally. To achieve this goal, the first step is to deeply understand the preparation process and details of establishing a homeowners' committee in Shenzhen, and to identify the risks and challenges involved. This article aims to break down and analyze the government document 《成立业主委员会的法定流程》 (The Statutory Process for Establishing a Homeowners' Committee).

In addition, here are the most authoritative official materials as supplements:

Read More

Image Prior Supervision

Background

Image prior supervision is a topic under the 3D texturing task. Although, as we experimented in SDS, Highres-optim, and ControlNet, optimizing the color guided by SDS with ControlNet guidance is sufficient to generate good textures, efficiency and editing remain hard to address. Therefore, I believe it is important to study how to edit images and use them to guide texturing.

Potential solution

There are two different problems to study to achieve the goal.

  1. How shall we generate good images for supervision?
  2. How shall we use images as supervision?
Read More

High Resolution Optimization

Background

We aim to generate geometry with a high-resolution appearance. Since Stable Diffusion has a limited resolution, the only way is to zoom in.
There are several options for zooming in. The first question is whether to use a multi-resolution geometry model, and the second is how to zoom in for renderings.

Geometry model

We will investigate this problem in the experiments of 3D representations.

Zoom-in rendering

I summarize several ways to zoom in.

  • Progressive zoom-in. Such methods first render from a far distance to capture the whole object, then zoom in to closer viewpoints to optimize details (a sketch of a simple zoom-in schedule follows this list).
    Progressive zoom-in is apparently a good option when no initial geometry is given, in which case there is no definition of close/far. However, a limitation is that when looking closer, the global context is unavailable, which usually leads to bad results after long optimization.
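
To make this concrete, below is a minimal sketch of one possible zoom-in schedule (my own illustration, not taken from a specific paper): the camera distance is annealed from far to near over the optimization, while full views are still sampled occasionally to retain some global context.

```python
import random

def sample_camera_distance(step, total_steps, far=3.0, near=1.2, full_view_prob=0.1):
    """Anneal the camera distance from `far` to `near` as optimization proceeds."""
    t = step / max(total_steps - 1, 1)
    base = far + (near - far) * t            # linear schedule from far to near
    if random.random() < full_view_prob:     # occasionally keep the full view
        return far
    jitter = random.uniform(0.0, 0.3 * (far - near))
    return max(near, base - jitter)
```
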
Read More

Adding Controls

Background

We introduce additional geometry controls to better align the appearance with the geometry. As discussed in the introduction, geometry alignment can be achieved by incorporating geometry features into score distillation (aligning the geometry with the rendering), or by controlling the RGB to align with geometry features.

Align Color to Geometry

ControlNet

Control Conditions. One of the most popular choices for aligning color to geometry is via ControlNet. For example, we can render geometry features such as edges/normals/depth into each view. Avatarverse further controls character appearance with DensePose features.

From my experiments, different geometry features contribute to the control in various ways.

  • Canny edge is suitable for precisely controlling boundary alignment. For example, the generated RGB image actually aligns with the Canny edges with pixel-level accuracy, so it is important to texture the 3D model with edge control. However, rendering the edges might be problematic given the stupid triangle soups in the CAD model, and occlusion-boundary/sharp-edge features are naturally ambiguous. My current solution is to render a normal map and directly extract Canny features from the normal image, which seems to work okay (a minimal sketch follows).
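
Below is a minimal sketch of that workaround, assuming the normal map has already been rendered to an 8-bit image; the file names and Canny thresholds are illustrative.

```python
import cv2
import numpy as np

# Load a rendered normal map (8-bit, 3 channels).
normal_img = cv2.imread("normal_render.png")

# Light blur to suppress noise from the triangle soup.
blurred = cv2.GaussianBlur(normal_img, (3, 3), 0)

# Run Canny per channel and merge, so any discontinuity in the normals shows up.
edges = np.zeros(normal_img.shape[:2], dtype=np.uint8)
for c in range(3):
    edges = np.maximum(edges, cv2.Canny(blurred[:, :, c], 50, 150))

# Stack to 3 channels as the ControlNet conditioning image.
cv2.imwrite("canny_condition.png", np.stack([edges] * 3, axis=-1))
```
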
Read More

Single View SDS

Introduction

According to my introduction, the simplest experiment to start with is the single-view text-to-image experiment using the stochastic optimization method. The reason we want to optimize rather than denoise is that, when we extend to 3D, we need to seamlessly generate textures from multiple views, and stochastic optimization appears to be the most promising method. In this article, I am going to investigate all the details related to the optimization process.

Background

Formulation

Given a view parameterized by $\theta$, we can render it via $\mathcal{R}(\theta)$. For example, $\theta$ can represent a vector of pixel RGB/latent values, with $\mathcal{R}$ as the identity mapping. A more advanced version parameterizes $\mathcal{R}$ as a hash encoding followed by a tiny MLP, as in InstantNGP. Following DreamFusion, the gradient of the SDS loss can be written as:

$\nabla_{\theta}\mathcal{L}_{SDS} = \mathbb{E}_{t,\epsilon}\left[\, w(t)\big(e_\phi(\mathbf{z}_t;\mathbf{y},t)-\epsilon\big)\frac{\partial \mathbf{x}}{\partial \theta} \right]$
where $e_\phi$ is the predicted noise given $\mathbf{z}_t$ as input and $\mathbf{y}$ as the prompt at timestep $t$, and $w(t)$ is a weight tied to the scale of noise added to the data. Practically, we optimize the SDS loss stochastically: at each training step we randomly sample a Gaussian noise $\epsilon$ and a timestep $t$, add the noise to the latent version of $\mathcal{R}(\theta)$ (typically obtained via an encoder $\mathcal{E}$) to form $\mathbf{z}_t$, and compute $|w(t)(e_\phi(\mathbf{z}_t;\mathbf{y},t)-\epsilon)|^2$ as the loss. An Adam optimizer is attached to $\theta$ for optimization.
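
Below is a minimal PyTorch sketch of one such training step. The callables `render`, `encode_latent`, and `predict_noise` stand in for $\mathcal{R}$, the encoder $\mathcal{E}$, and the frozen noise predictor $e_\phi$; they are placeholders rather than a specific library API. Instead of back-propagating the squared residual through the UNet, it uses the common stop-gradient surrogate whose gradient matches the SDS formula above.

```python
import torch
import torch.nn.functional as F

def sds_step(theta, render, encode_latent, predict_noise, alphas_cumprod,
             text_emb, optimizer):
    """One stochastic SDS update for a single view (illustrative sketch)."""
    optimizer.zero_grad()
    z0 = encode_latent(render(theta))                    # latent of the rendering

    # Sample a timestep and Gaussian noise, then form z_t via the forward process.
    t = torch.randint(20, 980, (1,), device=z0.device)
    eps = torch.randn_like(z0)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    z_t = a_t.sqrt() * z0 + (1 - a_t).sqrt() * eps

    with torch.no_grad():                                # the noise predictor is frozen
        eps_pred = predict_noise(z_t, text_emb, t)

    # w(t) * (e_phi - eps); detached so d(loss)/d(z0) equals this residual,
    # reproducing the SDS gradient through the renderer/encoder.
    w_t = 1.0 - a_t
    grad = w_t * (eps_pred - eps)
    loss = 0.5 * F.mse_loss(z0, (z0 - grad).detach(), reduction="sum")
    loss.backward()
    optimizer.step()
    return loss.item()
```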

Read More

Introduction to X-to-3D

Background

With the recent progress in the field of generative AI, the community sees new opportunities for 3D content generation by exploiting novel neural network architectures pre-trained on large-scale 2D/3D datasets. This article aims to sketch the major components that hundreds of recently published papers in this field work on and contribute to.

Introduction

Input and output. X-to-3D, in a nutshell, aims to convert some input information into 3D content. To expand on this, the input is mainly text or images, where an image can be a single view or multiple views of the same physical object. The 3D content can be a tiny object or an indoor/outdoor scene, each of which faces different challenges. While we mainly discuss the object level, since it has great value to the industry and could be the easiest to handle, some approaches can be extended to scenes for potential AR/VR applications.

Input

The input is easy to understand: text/images can be encoded as latent features. Images can also be used directly as supervision for renderings, in terms of RGB and depth. Note that an image could also be associated with a camera pose.

3D representation

While the input is quite clear, 3D content can be represented in different forms:

Read More

A Casual Talk on Buddhism: Why Buddhism Is True

Overview

Why Buddhism Is True is a book by an American scholar who examines Buddhism from the perspective of modern science. Its purpose is to set aside authority, mysticism, and outdated ideas, and to analyze and introduce the views in Buddhism that are relatively well accepted and consistent with modern science and society. The book interprets the rationality of Buddhism under the premise of Darwin's theory of natural selection, presents several interesting psychological experiments to support Buddhist arguments, and draws on the author's own experience to describe the value of mindfulness meditation. The exposition is highly readable and well supported by evidence, combined with some of the author's own reflections, discarding the dross and keeping the essence. This article aims to clarify and explain its main viewpoints, discuss its connections with and differences from Western philosophy, and add my own critiques and thoughts.

Main Viewpoints

I summarize the viewpoints of the Buddhism expounded in this book as follows:

  1. Humans suffer from the "five aggregates" (五蕴) and are not born free. One must practice the "Noble Eightfold Path" (八正道; the book focuses on right concentration and right mindfulness), gradually entering the states of "non-self" (无我) and "formlessness" (无相), and thereby attaining liberation.

To expand on this, there are several key concepts.

Read More