Failure to fit complex data. #618
Hi @zhangrentu. We have never tested Theseus with complex numbers, so I'm not really sure what parts of it would need to be modified. Certainly our custom linear solvers will not work with this type of data, so even in the best case you would be limited in which solvers you can use. If you send me a code snippet I can try to run it and perhaps suggest what modifications would be necessary.
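To make the solver issue concrete, here is a minimal numpy sketch (illustrative only, not Theseus code) of why complex data breaks solvers built around the real normal equations: with complex residuals, `J.T @ J` is generally not Hermitian, while the conjugate-transpose product `J.conj().T @ J` is Hermitian positive semidefinite and remains safe to factorize.

```python
import numpy as np

# A Gauss-Newton step solves the normal equations (J^T J) dx = -J^T r.
# With a complex Jacobian, the plain transpose product is not Hermitian,
# so Hermitian-based factorizations no longer apply; the conjugate
# transpose product J^H J is Hermitian PSD and factorizes fine.
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))

JtJ = J.T @ J          # plain transpose: complex, not Hermitian in general
JhJ = J.conj().T @ J   # conjugate transpose: Hermitian PSD

print(np.allclose(JtJ, JtJ.conj().T))  # False
print(np.allclose(JhJ, JhJ.conj().T))  # True
L = np.linalg.cholesky(JhJ + 1e-9 * np.eye(3))  # succeeds for J^H J
```

Any solver that assumes a symmetric real system would need at least this conjugate-transpose change, among others, to support complex dtypes.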
Hi @zhangrentu. I made a number of tweaks and removed some of our dtype constraints [...]:

```python
import torch
import theseus as th


def y_model(x, a, b, c):
    return a * torch.exp((-1j * b + c) * x)  # y = a * exp((-1j*b + c) * x)


def generate_data(num_points=512, a=1, b=2, c=3):
    data_x = torch.linspace(0, 1, num_points).view(1, -1)
    data_y = y_model(data_x, a, b, c)
    return data_x, data_y


def read_data():
    data_x, data_y_clean = generate_data()
    return (
        data_x,
        data_y_clean,
        1 * torch.ones(1, 1),
        2 * torch.ones(1, 1),
        3 * torch.ones(1, 1),
    )


x_true, y_true, a_true, b_true, c_true = read_data()
x = th.Variable(torch.randn_like(x_true), name="x")
y = th.Variable(y_true, name="y")
a = th.Vector(1, name="a")  # a manifold subclass of Variable for optim_vars
b = th.Vector(1, name="b")
c = th.Vector(1, name="c")
for v in [x, y, a, b, c]:
    v.to(dtype=torch.complex32)


def error_fn(optim_vars, aux_vars):  # returns y - a * exp((-1j*b + c) * x)
    x, y = aux_vars
    return y.tensor - optim_vars[0].tensor * torch.exp(
        (-1j * optim_vars[1].tensor + optim_vars[2].tensor) * x.tensor
    )


objective = th.Objective(dtype=torch.complex32)
w = th.ScaleCostWeight(1.0)
w.to(dtype=torch.complex32)
cost_function = th.AutoDiffCostFunction(
    [a, b, c], error_fn, y_true.shape[1], aux_vars=[x, y], cost_weight=w
)
objective.add(cost_function)
layer = th.TheseusLayer(th.GaussNewton(objective, max_iterations=10))
phi = torch.nn.Parameter(x_true + 0.1 * torch.ones_like(x_true))
outer_optimizer = torch.optim.Adam([phi], lr=0.001)
input_tensors = {
    "x": phi.clone(),
    "a": 0.5 * torch.ones(1, 1),
    "b": torch.ones(1, 1),
    "c": torch.ones(1, 1),
}
input_tensors = {k: t.to(dtype=torch.complex32) for k, t in input_tensors.items()}
for epoch in range(20):
    solution, info = layer.forward(
        input_tensors=input_tensors,
        optimizer_kwargs={"backward_mode": "implicit"},
    )
    outer_loss1 = torch.nn.functional.mse_loss(solution["a"], a_true)
    outer_loss2 = torch.nn.functional.mse_loss(solution["b"], b_true)
    outer_loss3 = torch.nn.functional.mse_loss(solution["c"], c_true)
    outer_loss = outer_loss1 + outer_loss2 + outer_loss3
    outer_optimizer.zero_grad()  # clear accumulated outer gradients each epoch
    outer_loss.backward()
    outer_optimizer.step()
    print("Outer loss: ", outer_loss.item())
```
Can you share the new version of your script?
Thanks. The following is the new version, with the added dtype constraints (torch.complex64):

```python
import torch
import theseus as th


def y_model(x, a, b, c):
    return a * torch.exp((-1j * b + c) * x)  # y = a * exp((-1j*b + c) * x)


def generate_data(num_points=512, a=1, b=2, c=3):
    data_x = torch.linspace(0, 1, num_points).view(1, -1)
    data_y = y_model(data_x, a, b, c)
    return data_x, data_y


def read_data():
    data_x, data_y_clean = generate_data()
    return (
        data_x,
        data_y_clean,
        1 * torch.ones(1, 1),
        2 * torch.ones(1, 1),
        3 * torch.ones(1, 1),
    )


x_true, y_true, a_true, b_true, c_true = read_data()
x = th.Variable(torch.randn_like(x_true), name="x")
y = th.Variable(y_true, name="y")
a = th.Vector(1, name="a")  # a manifold subclass of Variable for optim_vars
b = th.Vector(1, name="b")
c = th.Vector(1, name="c")
for v in [x, y, a, b, c]:
    v.to(dtype=torch.complex64)


def error_fn(optim_vars, aux_vars):  # returns y - a * exp((-1j*b + c) * x)
    x, y = aux_vars
    return y.tensor - optim_vars[0].tensor * torch.exp(
        (-1j * optim_vars[1].tensor + optim_vars[2].tensor) * x.tensor
    )


objective = th.Objective(dtype=torch.complex64)
w = th.ScaleCostWeight(1.0)
w.to(dtype=torch.complex64)
cost_function = th.AutoDiffCostFunction(
    [a, b, c], error_fn, y_true.shape[1], aux_vars=[x, y], cost_weight=w
)
objective.add(cost_function)
optimizer = th.LevenbergMarquardt(
    objective,
    th.CholeskyDenseSolver,
    max_iterations=10,
    step_size=0.001,
)
layer = th.TheseusLayer(optimizer)
phi = torch.nn.Parameter(x_true + 0.1 * torch.ones_like(x_true))
outer_optimizer = torch.optim.Adam([phi], lr=0.001)
input_tensors = {
    "x": phi.clone(),
    "a": 0.5 * torch.ones(1, 1),
    "b": torch.ones(1, 1),
    "c": torch.ones(1, 1),
}
input_tensors = {k: t.to(dtype=torch.complex64) for k, t in input_tensors.items()}
for epoch in range(20):
    solution, info = layer.forward(
        input_tensors=input_tensors,
        optimizer_kwargs={"backward_mode": "implicit"},
    )
    outer_loss1 = torch.nn.functional.mse_loss(solution["a"], a_true)
    outer_loss2 = torch.nn.functional.mse_loss(solution["b"], b_true)
    outer_loss3 = torch.nn.functional.mse_loss(solution["c"], c_true)
    outer_loss = outer_loss1 + outer_loss2 + outer_loss3
    outer_loss.backward()
    outer_optimizer.step()
    print("Outer loss: ", outer_loss.item())
```
Hi @zhangrentu. The error in your last comment should now be fixed after #623 is merged.
I took a quick look at your script. One change I had to make was to set [...].

I'm able to run the optimizer if I use a very high damping, set the error to return the absolute value of the residual, and use a very small number of data points. You can see my changes here:

```python
import torch
import theseus as th


def y_model(x, a, b, c):
    return a * torch.exp((-1j * b + c) * x)  # y = a * exp((-1j*b + c) * x)


def generate_data(num_points=4, a=1, b=2, c=3):
    data_x = torch.linspace(0, 1, num_points).view(1, -1)
    data_y = y_model(data_x, a, b, c)
    return data_x, data_y


def read_data():
    data_x, data_y_clean = generate_data()
    return (
        data_x,
        data_y_clean,
        1 * torch.ones(1, 1),
        2 * torch.ones(1, 1),
        3 * torch.ones(1, 1),
    )


x_true, y_true, a_true, b_true, c_true = read_data()
x = th.Variable(torch.randn_like(x_true), name="x")
y = th.Variable(y_true, name="y")
a = th.Vector(1, name="a")  # a manifold subclass of Variable for optim_vars
b = th.Vector(1, name="b")
c = th.Vector(1, name="c")
for v in [x, y, a, b, c]:
    v.to(dtype=torch.complex64)


def error_fn(optim_vars, aux_vars):  # returns |y - a * exp((-1j*b + c) * x)|
    x, y = aux_vars
    return (
        y.tensor
        - optim_vars[0].tensor
        * torch.exp((-1j * optim_vars[1].tensor + optim_vars[2].tensor) * x.tensor)
    ).abs()


objective = th.Objective(dtype=torch.complex64)
w = th.ScaleCostWeight(1.0)
w.to(dtype=torch.complex64)
cost_function = th.AutoDiffCostFunction(
    [a, b, c],
    error_fn,
    y_true.shape[1],
    aux_vars=[x, y],
    cost_weight=w,
    autograd_mode="dense",
)
objective.add(cost_function)
optimizer = th.LevenbergMarquardt(
    objective,
    th.CholeskyDenseSolver,
    max_iterations=5,
    step_size=0.1,
)
layer = th.TheseusLayer(optimizer)
phi = torch.nn.Parameter(x_true + 0.1 * torch.ones_like(x_true))
outer_optimizer = torch.optim.Adam([phi], lr=1.0)
input_tensors = {
    "x": phi.clone(),
    "a": 0.5 * torch.ones(1, 1),
    "b": torch.ones(1, 1),
    "c": torch.ones(1, 1),
}
input_tensors = {k: t.to(dtype=torch.complex64) for k, t in input_tensors.items()}
for epoch in range(20):
    solution, info = layer.forward(
        input_tensors=input_tensors,
        optimizer_kwargs={
            "backward_mode": "unroll",
            "verbose": True,
            "damping": 100.0,
        },
    )
    outer_loss1 = torch.nn.functional.mse_loss(solution["a"].real, a_true)
    outer_loss2 = torch.nn.functional.mse_loss(solution["b"].real, b_true)
    outer_loss3 = torch.nn.functional.mse_loss(solution["c"].real, c_true)
    outer_loss = outer_loss1 + outer_loss2 + outer_loss3
    outer_optimizer.zero_grad()  # clear accumulated outer gradients each epoch
    outer_loss.backward()
    outer_optimizer.step()
    print("Outer loss: ", outer_loss.item())
```
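A related workaround (an assumption on my part, not something the Theseus API provides): when the parameters themselves are real, a complex residual can be rewritten as a real one by stacking its real and imaginary parts, since |r|² = Re(r)² + Im(r)². A numpy sketch on a linear model:

```python
import numpy as np

# Complex least squares A x ≈ b with *real* unknown x, solved with a
# purely real solver by stacking real and imaginary parts.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
b = rng.standard_normal(8) + 1j * rng.standard_normal(8)

# Real-valued reformulation: [Re(A); Im(A)] x ≈ [Re(b); Im(b)]
A_ri = np.vstack([A.real, A.imag])
b_ri = np.concatenate([b.real, b.imag])
x_real, *_ = np.linalg.lstsq(A_ri, b_ri, rcond=None)

# Cross-check: the minimizer over real x solves Re(A^H A) x = Re(A^H b).
x_check = np.linalg.solve((A.conj().T @ A).real, (A.conj().T @ b).real)
print(np.allclose(x_real, x_check))  # True
```

This keeps the error smooth everywhere, unlike `.abs()`, which is non-differentiable where the residual crosses zero.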
Thank you very much for your response. However, when we use TheseusLayer in the model, it still returns NoneType. Additionally, optim_vars and aux_vars data are automatically placed on cuda:0, but the objective is on cuda, which causes an error when calling objective.update, making it impossible to use data-parallel training (nn.DataParallel).
We tried changing the error function to the real part, for example real(exp(ia)) = cos(a), but after the modification the error did not converge, or the matrix was non-positive definite. We also tried reducing the step_size, which helped slightly. Are there any other tuning methods? Currently we are using the following configuration:
@zhangrentu This code has not been merged to main yet. Are you using the code directly from that branch? If you are, then please share a short snippet of code that results in the device error, because I'm not sure how that can happen in the code from that branch. Thanks! |
Yes, I called it from a branch. The main process of parallel computing is as follows: data is placed on the primary GPU (cuda:0), and the model is distributed to the GPUs used for parallel computing (e.g., cuda:0, cuda:1, cuda:2). Since the TheseusLayer is treated as a layer within the network, it belongs to the model part. However, during parameter updates, the objective of the cost function was not distributed, resulting in the error. The specific error is as shown in the image below: |
Ah, I see. We have never tested this inside a DataParallel model, so I don't have a lot of insight yet. Could you share a short repro script? |
@zhangrentu Regarding the convergence, in the script I shared above one thing I did was to increase the damping to a really large value (I used 100.0), and the error does seem to decrease when doing this. |
I really appreciate your response. It seems that the complex form is not currently supported. So I ran a simple real-valued parameter-estimation experiment instead; the code is as follows: estimating the exponential parameters in the presence of noise, where y is the noisy signal. My idea is to update y with an external network while the inner NLLS estimates the parameters given y. However, the results do not converge either. I've tried adjusting the NLLS step size in the optimizer and the learning rate of the external Adam, but it had no effect. I'm not sure what the reason is.
Hi @zhangrentu. The problem in this example is that your system is underconstrained, resulting in a Jacobian of rank 2 when you have 3 optimization variables. The system converges if you add one more constraint, for example:

```python
t = th.Vector(tensor=torch.ones(1, 1), name="t")
objective.add(th.Difference(c, t, w, name="c_constraint"))
```
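The rank deficiency can be checked directly. A hypothetical numpy illustration, assuming a model in which b and c enter only through their sum (so their Jacobian columns coincide):

```python
import numpy as np

# If the model depends on b and c only through (b + c), the Jacobian
# columns for b and c are identical and the system is underconstrained.
x = np.linspace(0, 1, 16)
a, b, c = 1.0, 0.3, 0.2
J = np.stack(
    [
        np.exp((b + c) * x),          # d/da of a * exp((b + c) * x)
        a * x * np.exp((b + c) * x),  # d/db
        a * x * np.exp((b + c) * x),  # d/dc (same column as d/db)
    ],
    axis=1,
)
print(np.linalg.matrix_rank(J))  # 2, not 3

# Appending one row for an extra constraint on c (cf. th.Difference above)
# restores full column rank.
J_con = np.vstack([J, [0.0, 0.0, 1.0]])
print(np.linalg.matrix_rank(J_con))  # 3
```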
I'm sorry for providing an inappropriate example. Thank you again for your prompt responses every time. I currently have two main questions:
For this problem, if you have only two optimization parameters, A and B, and you know that A equals B, how would you incorporate this constraint into the equation? |
We don't yet have a principled solver for constrained problems. However, you can use soft penalties with a high cost weight to approximate this constraint, which can work well in many cases. If A is an optimization variable but B is not (e.g., a constant target), then you can use [...]. Something like this (may have some syntax errors):

```python
A = th.Vector(...)
B = th.Vector(...)
Z = th.Vector(tensor=torch.zeros(batch_size, d), name="zeros")  # d = dof of A and B
cf = th.Between(A, B, Z, th.ScaleCostWeight(100.0), name="a_eq_b_constraint")
obj.add(cf)
```

This adds a heavily weighted cost on the difference between A and B, approximating the constraint A = B.
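A toy numpy sketch of the soft-penalty idea (hypothetical problem, not Theseus code): stacking a heavily weighted residual w·(A − B) onto two data residuals pulls A and B together, while a small weight barely binds:

```python
import numpy as np

def solve(w):
    # Stacked linear least squares over scalars (A, B): data residuals
    # (A - 2) and (B - 4), plus the soft equality penalty w * (A - B).
    M = np.array([[1.0, 0.0], [0.0, 1.0], [w, -w]])
    rhs = np.array([2.0, 4.0, 0.0])
    sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return sol

A_soft, B_soft = solve(100.0)
print(abs(A_soft - B_soft) < 1e-2)  # True: heavy penalty forces A ≈ B
A_weak, B_weak = solve(0.1)
print(abs(A_weak - B_weak) > 1.0)   # True: weak penalty barely binds
```

Here the closed-form gap is A − B = −2/(1 + 2w²), so the constraint violation shrinks quadratically in the weight.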
Thank you for your suggestion. Regarding non-equality constraints, such as imposing non-negativity constraints on optimization variables, do you have any recommendations for effective penalty functions? |
In this case you can use th.eb.HingeCost. Its error is zero between the two limits and grows linearly outside them; for example, with down_limit=-5, up_limit=3, and threshold=0, the error is zero on [-5, 3] (image omitted). If threshold is non-zero, its effect is to push both limits towards zero.

For a non-negativity constraint, you could use something like down_limit=0, up_limit=torch.inf, and some threshold to discourage getting too close to zero.

Hope this helps. Do note that we haven't tested this cost function as extensively as others, so please let us know if you have any feedback.
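A sketch of what such a hinge-style error looks like (an illustrative reimplementation in numpy, not the actual th.eb.HingeCost source):

```python
import numpy as np

def hinge_error(x, down_limit, up_limit, threshold=0.0):
    # Zero inside [down_limit + threshold, up_limit - threshold],
    # linear outside; threshold > 0 pushes both limits towards zero.
    below = np.maximum(down_limit + threshold - x, 0.0)
    above = np.maximum(x - (up_limit - threshold), 0.0)
    return below + above

# Non-negativity: down_limit=0, up_limit=inf, small threshold to
# discourage values near zero.
xs = np.array([-1.0, 0.05, 0.5, 2.0])
print(hinge_error(xs, 0.0, np.inf, threshold=0.1))  # [1.1, 0.05, 0.0, 0.0]
```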
Hi, have you managed to use Theseus successfully in a parallel network (torch.nn.DataParallel)?
I'm very sorry for the late reply. It seems that the current library does not support parallel training. One possible reason could be inconsistent data distribution across devices during parallel training tasks on the primary GPU.
Dear Luis:
I hope this email finds you well. I wanted to reach out to you regarding your excellent work on the topic of Theseus. I have a query regarding any constraints that may be applicable to two optimization variables, A and B. Specifically, I am interested in finding constraints that satisfy the condition A > B > 0. I was wondering if you could provide any insights or suggestions in this regard.
Your expertise and guidance would be highly appreciated. Thank you in advance for your time and assistance. I look forward to hearing from you soon.
Best regards,
Zhangren Tu
Hi @zhangrentu. One option I can think of would be to add a new variable C that represents the difference between A and B. Then you can add the following costs (pseudocode):

```python
def c_eq_a_diff_b(optim_vars, aux_vars):
    A, B, C = optim_vars
    return C.tensor - (A.tensor - B.tensor)


cost_1 = AutoDiffCostFunction([A, B, C], c_eq_a_diff_b, C.dof)  # C = A - B
cost_2 = HingeCost(C, 0, np.inf, threshold)  # C > 0
cost_3 = HingeCost(B, 0, np.inf, threshold)  # B > 0
```

Not sure if this will work well; it might be tricky to optimize properly.
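A quick numpy sanity check of this decomposition (toy scalars and a hypothetical penalty form, not Theseus code): the combined penalty is zero exactly when the constraints hold and positive otherwise:

```python
import numpy as np

def hinge(x):
    # Zero for x >= 0, linear for x < 0 (a one-sided non-negativity penalty).
    return np.maximum(-x, 0.0)

def constraint_penalty(A, B, C, w=10.0):
    # Mirrors the pseudocode above as one scalar: C should equal A - B,
    # and both C and B should be non-negative.
    return (w * (C - (A - B))) ** 2 + hinge(C) ** 2 + hinge(B) ** 2

print(constraint_penalty(3.0, 1.0, 2.0))       # 0.0: A > B > 0 is feasible
print(constraint_penalty(1.0, 3.0, -2.0) > 0)  # True: A < B is penalized
```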
Thanks. As before, the convergence is not stable.
❓ Questions and Help
When attempting to fit some simple regression problems, such as y = ax + b, where both x and y are complex numbers, I encountered errors. Could you please advise on methods or modifications to resolve this issue?
(screenshot of the error omitted)