
Can I?

Probably not

Chat-gpt

Its_MS — Today at 3:27 PM this is a question for you @0xDad do you know how I could start making my own AI? Like where to start and that?

0xDad — Today at 3:28 PM @Its_MS in what sense? Writing the entire stack and training it yourself? That's years of "man-hours" of work.

Its_MS — Today at 3:29 PM training it, because before you had one that would answer you, how could I do that but with my data so that it understood my questions better, and "coded" like me?

0xDad — Today at 3:30 PM Are you rich as fuck or have access to a shitload of data-center grade gpus? (EG the 80GB A100's)

Its_MS — Today at 3:30 PM I mean no and no xD

0xDad — Today at 3:30 PM Well, there's a short answer and a long answer which do you want

Its_MS — Today at 3:31 PM short and sweet xD

0xDad — Today at 3:31 PM You don't stand a chance of making an AI that you can train that'll be worth a damn.

[3:33 PM] The long answer is: in order to train an AI you need to load the entire model into memory on something that can compute either extremely fast or massively in parallel (eg: GPUs), then you can load the data in and feed it to the algorithms to "train" it. I assume we're talking about "chat" style AIs here.

[3:34 PM] AND you need this hypothetical AI to have enough "parameters" that it can mimic intelligence (that's the input, output and middle layers), and normally with chat AIs we're talking about hundreds of billions of parameters for a single instance

[3:34 PM] and when each parameter is multiple bytes long, the required RAM explodes very fast
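
To put rough numbers on the "explodes with required RAM" point, here's a back-of-the-envelope sketch (parameter counts and byte sizes are illustrative, not figures for any particular model):

```python
# Rough memory needed just to hold a model's weights, ignoring activations,
# KV caches, and framework overhead. Illustrative numbers only.

def weights_gib(num_params: float, bytes_per_param: float) -> float:
    """GiB required to store num_params weights at the given precision."""
    return num_params * bytes_per_param / (1024 ** 3)

for name, params in [("70B", 70e9), ("175B", 175e9), ("500B", 500e9)]:
    for prec, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{name} @ {prec}: ~{weights_gib(params, nbytes):,.0f} GiB")
```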

Its_MS — Today at 3:34 PM so can you explain what your "AI" did, the one that you were tweaking to answer quicker?

0xDad — Today at 3:35 PM all I was doing there was using other pre-trained models (Facebook's LLaMA) and trying to use software that's not entirely shit for inference on a degraded model.

[3:37 PM] so by default we store the weights with a high level of precision (they're just numbers after all, floating point numbers), and the only chance consumer hardware has of loading this entire model into RAM or GPU RAM is to throw out most of that precision and keep only 4 or 8 bits per weight

[3:37 PM] 32-bit precision by default iirc

[3:38 PM] as you throw out the precision the model gets "dumber", but it also requires far less memory. 4-bit precision is obviously an 8x smaller RAM requirement than 32-bit (edited)
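
For a feel of what "throwing out the precision" does, here's a toy uniform 4-bit quantizer. Real schemes (per-block scales, GPTQ, llama.cpp's k-quants) are cleverer than this; it only illustrates the memory-versus-accuracy trade:

```python
import numpy as np

# Toy uniform quantization: squash fp32 weights into 4-bit signed integers
# and dequantize them again, then measure the error introduced.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)   # fake row of weights

scale = np.abs(w).max() / 7                                # signed 4-bit range is -8..7
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)    # 4 bits of information per weight
w_hat = q.astype(np.float32) * scale                       # what the model actually computes with

print("max abs error:", float(np.abs(w - w_hat).max()))
print("fp32 bytes:", w.nbytes, "-> packed 4-bit bytes:", q.size // 2)
```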

[3:39 PM] the stuff I was trying to do to make it faster was to use AVX instructions and C++ code to load and run the model instead of the standard slow-as-fuck Python stacks
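
This isn't the actual C++/AVX code being described, but the effect is easy to demonstrate: vectorized, SIMD-backed math (what AVX gives you) versus an interpreted loop over the same dot product, the operation that dominates inference:

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
slow = sum(a[i] * b[i] for i in range(n))   # naive interpreted Python loop
t1 = time.perf_counter()
fast = np.dot(a, b)                         # BLAS-backed, typically uses AVX on x86
t2 = time.perf_counter()

print(f"python loop: {t1 - t0:.3f}s   numpy dot: {t2 - t1:.6f}s")
```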

[3:39 PM] also, inference takes FAR less RAM than training; not too sure on the technicals there
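
Roughly, the technicals: training has to keep gradients and optimizer state alongside the weights. A widely quoted rule of thumb for mixed-precision Adam training is around 16 bytes per parameter, versus about 2 bytes for fp16 inference (an approximation, not an exact figure for any particular stack):

```python
# Rule-of-thumb bytes per parameter (varies by framework and optimizer).
BYTES_TRAIN = 2 + 4 + 4 + 4 + 2   # fp16 weights + fp32 master weights + Adam m + Adam v + fp16 grads
BYTES_INFER = 2                   # fp16 weights only (ignores activations and KV cache)

params = 7e9                      # a 7B-parameter model
print(f"7B training:  ~{params * BYTES_TRAIN / 1024**3:.0f} GiB")
print(f"7B inference: ~{params * BYTES_INFER / 1024**3:.0f} GiB")
```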

0xDad — Today at 3:43 PM "My understanding was that Microsoft built a 10,000 GPU system for open AI to train GPT 3.5 and then gpt4. It's rumoured to have been upgraded to 25,000 GPUs which is currently being used to train GPT 5. So it's GPT 5 that is trained on an order of magnitude more compute than Palm. Gpt 4 is better than Palm but not by that much"

[3:44 PM] That's Nvidia A100 GPUs, trained over the course of ~⅓ of a year
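
For scale, that works out to an enormous amount of raw GPU time (using the rumoured numbers quoted above):

```python
# GPU time implied by "10,000 A100s for roughly a third of a year".
gpus = 10_000
days = 365 / 3
gpu_hours = gpus * days * 24
print(f"~{gpu_hours / 1e6:.1f} million A100-hours")   # ~29 million
```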

[3:46 PM] in conclusion, if you find a way to train an AI please share, it would be quite literally a revolutionary product that'd be worth billions.

[3:49 PM] I should mention, the model I was using was also a "stripped down" 7B parameter model. It was trained with a LOT more, 256B or somesuch, then lots of shit just thrown out to make a tinier 7B model (and even that can't be loaded into my 3080 10GB at full precision) (edited)
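
Concretely, here's why a 7B model doesn't fit on a 10 GB card at full precision but a 4-bit quant does (weights only, ignoring activation and KV-cache overhead):

```python
# Will a 7B-parameter model's weights fit in a 10 GB GPU? Overhead ignored.
VRAM_GB = 10
params = 7e9
for prec, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    need_gb = params * nbytes / 1e9
    verdict = "fits" if need_gb < VRAM_GB else "does not fit"
    print(f"{prec}: ~{need_gb:.1f} GB -> {verdict}")
```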

0xDad — Today at 3:51 PM final note, AVX512 can make it possible for small, hyper-focused AIs (like one trained on just YOUR data and nothing else) to potentially be trained and run on consumer hardware. Good luck getting a CPU that supports it though. Intel offered it for a while and removed it; AMD never did until literally just now with their new Threadrippers

[3:52 PM] and in that scenario you'd still need terabytes of ram

[3:53 PM] I think Macs using their "in-house" chips do support AVX512 right now
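
If you want to check whether your own CPU exposes AVX-512, a quick Linux-only check is to look at the kernel's reported feature flags (subfeature names like avx512f, avx512bw, avx512vnni vary by chip):

```python
# Linux-only: scan /proc/cpuinfo for AVX-512 feature flags.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break

avx512 = sorted(flag for flag in flags if flag.startswith("avx512"))
print("AVX-512 support:", ", ".join(avx512) if avx512 else "none reported")
```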

[3:57 PM] @Its_MS that's all, I hope it helps. if you have any more questions feel free to ask.

0xDad — Today at 4:03 PM LOL MSRP: $27,671.00 for the Nvidia A100s when new, so MS's initial hardware investment on GPUs alone is this number: $276,710,000. Add in a full server per 8x GPUs to run them, and that 276.71 million price tag keeps going up. (this is an assumption on the number of GPUs per server, derived from the fact that CPUs only have a limited number of PCIe lanes, eg 64 lanes total on the 3rd gen Xeon Scalables that were probably used when they were built; 8x8=64, so that's a maximum of eight GPUs installed as x8 PCIe devices per CPU - it also assumes networking and storage don't use PCIe lanes, which they likely do) (edited)
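
The arithmetic behind that figure, using the list price and GPU count quoted above (the 8-GPUs-per-server split is the assumption explained in the message):

```python
# Back-of-the-envelope cluster cost from the numbers quoted above.
GPU_MSRP = 27_671            # USD, A100 list price
NUM_GPUS = 10_000
GPUS_PER_SERVER = 8          # assumption from the PCIe-lane argument above

gpu_cost = GPU_MSRP * NUM_GPUS
servers = NUM_GPUS // GPUS_PER_SERVER
print(f"GPUs alone: ${gpu_cost:,}")                 # $276,710,000
print(f"Plus {servers:,} servers on top of that")
```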

Its_MS — Today at 4:05 PM Fucking hell, yeah all that makes sense and I appreciate you taking the time to write all that
