Artificial Intelligence

Where goats go to escape
JM2K6
Posts: 9021
Joined: Wed Jul 01, 2020 10:43 am

Os - you can't effectively train these models solely on your own data. It simply isn't enough. They require huge datasets in order to function at the most basic level.
Dinsdale Piranha
Posts: 881
Joined: Mon Jun 29, 2020 10:08 pm

JM2K6 wrote: Tue Apr 09, 2024 7:42 am
Jethro wrote: Tue Apr 09, 2024 1:35 am
Raggs wrote: Mon Apr 08, 2024 10:50 am

Like I said, I don't think any career goes extinct. But 1 human can oversee 5 AI conversations, only having to intervene here and there, rather than needing humans for each one etc.

Programmers are saying that AI can do a huge amount of coding for them, but can't do everything, making them massively more efficient as they don't have to worry about as much "busy work". It can also scan for bugs/errors faster etc. You'll still need programmers, but nowhere near as many to produce the same volume of work.
Raggs, back in the 1960s the claim was that the COBOL programming language would allow managers to write systems and you wouldn't need those pesky nerds down in the basement. That turned out not to be entirely correct.

What they are calling AI nowadays is media-driven; the term for what we have is "Expert Systems" (yeah, not as sexy for sure), or as I like to call it, smoke and mirrors: code all possible answers to a question and follow a decision path, and if the answer isn't what you want, ask the question in a different way until you get an answer your system can interpret, using something called "fuzzy logic".

My niece is currently studying AI, and is that ever complex, she reckons a few research orgs have true AI but at a very fundamental level.

The computer apocalypse is quite a way off folks.
I do not know many good programmers saying AI can do a lot of work for them, and I know a lot of programmers
One of my friends is using it somewhat. He's got >30 years programming under his belt and says it can give him a code sample that's 70% correct a little faster than he can do it himself or look it up on stackoverflow. His advantage is that he can spot what is and isn't correct immediately. The rest of his colleagues - all much younger - struggle with the 'spotting what's correct' bit.

Those Zeihan articles were interesting and put some actual numbers on what is the big issue - the inbreeding of training data is only going to get worse and the resources needed to train are only going to get larger. I'm finding too many AI generated webpages that sound convincing for a couple of sentences and then you realise it's complete bollocks.

The bad part in the short term is that a huge amount of investment capital is being pissed up the wall that could be going in to something useful.
epwc
Posts: 253
Joined: Mon Apr 08, 2024 11:32 am

I often wonder what would have happened to Autonomy if it hadn't been sold to HP, they were kind of at the forefront of analysing huge datasets across lots of different media, very definitely a precursor to what is happening now. I bet Mike Lynch regrets selling it!
inactionman
Posts: 2371
Joined: Tue Jun 30, 2020 7:37 am

epwc wrote: Tue Apr 09, 2024 11:12 am I often wonder what would have happened to Autonomy if it hadn't been sold to HP, they were kind of at the forefront of analysing huge datasets across lots of different media, very definitely a precursor to what is happening now. I bet Mike Lynch regrets selling it!
I think he always wanted the big payday.

Autonomy used Bayesian probabilistic methods (so they claimed, at any rate), which don't always require such large datasets. One of their selling points - and I stress selling point, I'm not sure what the actual capability was - was the ability to parse text documents and auto-classify them, and some view of doing that for videos and media clips. I only ever saw it done on one of Obama's speeches, and there was an underlying feeling it had been fettled and finessed to get it to work - I'm not convinced there was really a huge amount under the covers.
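For a sense of what Bayesian text auto-classification looks like at its simplest, here's a toy naive Bayes classifier. This is purely illustrative - Autonomy's actual inference engine was proprietary and certainly far more sophisticated; the example labels and documents are invented.

```python
from collections import Counter
import math

def train(docs):
    # docs: list of (text, label) pairs.
    # Count documents per label and word frequencies per label.
    labels = Counter(label for _, label in docs)
    words = {label: Counter() for label in labels}
    for text, label in docs:
        words[label].update(text.lower().split())
    return labels, words

def classify(text, labels, words):
    # Score each label by log prior + log likelihood,
    # with add-one smoothing so unseen words don't zero out a label.
    vocab = {w for counts in words.values() for w in counts}
    total = sum(labels.values())
    best, best_score = None, float("-inf")
    for label, count in labels.items():
        score = math.log(count / total)
        denom = sum(words[label].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((words[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

labels, words = train([
    ("interest rates bond market", "finance"),
    ("match score goal referee", "sport"),
])
print(classify("bond market rates", labels, words))  # finance
```

Two training documents is obviously absurd, but the mechanism scales: the appeal of the Bayesian approach is that useful classification can emerge from far fewer examples than a neural model needs.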

In their defence, there's always an element of faking before making, so you're right to ask what might have happened had they carried on as a separate entity, but I'm not convinced there was a huge amount of substance behind many of their promoted claims.
_Os_
Posts: 2036
Joined: Tue Jul 13, 2021 10:19 pm

JM2K6 wrote: Tue Apr 09, 2024 10:42 am Os - you can't effectively train these models solely on your own data. It simply isn't enough. They require huge datasets in order to function at the most basic level.
Let's say there's 1GB of about 500 similar images (the minimum you can get away with) and tolerances have been set to mostly follow that training data; it's pulling from a much larger database that went into making the model to produce variations from one prompt, say "expressionist". But how much larger would the data that went into producing the model need to be to have a minimum viable product? I can get a sense of the size of part A but not part B. The model wouldn't be only examples of expressionists, but also multiple other parts (people, buildings, etc.).

Take the TikToks that look suspiciously like "Walking Through Tokyo 4K" YouTube vids: it probably didn't need many of those vids (if it's anything like image generation), but does it then need the entire video content of the internet to make part B/the model, and nothing less than that? Because if that's the case it's a lot less viable than I thought. My assumption was they were scraping the entire internet because they were attempting to make a generalist magic eight ball, not that janky 30-second videos required all that.

I appreciate this may be a bit "how long is a piece of string".
Last edited by _Os_ on Tue Apr 09, 2024 4:24 pm, edited 1 time in total.
I like neeps
Posts: 3262
Joined: Tue Jun 30, 2020 9:37 am

epwc wrote: Tue Apr 09, 2024 11:12 am I often wonder what would have happened to Autonomy if it hadn't been sold to HP, they were kind of at the forefront of analysing huge datasets across lots of different media, very definitely a precursor to what is happening now. I bet Mike Lynch regrets selling it!
I bet he does considering he's staring down a long prison sentence.
epwc
Posts: 253
Joined: Mon Apr 08, 2024 11:32 am

That's what I mean
Hellraiser
Posts: 1884
Joined: Tue Jun 30, 2020 7:46 am

Sandstorm wrote: Mon Apr 08, 2024 8:16 pm Surely soldiers get replaced by AI before anyone else? Drones are already replacing pilots. You won’t need meat bags in fatigues in the next decade.
Not a hope, and drones are not replacing pilots; I don't know where you got that idea.

Ceterum censeo delendam esse Muscovia
JM2K6
Posts: 9021
Joined: Wed Jul 01, 2020 10:43 am

_Os_ wrote: Tue Apr 09, 2024 2:35 pm
JM2K6 wrote: Tue Apr 09, 2024 10:42 am Os - you can't effectively train these models solely on your own data. It simply isn't enough. They require huge datasets in order to function at the most basic level.
Let's say there's 1GB of about 500 similar images (the minimum you can get away with) and tolerances have been set to mostly follow that training data; it's pulling from a much larger database that went into making the model to produce variations from one prompt, say "expressionist". But how much larger would the data that went into producing the model need to be to have a minimum viable product? I can get a sense of the size of part A but not part B. The model wouldn't be only examples of expressionists, but also multiple other parts (people, buildings, etc.).

Take the TikToks that look suspiciously like "Walking Through Tokyo 4K" YouTube vids: it probably didn't need many of those vids (if it's anything like image generation), but does it then need the entire video content of the internet to make part B/the model, and nothing less than that? Because if that's the case it's a lot less viable than I thought. My assumption was they were scraping the entire internet because they were attempting to make a generalist magic eight ball, not that janky 30-second videos required all that.

I appreciate this may be a bit "how long is a piece of string".
With no other data sets available, to train an LLM or similar genAI you're generally talking billions of data points: many gigabytes, usually terabytes. 500 images alone will likely get you output that basically looks like one of those images, or recognisably just a few mashed together; 500 images as a way to fine-tune a model with an existing large data set in the area you're interested in is probably 10x too small, but not completely unthinkable.

A lot depends on the quality of the original data set. When we were working on an anomaly detection system a few years ago, we had billions of data points but we simply didn't have enough metadata at the time to make genuine use of them. And that's one of the simpler methods; anomaly detection is (I believe, but could be wrong) a much easier nut to crack than language or the video/image equivalents. An effective LLM usually has billions of parameters (largely equivalent to the metadata or tags, I guess) and so you need a very large, very broad dataset that can provide all that. GPT-3 had 175 billion parameters, for example.
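To illustrate why anomaly detection is the "easier nut": at its crudest it can be done with basic statistics and no training at all. This is a minimal z-score sketch, not the system described in the post (which would have been far more involved), and the threshold is an arbitrary choice:

```python
import statistics

def anomalies(values, threshold=2.0):
    # Flag values more than `threshold` standard deviations from the mean.
    # With small samples the sample stdev is inflated by the outlier itself,
    # so the threshold here is deliberately loose.
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

data = [10, 11, 9, 10, 12, 10, 11, 95]  # 95 is the obvious outlier
print(anomalies(data))  # [95]
```

Compare that to a generative model: here the "model" is two summary numbers, whereas an LLM needs billions of learned parameters before it produces anything coherent.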

I don't think there is a simple answer to your question about how much data you need beyond "a hell of a lot". Articles like this one attempt to quantify it to some degree.

As for the scraping of the internet for the videos - my assumption is that they are using some associated text as metadata/parameters along with the video tags, but everything else is probably funnelled into separate specific sets for other models. This is all assumption on my part; I know less about the video side, but it's also the one that is most obviously janky and most obviously a solution in search of a problem (remember those videos last year where someone got their AI to replicate movie trailers? Horrific shit, just utterly dreadful and totally useless - but recognisable by humans to a certain extent, and so everyone assumed it would be a short step to AI generation of whole movies).

Remember video is just a series of frames, so it's essentially a continuation of the theme: the model makes a guess as to what should come next based on the prompt, the data set, and the parameters within. I expect it's easier in many ways for a model to make that guess, because there is a logical link and progression between frames of video in the data sets that probably doesn't exist for image sets.
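The "guess what comes next" idea can be shown with a toy autoregressive predictor over symbolic "frames". This is purely illustrative - real video models predict in a learned latent space over pixels, not over discrete symbols, and the frame names here are invented:

```python
from collections import Counter, defaultdict

def train_transitions(sequences):
    # Learn, for each frame, how often each other frame follows it.
    follows = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            follows[a][b] += 1
    return follows

def continue_sequence(prompt, follows, steps):
    # Autoregressive generation: repeatedly append the frame that
    # most often followed the current last frame in the training data.
    out = list(prompt)
    for _ in range(steps):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # never seen this frame end anything; stop generating
        out.append(candidates.most_common(1)[0][0])
    return out

follows = train_transitions([["pan", "zoom", "pan", "zoom", "pan"]])
print(continue_sequence(["pan"], follows, 3))  # ['pan', 'zoom', 'pan', 'zoom']
```

The point of the toy is the structure, not the scale: each step conditions only on what has been generated so far, which is exactly why adjacent-frame continuity in video gives the model a stronger signal than an unordered pile of images would.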

It's always instructive to look past the glamour of the product demos and actually look at what's being displayed. The videos are simultaneously the most obviously impressive and yet the most obviously broken and nonsense application of this technology. I simply don't understand who it's for.
Jethro
Posts: 275
Joined: Wed Aug 25, 2021 3:09 am

JM2K6 wrote: Tue Apr 09, 2024 7:42 am
Jethro wrote: Tue Apr 09, 2024 1:35 am
Raggs wrote: Mon Apr 08, 2024 10:50 am

Like I said, I don't think any career goes extinct. But 1 human can oversee 5 AI conversations, only having to intervene here and there, rather than needing humans for each one etc.

Programmers are saying that AI can do a huge amount of coding for them, but can't do everything, making them massively more efficient as they don't have to worry about as much "busy work". It can also scan for bugs/errors faster etc. You'll still need programmers, but nowhere near as many to produce the same volume of work.
Raggs, back in the 1960s the claim was that the COBOL programming language would allow managers to write systems and you wouldn't need those pesky nerds down in the basement. That turned out not to be entirely correct.

What they are calling AI nowadays is media-driven; the term for what we have is "Expert Systems" (yeah, not as sexy for sure), or as I like to call it, smoke and mirrors: code all possible answers to a question and follow a decision path, and if the answer isn't what you want, ask the question in a different way until you get an answer your system can interpret, using something called "fuzzy logic".

My niece is currently studying AI, and is that ever complex, she reckons a few research orgs have true AI but at a very fundamental level.

The computer apocalypse is quite a way off folks.
I do not know many good programmers saying AI can do a lot of work for them, and I know a lot of programmers
Exactly my point: COBOL probably led to an increase in programmers, not a decrease, though I would have loved to see an Accounting Manager try and do anything with COBOL :lol: