The Jazz Guitar Chord Dictionary
Page 3 of 5
Posts 51 to 75 of 105
  1. #51


    Quote Originally Posted by omphalopsychos
    I get what you’re saying, but the way the Kaplan paper models scaling doesn’t actually require everything to increase in perfect tandem. The key result (Figure 4) shows that data requirements grow sublinearly relative to model size and compute. So if you make training 10x more efficient, you don’t suddenly need 10x the data to fully convert that into performance gains—something more like 5x, following the observed power-law trends.


    Now, can OpenAI expand their data 10-fold while keeping quality high? That’s actually the more important question. At this point, just dumping in more raw web text doesn’t do much—you get diminishing returns without better filtering. What’s happening now is labs like OpenAI, Google, and Anthropic are curating high-quality datasets, using retrieval-augmented training, and generating synthetic data to expand their effective training corpus. So they don’t necessarily need 10x the raw data—they need smarter ways to use the data they have, and that’s already happening.

    As for whether we’re still on the same scaling curve from 2020 or if we’re seeing diminishing returns—so far, the scaling trends still hold. Every major model release since then (GPT-4, Claude 2, Gemini, DeepSeek-V3) has continued following the same power-law relationships. If we were actually hitting saturation, we’d expect to see performance plateauing even with increasing compute, but that hasn’t happened. The returns are smaller in absolute terms (as expected from the power-law), but they’re still meaningful enough to justify continued scaling.

    The real shift since 2020 isn’t that scaling stopped working—it’s that raw dataset size has become a more constrained factor. That’s why top labs are now optimizing how they use data instead of just throwing more tokens at the problem. So no, we’re not at a fundamental saturation point yet. The scaling laws still apply, and even though we’re further up the curve, we’re not seeing diminishing returns to the point where scaling has stopped being the dominant factor.
    I agree it's very probable that we aren't yet testing the limits of the curve, but I still don't see it as an obvious conclusion.
    can OpenAI expand their data 10-fold while keeping quality high?
    Yeah, this was one of my points. It seems to remain unclear.
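    Just to put numbers on the sublinear-data point from the quote, here's a rough Python sketch. The 0.74 exponent is the N-to-D power-law relation reported in Kaplan et al. (2020); treating a 10x efficiency gain as a 10x jump in effective model scale is my simplification, not something the paper states.

```python
# Illustrative only: a power-law fit D ∝ N^0.74 (exponent from
# Kaplan et al. 2020). Assumes efficiency gains translate directly
# into effective model scale, which is a simplification.
def data_needed(model_scale_factor, exponent=0.74):
    """Dataset growth implied by a power-law fit D ∝ N^exponent."""
    return model_scale_factor ** exponent

print(round(data_needed(10), 1))  # ~5.5x the data for a 10x scale-up
```

    So a 10x scale-up implies roughly 5.5x the data under that fit, which is where the "something more like 5x" figure comes from.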

  3. #52


    Quote Originally Posted by omphalopsychos
    I see what you're saying now. You're referring to the training code? There's plenty of code in that repo but it's for the inference module.
    I am not a computer engineer (my father was), but I've heard from engineers that the code for the DeepSeek model is not open source; only its operational code is. In layman's terms, they tell you how it runs but not how it works: the model itself is still a black box.

    Quote Originally Posted by Tal_175
    Not sure what you mean by this. Who doesn't collect your personal data when you're online these days?
    And share it with the Chinese Communist Party? The party's association with DeepSeek's personnel has been confirmed.

  4. #53


    I don't necessarily "disagree" in a binary fashion. Neither of us has perfect knowledge. I think it's most likely that in a few months or sooner we'll see another major press release that counterbalances the current DeepSeek narrative. I'm not saying this from any political interest, just based on my knowledge of the space.

  5. #54


    The inference code (model.py in the deepseek-ai/DeepSeek-V3 repo on GitHub) actually tells us a lot about how the model works; it's not a total black box. The model.py file gives insight into key architectural details like Mixture-of-Experts (MoE), Multi-Head Latent Attention (MLA), rotary positional embeddings, RMSNorm, and quantization methods. This tells us how the model processes inputs, structures computations, and optimizes inference efficiency.

    What’s missing is how it was trained—we don’t have details on the dataset, optimizer, loss function, or hyperparameter tuning. But from the inference code, you can still infer a lot about how the model is structured and what design choices were made. So while we can’t see how DeepSeek was trained, we can see exactly how it runs and what components define its architecture.
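    To give a feel for what "seeing how it runs" means, here's a simplified NumPy sketch of the RMSNorm component that appears in that model.py. The real implementation is PyTorch with a learned per-dimension weight; the eps value here is an assumption, not taken from the repo.

```python
import numpy as np

# Simplified sketch of an RMSNorm layer, one of the components visible
# in DeepSeek-V3's inference code. It rescales activations by their
# root-mean-square instead of subtracting a mean as LayerNorm does.
def rms_norm(x, weight, eps=1e-6):
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[3.0, 4.0]])
print(rms_norm(x, weight=np.ones(2)))  # each row rescaled to unit RMS
```

    Reading components like this is exactly the kind of architectural information the inference repo does expose, even though the training recipe stays private.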

  6. #55


    Quote Originally Posted by Mick-7
    I am not a computer engineer (my father was), but I've heard from engineers that the code for the DeepSeek model is not open source; only its operational code is. In layman's terms, they tell you how it runs but not how it works: the model itself is still a black box.
    There may not be a single piece of code that converts raw data into a DeepSeek model. Generating a model is not a fully automated, one-step process, as I understand it. In this case the "recipe" is the code together with the ideas discussed in the technical report. I am not an expert in LLMs. I do have PhD-level training in AI and I work as a researcher/software engineer, but my direct experience with neural networks is limited to building a weather-forecast system for a senior course project 20 years ago. The network had about a dozen nodes, lol.

  7. #56


    Here is an article explaining the DeepSeek training process. I assume it is based on the technical report:

    How DeepSeek-R1 Was Built; For dummies

  8. #57


    Quote Originally Posted by omphalopsychos
    On the other topic of jobs. I don’t think there’s a single clean answer to how AI will reshape labor because it could go in a few different directions at the same time. Yeah, AI could absolutely concentrate power—companies that integrate it the fastest will outcompete everyone else, and if AI keeps making work more efficient, the biggest players will just consolidate even further. But that’s not the only possibility.

    AI isn’t just about job loss—it also changes what work looks like. A lot of automation in the past hasn’t outright replaced jobs, just made workers more productive, shifting the kinds of skills that matter. AI could do the same, mostly handling repetitive tasks while people focus on oversight, decision-making, and things that still need human input. That doesn’t mean no displacement happens, but it’s not a straight line to “all jobs disappear.”

    There’s also the chance AI just makes everything cheaper, which changes the equation completely. If AI can drive down costs in healthcare, legal services, logistics, and other expensive industries, people might not need to work as much to maintain the same standard of living. Maybe that leads to UBI, or maybe it just means work shifts toward smaller, more independent AI-powered businesses instead of everything consolidating into megacorporations.

    And then there’s the fact that AI could create entirely new industries we haven’t even thought of yet. The internet wiped out a lot of traditional jobs but created things like digital content, e-commerce, and gig work that didn’t exist before. AI could do the same—maybe it spawns new kinds of businesses, AI-assisted solo entrepreneurs, or entirely new economic sectors.

    So yeah, AI could absolutely push things toward corporate centralization, but it could also lead to cheaper goods and services, a shift in work rather than full-on replacement, and new industries altogether. Which way it goes depends less on the tech itself and more on how businesses, governments, and society react to it.
    What have you seen to make you think they will do anything but pocket the extra productivity generated by AI? Goods and services will be cheaper to provide, but costs will go up for the end user. Profit over progress has been the corporate model for decades. AI isn't going to change that.

    You don’t get 3,000 billionaires by sharing increased productivity.

  9. #58


    Idk history doesn’t really support the idea that technological progress just consolidates power without also creating new opportunities. The Industrial Revolution is a good example—yes, factory owners captured a lot of the gains, but industrialization also created an entirely new class of workers by giving landless peasants access to steady wages as machine operators. As subsistence farming declined and local trade became less viable, factory work provided economic stability at scale in a way that didn’t exist before. AI could do something similar—not by creating assembly lines, but by lowering the skill barrier for certain types of knowledge work, making economic participation possible for more people, even if the transition is disruptive.

    You can’t assume one outcome is inevitable. Every major technological shift plays out as a struggle between competing forces—corporate interests, labor, government intervention, and shifts in consumer demand. Technology might centralize power in some areas, but it will also create new industries and new forms of work that aren’t obvious yet, just like industrialization did.

  10. #59


    I like that perspective, hopeful.

  11. #60


    If DeepSeek were as "open source" as some here suggest, its release would not have spooked the markets and the AI industry; they could have quickly figured it out. There'd be no fuss about it, just another over-hyped start-up. Not to mention that Chinese companies often present opaque (or outright fake) business and financial profiles, which is a big reason why experienced stock traders I know won't touch Chinese company stocks.

    The Italian government just blocked access to DeepSeek in their country:
    Italy blocks access to the Chinese AI application DeepSeek to protect users' data | AP News

    DeepSeek issued a statement saying: "European legislation does not apply to them." Good luck with that argument.

  12. #61


    AI vs Human...

    Found DeepSeek useful for finding information on converting USB 5 V to center-negative 9 V for guitar pedals. It may prove useful for other small things in the future.

    What the tool could not tell me is that it made more sense to simply adapt to available pedalboard batteries. So I've procured the $30 Horse brand battery/pedal board power supply mentioned earlier in the thread. Build quality is better than expected though we'll find out about things like isolation and shielding over time.

    To adapt, I kept my pedals to 100 mA or less, with a single pedal pulling 300 mA. This will get me reverb, a looper, preamp emulation, and a robust selection of cabinet IRs. Can't use the Strymons, but that's OK. The EV HoF and Joyo Cab Box are certainly good enough. So is a simple looper.

    How long will this setup go on battery power? We'll see. In theory more hours than I need.
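    The runtime question is just arithmetic. Here's a back-of-envelope Python sketch; the battery capacity and the load figures below are hypothetical placeholders, not specs of the Horse supply or of my pedals.

```python
# Back-of-envelope pedalboard runtime check. All numbers are
# hypothetical placeholders for illustration.
battery_wh = 20.0                      # assumed battery capacity, watt-hours
loads_ma = [100, 100, 100, 300]        # e.g. three small pedals + one 300 mA
total_w = 9.0 * sum(loads_ma) / 1000   # 9 V rail, mA -> A -> watts
hours = battery_wh / total_w
print(f"{hours:.1f} h")                # ~3.7 h on these assumptions
```

    Swap in the real capacity and measured draws and you get your actual estimate (conversion losses in the supply will shave some off).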

  13. #62
    djg


    Quote Originally Posted by omphalopsychos
    Idk history doesn’t really support the idea that technological progress just consolidates power without also creating new opportunities. The Industrial Revolution is a good example—yes, factory owners captured a lot of the gains, but industrialization also created an entirely new class of workers by giving landless peasants access to steady wages as machine operators.
    i think the better question is whether this progress will lead to less or more inequality.

    and if 150 years ago this new working class had done so well, we wouldn't have had the german and russian revolutions, unions, and communism, now would we? the "steady wages" for 6 days of 14-16 hours declined fast, as the supply of labor by far outnumbered demand. by the end of the 19th century wages in germany were back to the level of 1820. Berlin was a hell-hole for the working class.

    Social question - Wikipedia

  14. #63
    djg


    Quote Originally Posted by Mick-7
    If DeepSeek was as "open source" as some here suggest, its release would not have spooked the markets and AI industry, they could quickly figure it out.
    it is open source and they have figured it out. it is still bad news for oligarchs like elon musk who just bought 100k nvidia chips

    Quote Originally Posted by Mick-7

    DeepSeek issued a statement saying: "European legislation does not apply to them." Good luck with that argument.
    it applies to them as much as chinese regulation applies to jazzguitar.be

  15. #64


    If DeepSeek trades in European markets, it is subject to European law.

  16. #65


    Anyone care to weigh in on OpenAI's claim that DeepSeek may be a distillation of ChatGPT? How would that work? For one thing, the number of parameters is around 0.7 trillion, I believe, whereas ChatGPT's is, I think, around 2 trillion, so not a huge difference. Also, my (very limited) understanding of distillation is that it involves pruning the least significant nodes from the "teaching" network. How could this be done if OpenAI's topology and weights are unavailable?

    (Just wanted to point out that not only is DeepSeek supposedly much cheaper to train, it's also cheaper at inference.)

    I see the article that Tal linked explains how the training was done, without reference to ChatGPT. So what would prompt OpenAI to claim otherwise? (Aside from self-interest, of course.) Also, at the end of the article, the author states: "We thought model scaling hit a wall, but this approach is unlocking new possibilities." Which implies those power-law graphs from 2020 were no longer applicable, right?

  17. #66


    Yikes!


  18. #67


    I think Geoffrey Hinton is conflating the evolution of behaviour and the evolution of intelligence a bit there. Our instincts for survival, reproduction, protection of our offspring, etc. have nothing to do with our intelligence. Those instincts give us motivations and goals; they are byproducts of the evolutionary processes of biological organisms in particular environmental conditions. Intelligence gives us a way to achieve those goals, but not the goals themselves. It's not at all obvious that if you could remove the sexual drive and the desire to reproduce from an intelligent human, they would develop that desire by reasoning (hence all the nasty competition he was referring to that results from such goals). They would probably be better off without it. Nor is it clear that if you could remove the survival instinct from all humans, they would conclude by reasoning that they are better off alive than not. Intelligence alone does not explain human behaviour; a series of random mutations and natural selection does.
    Last edited by Tal_175; 01-31-2025 at 10:32 AM.

  19. #68


    Quote Originally Posted by CliffR
    Anyone care to weigh in on OpenAI's claim that DeepSeek may be a distillation of ChatGPT? How would that work? For one thing, the number of parameters is around 0.7 trillion, I believe, whereas ChatGPT's is, I think, around 2 trillion, so not a huge difference. Also, my (very limited) understanding of distillation is that it involves pruning the least significant nodes from the "teaching" network. How could this be done if OpenAI's topology and weights are unavailable?

    (Just wanted to point out that not only is DeepSeek supposedly much cheaper to train, it's also cheaper at inference.)

    I see the article that Tal linked explains how the training was done, without reference to ChatGPT. So what would prompt OpenAI to claim otherwise? (Aside from self-interest, of course.) Also, at the end of the article, the author states: "We thought model scaling hit a wall, but this approach is unlocking new possibilities." Which implies those power-law graphs from 2020 were no longer applicable, right?
    I feel like there's not actually enough info to say whether DeepSeek-R1 is a distillation of GPT models because DeepSeek hasn’t released their training code or detailed how they actually trained it. Distillation usually means training a smaller model on the outputs of a larger one, but since OpenAI hasn’t released ChatGPT’s weights or architecture, DeepSeek wouldn’t have been able to do it in the traditional sense. That said, OpenAI has hinted that DeepSeek might have used its models in ways that violate their terms of service, which could mean some kind of training on GPT-generated data—but without more transparency from DeepSeek, it’s just speculation.

    As for the scaling laws, that Vellum article saying “we thought model scaling hit a wall” is probably talking about practical constraints like cost and training time, not the actual power-law trends. The scaling laws still hold—larger models trained on more compute and data continue to improve, but brute-force scaling alone is becoming less practical. This is actually consistent with the Kaplan paper because it showed that you can’t just scale model size and compute without also scaling data or you hit diminishing returns.
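    To make the "traditional sense" concrete, here's a minimal NumPy sketch of classic (Hinton-style) distillation loss. Note that it needs the teacher's logits, which is exactly what OpenAI doesn't release; so any "distillation" of ChatGPT could only mean training on its sampled text outputs, not this classical procedure. The logit values below are made up for illustration.

```python
import numpy as np

def softmax(z, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing more of the
    # teacher's "dark knowledge" about relative class similarities.
    z = np.asarray(z, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * np.log(p / q)))

# The loss shrinks as the student's logits approach the teacher's.
far = distill_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0])
near = distill_loss([2.0, 0.5, -1.0], [1.9, 0.6, -0.9])
print(far > near)  # True
```

    In real distillation you'd minimize this loss (usually mixed with the ordinary hard-label loss) over the training set, which presupposes white-box access to the teacher.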

  20. #69


    Quote Originally Posted by CliffR
    Yikes!


    Annoying clickbaity thumbnail, but I think GH is making good points. Nobody really knows how to regulate these models yet, and current attempts (like the EU AI Act and some U.S. state laws) seem premature. Regulators don’t understand the tech and the tech itself is evolving too fast for rigid rules to work.

  21. #70


    Reading a lot of interesting stuff I was curious about in this thread. Thanks to those taking the time to keep us abreast and for putting comments in terms we can all understand. Things are moving fast and frankly, I'm really impressed with the knowledge some of you possess.

    Just wanted to add that what I'm hearing from friends still working in tech is that it's becoming something of a key and marketable skill to know exactly how to phrase your question and in what sequence to place questions when working with these tools. Key to using them to write code of course but also important in seeking more complex answers to more complex questions. Probably germane when drilling into music theory or even jazz history.

  22. #71


    Quote Originally Posted by Spook410
    Reading a lot of interesting stuff I was curious about in this thread. Thanks to those taking the time to keep us abreast and for putting comments in terms we can all understand. Things are moving fast and frankly, I'm really impressed with the knowledge some of you possess.

    Just wanted to add that what I'm hearing from friends still working in tech is that it's becoming something of a key and marketable skill to know exactly how to phrase your question and in what sequence to place questions when working with these tools. Key to using them to write code of course but also important in seeking more complex answers to more complex questions. Probably germane when drilling into music theory or even jazz history.
    100%. I always encourage scientists and engineers to integrate these tools into their work, and there’s a real skill to using them effectively. Some people are objectively better at it than others, and that difference creates marketable skills. The first is being able to provide the right direction to the AI to get the best possible output. If you look back at the discussion between me and CliffR a few weeks ago, you’ll see how much variance there was in the results we each got when trying to get AI to write a program—it’s not as simple as just asking and getting a perfect answer. The second skill is validating the output. AI can produce garbage, and if you don’t have a QA mechanism in place, you can easily end up with something that looks correct but isn’t. For subjective/qualitative content, that means actually reading and making sure the output makes sense. For code or anything more structured, you need a proper testing framework to confirm accuracy.

    I think this also ties back to the broader point about employment. It’s not like I’m firing engineers because we have AI—but I do expect them to know how to use AI to multiply their effectiveness. And if I’m hiring, I’d prefer someone who’s familiar with this toolset. It’s becoming just like knowing the latest programming languages, frameworks, or standard workplace tools—not something that replaces people, but something that changes what’s expected from them.
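    To make the QA point concrete, here's a tiny Python sketch of the second skill: never trust generated code until it passes tests you wrote yourself. sort_records is a hypothetical stand-in for some AI-written function under review, not anything from the earlier discussion.

```python
# Minimal sketch of a QA mechanism for AI-generated code: the human
# writes the tests; the generated function must pass them.
def sort_records(records):
    # Pretend this implementation came back from the model.
    return sorted(records, key=lambda r: r["name"])

def test_sort_records():
    data = [{"name": "Wes"}, {"name": "Joe"}, {"name": "Grant"}]
    out = sort_records(data)
    assert [r["name"] for r in out] == ["Grant", "Joe", "Wes"]
    assert len(out) == len(data)  # nothing silently dropped

test_sort_records()
print("ok")
```

    The same idea scales up to a real test suite; the point is that validation lives outside the model.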

  23. #72


    Quote Originally Posted by djg
    OpenAI's o3-mini has just been released.

    prompt: write a html page with a working piano keyboard. 3 octaves. nice piano sound.

    both results leave room for improvement. i prefer the deepseek version. but for one shot both are impressive. i remember claude and earlier gpts struggling with this task.

    Will keep you posted when I hear back.
    Last edited by omphalopsychos; 02-01-2025 at 02:34 AM.

  24. #73
    djg


    Quote Originally Posted by omphalopsychos
    Will keep you posted when I hear back.
    would you even need ai for that? for software that is basically 36 years old, BIAB does an impressive job creating jazz solos. iirc the soloist function can be edited to play short phrases with fixed rhythms, so it could create contrafacts without any ai involved, right? and with ai and real tracks one could build a nice contrafact generator?

  25. #74


    Update: still sucks at music

  26. #75


    Quote Originally Posted by djg
    would you even need ai for that? for a software that is basically 36 years old BIAB does an impressive job creating jazz solos. iirc the soloist function can be edited to play short phrases with fixed rhythms. so it could create contrafacts without any ai involved, right? and with ai and real tracks one could build a nice contrafact generator?
    No, you don't "need" AI for it. But it's a test of whether AI can do it without guardrails/constraints.