[Question] DPO Implementation

For those who have implemented DPO: when you start the training process, do your reference model and your policy model need to have different initializations? The reason I ask is that if they start as the same model, the policy-to-reference log-prob ratios will be 0, which seems like it would make the initial loss 0 and prevent any update from occurring. Am I missing something?
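
One quick way to check the premise is to evaluate the standard DPO objective (from the Rafailov et al. paper) with identical policy and reference models. Here's a minimal PyTorch sketch; `beta` and the per-sequence log-probs are made-up placeholders, not values from any real run. With identical models the log-ratios are 0, so the loss starts at -log sigmoid(0) = log 2 rather than 0, and the gradient is nonzero:

```python
import torch
import torch.nn.functional as F

beta = 0.1  # placeholder KL-penalty strength

# Hypothetical per-sequence log-probs. With identical initializations,
# each policy log-prob equals its reference counterpart exactly.
policy_chosen = torch.tensor([-42.0], requires_grad=True)
policy_rejected = torch.tensor([-57.0], requires_grad=True)
ref_chosen = torch.tensor([-42.0])
ref_rejected = torch.tensor([-57.0])

# Standard DPO loss: -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))
logits = beta * ((policy_chosen - ref_chosen)
                 - (policy_rejected - ref_rejected))  # 0 at init
loss = -F.logsigmoid(logits).mean()
loss.backward()

print(loss.item())                  # log(2) ~= 0.693, not 0
print(policy_chosen.grad.item())    # -beta/2 = -0.05 (descent raises chosen)
print(policy_rejected.grad.item())  # +beta/2 = +0.05 (descent lowers rejected)
```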

I don't think there's a distinction. Neither title is well defined, though, so who knows.

MLE is a very poorly defined label, so that's understandable. It can cover anything from someone implementing ML infrastructure to an applied researcher. I would broadly classify MLEs as the people who implement machine learning for a product/feature. This usually entails the whole life cycle: preparing the data, choosing a model, designing a system/software around it, training the model, and integrating it into a product. Thus, they need to be proficient in ML, understand the field and the state of the art, and also have some SWE skills. In my opinion, a master's is a really good level of education for this type of role.

I'm confused about what your point is. The people who are qualified will get the jobs, and the people who aren't won't. Most of the former will have graduate degrees. Even the "machine learning engineers" will (and why are you bashing this role while claiming not to know much about ML? It still requires solid depth of knowledge and the ability to understand current research).

It seems like the first halves of the two courses are pretty similar (topic-wise at least; I can't speak to the difficulty/depth of the problem sets).

The second half of 4220 (topics in optimization) seems a bit more practical than the second half of 6210.

I haven't taken 4220, but after looking at the syllabus I think I'd recommend it over 6210 for most people, tbh. Also, 6210 is very assignment-based when Damle is teaching it.

Yeah, I was just saying that because rigorous academics may hurt the "college experience" aspect since people are so busy studying.

Looking over the syllabus of 4220, I would say not a lot unless you want to do research in NLA.

Given the classified nature of defense work, the fact that this is a US government board, and the increased importance of ML in defense these days, it makes a lot of sense to me.

Two examples of what I believe is the SOTA multimodal pretraining technique are the LLaVA paper and the Qwen-Audio paper. Essentially, they freeze the LLM during pretraining and train an encoder that maps the non-text inputs into the frozen LLM's input space. Then the LLM is finetuned on multimodal instructions. This way the LLM can "understand" multimodal data without forgetting its text understanding.
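
Not pulled from either paper verbatim, but here's a toy PyTorch sketch of that first-stage recipe. The dimensions, the tiny transformer standing in for a pretrained LLM, and the random features are all made up; the point is just that only the projector trains while the LLM stays frozen:

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps modality-encoder features (e.g., image/audio embeddings)
    into the frozen LLM's token-embedding space. In this sketch it is
    the only trainable piece during stage-1 pretraining."""

    def __init__(self, feat_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)


def freeze(module: nn.Module) -> nn.Module:
    for p in module.parameters():
        p.requires_grad = False
    return module


llm_dim, feat_dim = 64, 32
# Stand-in for a pretrained decoder: any module consuming a
# (batch, seq, llm_dim) embedding sequence works for this demo.
llm = freeze(nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=llm_dim, nhead=4, batch_first=True),
    num_layers=2,
))
projector = Projector(feat_dim, llm_dim)

image_feats = torch.randn(1, 16, feat_dim)  # from a frozen modality encoder
text_embeds = torch.randn(1, 8, llm_dim)    # from the LLM's embedding table

# Prepend projected "soft tokens" to the text embeddings, run the LLM.
inputs = torch.cat([projector(image_feats), text_embeds], dim=1)
out = llm(inputs)

# Only the projector receives gradients; the LLM stays frozen.
out.sum().backward()
print(projector.proj.weight.grad is not None)  # True
print(next(llm.parameters()).grad is None)     # True
```

The instruction-tuning stage mentioned above would then unfreeze (some of) the LLM and train on multimodal instruction data.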

CS 5414 is a huge time sink, so proceed with caution if the rest of your schedule is tough. The content seems more geared toward someone looking to do large-scale distributed systems R&D, so I'm not sure how useful it is to a SWE or someone in quant finance. I work in an ML role, not in the above fields, so I could definitely be wrong about that.

I would recommend something like 5220 (HPC) over 5414.

6210 is another class I enjoyed, but probably better for people looking to solidify their linear algebra skills.

Hi,

I'm looking to hire someone to help with audio captioning. The ideal candidate has extensive audio engineering and music theory knowledge. If you're interested, please message me so we can talk compensation, captioning directions, volume of data (this is flexible), and your qualifications. Due to the nature of the task, you'll also get free samples if you'd like.

Wildflower preserve parking is the main spot.

Would definitely be interested in seeing some recent learning theory work here.