{ "Summary": "The paper investigates the relationship between Minimum Description Length (MDL) and the phenomenon of grokking in neural networks. It introduces a novel MDL estimation technique based on weight pruning and applies it to modular arithmetic and permutation tasks. The findings suggest a strong correlation between MDL reduction and improved generalization, particularly in grokking scenarios.", "Strengths": [ "Novel approach linking MDL and grokking.", "Comprehensive experimental setup with diverse datasets.", "Strong empirical evidence supporting the hypothesis.", "Clear visualization and analysis of results." ], "Weaknesses": [ "Limited novelty in the MDL estimation technique, as weight pruning is a well-known method.", "Experiments are mostly on simple datasets; performance on complex tasks is poor.", "Moderate clarity; some sections need better organization and explanation.", "Limited significance due to narrow scope and questionable robustness in complex scenarios.", "Potential negative societal impacts and limitations are not thoroughly discussed."
], "Originality": 2, "Quality": 2, "Clarity": 2, "Significance": 2, "Questions": [ "What is the theoretical justification for the effectiveness of the MDL estimation technique?", "Can the approach be generalized to more complex tasks or larger datasets?", "What are the potential negative societal impacts of this approach?", "Can the authors provide more details on how the pruning threshold (\u03b5) was chosen?", "How does the MDL estimation technique perform on more complex datasets beyond modular arithmetic and permutation?", "Can the authors discuss potential improvements or alternatives to the weight pruning method for MDL estimation?", "What are the broader implications of this work for other types of neural network architectures?", "Can the authors clarify the steps for reproducing the experiments and provide code or supplementary material?", "How does the proposed method compare to other existing approaches for predicting generalization in neural networks?", "Could the authors provide more insights into why the method struggled with the permutation task and suggest potential ways to address this limitation?" ], "Limitations": [ "The application of the technique is limited to modular arithmetic and permutation tasks.", "The theoretical foundation of the MDL estimation technique needs further clarification.", "Potential negative societal impacts are not discussed.", "The experiments are limited to relatively simple datasets, raising concerns about the generalizability of the findings.", "The pruning-based MDL estimation technique may not be robust for more complex tasks.", "The paper does not sufficiently address the limitations of the MDL estimation technique and its applicability to more complex scenarios.", "Potential negative societal impacts and ethical considerations related to the proposed methods are not discussed.", "Some key metrics and figures (e.g., 'Figure ??') are missing or not clearly referenced in the text."
], "Ethical Concerns": false, "Soundness": 2, "Presentation": 2, "Contribution": 2, "Overall": 4, "Confidence": 4, "Decision": "Reject" }