References
- Adamic et al. (2005). The political blogosphere and the 2004 US election: divided they blog. Proceedings of the 3rd international workshop on Link discovery, pp. 36–43
- Alaboudi et al. (2021). An exploratory study of debugging episodes. arXiv preprint arXiv:2105.02162
- Argyle et al. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, vol. 31, no. 3, pp. 337–351
- Baria et al. (2021). The brain is a computer is a brain: neuroscience's internal debate and the social significance of the Computational Metaphor. arXiv preprint arXiv:2107.14042
- Beisel et al. (2002). [RETRACTED] Histone methylation by the Drosophila epigenetic transcriptional regulator Ash1. Nature, vol. 419, no. 6909, pp. 857–862
- Bender et al. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 5185–5198
- Berglund et al. (2023). The reversal curse: LLMs trained on "A is B" fail to learn "B is A". arXiv preprint arXiv:2309.12288
- Brown et al. (2020). Language models are few-shot learners. Advances in neural information processing systems, vol. 33, pp. 1877–1901
- Cabanac et al. (2021). Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals. arXiv preprint arXiv:2107.06751
- Callaham et al. (2002). Journal prestige, publication bias, and other characteristics associated with citation of published studies in peer-reviewed journals. JAMA, vol. 287, no. 21, pp. 2847–2850
- Castro Torres et al. (2022). North and South: Naming practices and the hidden dimension of global disparities in knowledge production. Proceedings of the National Academy of Sciences, vol. 119, no. 10, pp. e2119373119
- Chollet (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547
- Deshpande et al. (2023). Toxicity in ChatGPT: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335
- Doshi et al. (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances, vol. 10, no. 28, pp. eadn5290
- Drori et al. (2022). A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level. Proceedings of the National Academy of Sciences, vol. 119, no. 32, pp. e2123433119
- Durmus et al. (2023). Towards measuring the representation of subjective global opinions in language models. arXiv preprint arXiv:2306.16388
- Dziri et al. (2024). Faith and fate: Limits of transformers on compositionality. Advances in Neural Information Processing Systems, vol. 36
- Garcia et al. (2024). Artificial intelligence–generated draft replies to patient inbox messages. JAMA Network Open, vol. 7, no. 3, pp. e243201
- Girotra et al. (2023). Ideas are dimes a dozen: Large language models for idea generation in innovation. Available at SSRN 4526071
- Ha et al. (2024). Organic or diffused: Can we distinguish human art from AI-generated images? Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, pp. 4822–4836
- Hilbert et al. (2011). The world's technological capacity to store, communicate, and compute information. Science, vol. 332, no. 6025, pp. 60–65
- Karras et al. (2019). A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948
- Kong et al. (2023). Better zero-shot reasoning with role-play prompting. arXiv preprint arXiv:2308.07702
- Kotek et al. (2023). Gender bias and stereotypes in large language models. Proceedings of the ACM collective intelligence conference, pp. 12–24
- Kumar et al. (2016). Ask me anything: Dynamic memory networks for natural language processing. International conference on machine learning, pp. 1378–1387
- Köpf et al. (2024). OpenAssistant Conversations: Democratizing large language model alignment. Advances in Neural Information Processing Systems, vol. 36
- Li et al. (2024). Artificial Intelligence awarded two Nobel Prizes for innovations that will shape the future of medicine. NPJ Digital Medicine, vol. 7, no. 1, pp. 336
- Lightman et al. (2023). Let’s verify step by step. arXiv preprint arXiv:2305.20050
- Bran et al. (2024). Augmenting large language models with chemistry tools. Nature Machine Intelligence, pp. 1–11
- Matter et al. (2024). Close to Human-Level Agreement: Tracing Journeys of Violent Speech in Incel Posts with GPT-4-Enhanced Annotations. arXiv preprint arXiv:2401.02001
- McAleese et al. (2024). LLM critics help catch LLM bugs. arXiv preprint arXiv:2407.00215
- Morris et al. (2023). Levels of AGI for operationalizing progress on the path to AGI. arXiv preprint arXiv:2311.02462
- Nasr et al. (2023). Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035
- Park et al. (2023). Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th annual ACM symposium on user interface software and technology, pp. 1–22
- Perkins et al. (2024). GenAI detection tools, adversarial techniques and implications for inclusivity in higher education. arXiv preprint arXiv:2403.19148
- Porter et al. (2024). AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably. Scientific Reports, vol. 14, no. 1, pp. 26133
- Radford et al. (2018). Improving language understanding by generative pre-training.
- Sharma et al. (2023). Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548
- Sharma et al. (2024). Facilitating self-guided mental health interventions through human-language model interaction: A case study of cognitive restructuring. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–29
- Si et al. (2024). Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers. arXiv preprint arXiv:2409.04109
- Sidorkin (2024). Embracing chatbots in higher education: the use of artificial intelligence in teaching, administration, and scholarship.
- Sivak et al. (2019). Parents mention sons more often than daughters on social media. Proceedings of the National Academy of Sciences, vol. 116, no. 6, pp. 2039–2041
- Stribling et al. (2005). Rooter: A methodology for the typical unification of access points and redundancy.
- Kobak et al. (2024). Delving into ChatGPT usage in academic writing through excess vocabulary. arXiv preprint arXiv:2406.07016
- Villalobos et al. (2024). Will we run out of data? Limits of LLM scaling based on human-generated data. arXiv preprint arXiv:2211.04325
- Vaswani et al. (2017). Attention is all you need. Advances in neural information processing systems, vol. 30
- Wei et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, vol. 35, pp. 24824–24837
- West et al. (2023). The generative AI paradox: "What it can create, it may not understand". The Twelfth International Conference on Learning Representations
- Wu et al. (2023). Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks. arXiv preprint arXiv:2307.02477
- Wu et al. (2024). [RETRACTED] Assessment of the efficacy of alkaline water in conjunction with conventional medication for the treatment of chronic gouty arthritis: A randomized controlled study. Medicine, vol. 103, no. 14, pp. e37589
- Wuttke et al. (2024). AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers. arXiv preprint arXiv:2410.01824
- Yin et al. (2024). Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance. arXiv preprint arXiv:2402.14531
- Zech et al. (2018). Confounding variables can degrade generalization performance of radiological deep learning models. arXiv preprint arXiv:1807.00431
- Zhang et al. (2024). [RETRACTED] The three-dimensional porous mesh structure of Cu-based metal-organic-framework-aramid cellulose separator enhances the electrochemical performance of lithium metal anode batteries. Surfaces and Interfaces, vol. 46, pp. 104081
- Zheng et al. (2023). Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts. arXiv preprint arXiv:2311.10054