Abstract

Building an artificial intelligence (AI) agent that can design on its own has been a goal since the 1980s. Recently, deep learning has shown the ability to learn from large-scale data, enabling significant advances in data-driven design. However, learning from prior data limits us to solving only problems that have been solved before and biases data-driven learning toward existing solutions. The ultimate goal for a design agent is the ability to learn generalizable design behavior in a problem space it has not seen before. In this work, we introduce a self-learning agent framework that achieves this goal. The framework integrates a deep policy network with a novel tree search algorithm: the tree search explores the problem space, and the deep policy network leverages self-generated experience to guide the search further. The framework demonstrates, first, an ability to discover high-performing generative strategies without any prior data and, second, zero-shot generalization of those strategies across various unseen boundary conditions. We evaluate the effectiveness and versatility of the framework by solving multiple versions of two engineering design problems without retraining. Overall, this paper presents a methodology to self-learn high-performing and generalizable problem-solving behavior in an arbitrary problem space, circumventing the need for expert data, existing solutions, and problem-specific learning.
