This notebook contains material from CBE40455-2020; content is available on Github.
Up to this point we have been using Python generators and shared resources as the building blocks for simulations of complex systems. This can be effective, particularly if the individual agents do not require access to the internal state of other agents. But there are situations where the action of an agent depends on the state or properties of another agent in the simulation. For example, consider this discussion question from the Grocery store checkout example:
Suppose we were to change one or more of the lanes to express lanes which handle only a small number of items, say five or fewer. How would you expect this to change average waiting time? This is a form of prioritization ... are there other prioritizations that you might consider?
The customer's action depends on the item limit parameter associated with a checkout lane. This is a case where the action of one agent depends on a property of another. The shared resources built into the SimPy library provide some functionality in this regard, but how do we add this to the simulations we write?
The good news is that Python offers a rich array of object oriented programming features well suited to this purpose. The SimPy documentation provides excellent examples of how to create Python objects for use in SimPy. The bad news is that object oriented programming in Python, while straightforward compared to many other programming languages, presents a steep learning curve for students unfamiliar with the core concepts.
Fortunately, since the introduction of Python 3.7 in 2018, the standard libraries for Python have included a simplified method for creating and using Python classes. Using dataclass, it is easy to create objects for SimPy simulations that retain the benefits of object oriented programming without all of the coding overhead.
The purpose of this notebook is to introduce the use of dataclass in creating SimPy simulations. To the best of the author's knowledge, this is a novel use of dataclass and the only example of which the author is aware.
!pip install simpy
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
import simpy
import pandas as pd
from dataclasses import dataclass
import sys
print(sys.version)
Additional imports are from the dataclasses library that has been part of the standard Python distribution since version 3.7. Here we import dataclass and field.
from dataclasses import dataclass, field
dataclass
Tutorials and additional documentation:
dataclass
A dataclass defines a new class of Python objects. A dataclass object takes care of several routine things that you would otherwise have to code, such as creating instances of an object, testing for equality, and other aspects.
As an example, the following cell shows how to define a dataclass corresponding to a hypothetical Student object. The Student object maintains data associated with instances of a student. The dataclass also defines a function associated with the object.
from dataclasses import dataclass

@dataclass
class Student():
    name: str
    graduation_class: int
    dorm: str

    def print_name(self):
        print(f"{self.name} (Class of {self.graduation_class})")
Let's create an instance of the Student object.
sam = Student("Sam Jones", 2024, "Alumni")
Let's see how the print_name() function works.
sam.print_name()
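To make the "routine things" more concrete, here is a minimal sketch of two methods that @dataclass generates automatically: a readable printed representation and a field-by-field equality test. The Course and PlainCourse classes are hypothetical and are used only for this illustration.

```python
from dataclasses import dataclass

# Hypothetical class used only for this illustration.
@dataclass
class Course:
    code: str
    units: int

a = Course("CBE 40455", 3)
b = Course("CBE 40455", 3)

# a printable representation is generated automatically
print(a)

# equality compares instances field by field
print(a == b)

# An ordinary class without the decorator, for comparison.
class PlainCourse:
    def __init__(self, code, units):
        self.code = code
        self.units = units

# without a hand-written __eq__, equality falls back to object identity
plain_equal = PlainCourse("CBE 40455", 3) == PlainCourse("CBE 40455", 3)
print(plain_equal)
```

The plain class would need hand-written __repr__ and __eq__ methods to behave the same way.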
The next cell shows how to create a list of students, and how to iterate over a list of students.
# create a list of students
students = [
    Student("Sam Jones", 2024, "Alumni"),
    Student("Becky Smith", 2023, "Howard"),
]

# iterate over the list of students to print all of their names
for student in students:
    student.print_name()
    print(student.dorm)
Here are a few details you need to use dataclass effectively:
The class statement is the standard statement for creating a new class of Python objects. The preceding @dataclass is a Python 'decorator'. Decorators are Python functions that modify the function or class definition that follows. In this case, the @dataclass decorator modifies class to provide a streamlined syntax for implementing classes.
Student is the class name.
Each parameter is declared with a type hint. Commonly used type hints are int, float, bool, and str. Use Any from the typing module when you don't know or can't specify a particular type. Type hints are used by type-checking tools and are not enforced by the Python interpreter.
Functions defined within the class refer to the parameters of an instance through self.
There are different ways of specifying the parameter values assigned to an instance of a dataclass. Here are three particular methods:
Parameter values can be specified when creating an instance of a dataclass. The parameter values can be specified by position or by name as shown below.
from dataclasses import dataclass

@dataclass
class Student():
    name: str
    graduation_year: int
    dorm: str

    def print_name(self):
        print(f"{self.name} (Class of {self.graduation_year})")

sam = Student("Sam Jones", 2031, "Alumni")
sam.print_name()

gilda = Student(name="Gilda Radner", graduation_year=2030, dorm="Howard")
gilda.print_name()
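Type hints are not checked when an instance is created. The sketch below, which uses a hypothetical Sample class, shows that a value of the "wrong" type is accepted without complaint, and that the declared parameters can be listed with the fields function from the dataclasses library.

```python
from dataclasses import dataclass, fields
from typing import Any

# Hypothetical class used only for this illustration.
@dataclass
class Sample:
    label: str
    value: int
    extra: Any = None   # Any marks a parameter with no particular type

# value is declared as int, but a string is accepted without error
s = Sample("run 1", "not an integer")
print(type(s.value))

# list the declared parameters in order
field_names = [f.name for f in fields(Sample)]
print(field_names)
```

If type enforcement matters, it has to come from an external type checker or from explicit checks in your own code.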
Setting a default value for a parameter can save extra typing or coding. More importantly, setting default values makes it easier to maintain and adapt code for other applications, and is a convenient way to handle missing data.
There are two ways to set default parameter values. For str, int, float, bool, tuple (the immutable types in Python), a default value can be set using = as shown in the next cell.
from dataclasses import dataclass

@dataclass
class Student():
    name: str = None
    graduation_year: int = None
    dorm: str = None

    def print_name(self):
        print(f"{self.name} (Class of {self.graduation_year})")

jdoe = Student(name="John Doe", dorm="Alumni")
jdoe.print_name()
Default parameter values are restricted to 'immutable' types. This technical restriction eliminates the error-prone practice of using mutable objects, such as lists, as defaults. The difficulty with setting defaults for mutable objects is that all instances of the dataclass would share the same value. If one instance of the object changes that value, then all other instances are affected. This leads to unpredictable behavior, and is a particularly nasty bug to uncover and fix.
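The following sketch illustrates the problem and the restriction. The add_major function and the BadStudent class are hypothetical. The first part shows the shared-value bug with an ordinary Python function; the second shows that dataclass refuses a list as a default value by raising a ValueError when the class is defined.

```python
from dataclasses import dataclass

# A plain function with a mutable default: one list is reused on every call.
def add_major(major, majors=[]):
    majors.append(major)
    return majors

first = add_major("Math")
second = add_major("Physics")
shared = first is second   # both names refer to the same list
print(second, shared)

# dataclass rejects a list default at class definition time
try:
    @dataclass
    class BadStudent:
        majors: list = []
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```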
There are two ways to provide defaults for mutable parameters such as lists, sets, dictionaries, or arbitrary Python objects.
The more direct way is to specify a function for constructing the default parameter value using the field function with the default_factory option. The default_factory is called when a new instance of the dataclass is created. The function must take no arguments and must return a value that will be assigned to the designated parameter. Here's an example.
from dataclasses import dataclass, field

@dataclass
class Student():
    name: str = None
    graduation_year: int = None
    dorm: str = None
    majors: list = field(default_factory=list)   # each instance gets a new empty list

    def print_name(self):
        print(f"{self.name} (Class of {self.graduation_year})")

    def print_majors(self):
        for n, major in enumerate(self.majors):
            print(f" {n+1}. {major}")

jdoe = Student(name="John Doe", dorm="Alumni", majors=["Math", "Chemical Engineering"])
jdoe.print_name()
jdoe.print_majors()

# no majors were given, so the default empty list is used and nothing is printed
Student().print_majors()
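Because default_factory is called once for every new instance, each instance receives its own list. A minimal sketch with a hypothetical Team class follows; the lambda shows one way to supply a non-empty default.

```python
from dataclasses import dataclass, field

# Hypothetical class used only for this illustration.
@dataclass
class Team:
    name: str = ""
    members: list = field(default_factory=list)
    tags: list = field(default_factory=lambda: ["new"])   # non-empty default

a = Team("A")
b = Team("B")

# changing one instance leaves the other untouched
a.members.append("Sam")
a.tags.append("active")
print(a.members, b.members)
print(a.tags, b.tags)
```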
Frequently there are additional steps to complete when creating a new instance of a dataclass. For that purpose, a dataclass may contain an optional function with the special name __post_init__(self). If present, that function is run automatically following the creation of a new instance. This feature will be demonstrated in the following reimplementation of the grocery store checkout operation.
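As a small illustration that does not depend on SimPy, the sketch below uses __post_init__ to compute a value from the parameters supplied at creation. The Rectangle class is hypothetical. The option init=False tells dataclass that area is not passed in when the instance is created.

```python
from dataclasses import dataclass, field

# Hypothetical class used only for this illustration.
@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)   # computed below, not supplied by the caller

    def __post_init__(self):
        # runs automatically right after the instance is created
        self.area = self.width * self.height

r = Rectangle(2.0, 3.0)
print(r)
```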
dataclass with SimPy
To demonstrate the use of classes in SimPy simulations, let's begin with a simple model of a clock using generators.
import simpy

def clock(id="", t_step=1.0):
    while True:
        print(id, env.now)
        yield env.timeout(t_step)

env = simpy.Environment()
env.process(clock("A"))
env.process(clock("B", 1.5))
env.run(until=5.0)
As a first step, we rewrite the generator as a Python dataclass named Clock. The parameters are given default values, and the generator is incorporated within the Clock object. Note the use of self to refer to parameters specific to an instance of the class.
import simpy
from dataclasses import dataclass

@dataclass
class Clock():
    id: str = ""
    t_step: float = 1.0

    def process(self):
        while True:
            print(self.id, env.now)
            yield env.timeout(self.t_step)

env = simpy.Environment()
env.process(Clock("A").process())
env.process(Clock("B", 1.5).process())
env.run(until=5)
Our definition of Clock requires the simulation environment to have the specific name env, and assumes env is a global variable. That's generally not good coding practice because it imposes an assumption on any user of the class, and exposes the internal coding of the class. A much better practice is to use class parameters to pass this data through a well defined interface to the class.
import simpy
from dataclasses import dataclass

@dataclass
class Clock():
    env: simpy.Environment
    id: str = ""
    t_step: float = 1.0

    def process(self):
        while True:
            print(self.id, self.env.now)
            yield self.env.timeout(self.t_step)

env = simpy.Environment()
env.process(Clock(env, "A").process())
env.process(Clock(env, "B", 1.5).process())
env.run(until=10)
import simpy
from dataclasses import dataclass

@dataclass
class Clock():
    env: simpy.Environment
    id: str = ""
    t_step: float = 1.0

    def __post_init__(self):
        # register the process with the environment when the instance is created
        self.env.process(self.process())

    def process(self):
        while True:
            print(self.id, self.env.now)
            yield self.env.timeout(self.t_step)

env = simpy.Environment()
Clock(env, "A")
Clock(env, "B", 1.5)
env.run(until=5)
Let's review our model for the grocery store checkout operations. There are multiple checkout lanes, each with potentially different characteristics. With generators we were able to implement differences in the time required to scan items. But another parameter, a limit on the number of items that could be checked out in a lane, required a new global list. The reason was the need to access that parameter from outside the generator, something that a generator doesn't allow. This is where classes become important building blocks in creating more complex simulations.
Our new strategy will be to encapsulate the generator inside of a dataclass object. Here's what we'll ask each class definition to do:
from dataclasses import dataclass

# create simulation models
@dataclass
class Checkout():
    env: simpy.Environment
    lane: simpy.Store = None
    t_item: float = 1/10
    item_limit: int = 25
    t_payment: float = 2.0

    def __post_init__(self):
        # each checkout owns its queue and registers its own process
        self.lane = simpy.Store(self.env)
        self.env.process(self.process())

    def process(self):
        while True:
            customer_id, cart, enter_time = yield self.lane.get()
            wait_time = self.env.now - enter_time
            yield self.env.timeout(self.t_payment + cart*self.t_item)
            customer_log.append([customer_id, cart, enter_time, wait_time, self.env.now])

@dataclass
class CustomerGenerator():
    env: simpy.Environment
    rate: float = 1.0
    customer_id: int = 1

    def __post_init__(self):
        self.env.process(self.process())

    def process(self):
        while True:
            yield self.env.timeout(random.expovariate(self.rate))
            cart = random.randint(1, 25)
            # choose the shortest lane among those that accept this cart size
            available_checkouts = [checkout for checkout in checkouts if cart <= checkout.item_limit]
            checkout = min(available_checkouts, key=lambda checkout: len(checkout.lane.items))
            yield checkout.lane.put([self.customer_id, cart, self.env.now])
            self.customer_id += 1

def lane_logger(t_sample=0.1):
    while True:
        lane_log.append([env.now] + [len(checkout.lane.items) for checkout in checkouts])
        yield env.timeout(t_sample)

# create simulation environment
env = simpy.Environment()

# create simulation objects (agents)
CustomerGenerator(env)
checkouts = [
    Checkout(env, t_item=1/5, item_limit=25),
    Checkout(env, t_item=1/5, item_limit=25),
    Checkout(env, item_limit=5),
    Checkout(env),
    Checkout(env),
]
env.process(lane_logger())

# run process
customer_log = []
lane_log = []
env.run(until=600)
def visualize():

    # extract lane data
    lane_df = pd.DataFrame(lane_log, columns = ["time"] + [f"lane {n}" for n in range(0, len(checkouts))])
    lane_df = lane_df.set_index("time")

    customer_df = pd.DataFrame(customer_log, columns = ["customer id", "cart items", "enter", "wait", "leave"])
    customer_df["elapsed"] = customer_df["leave"] - customer_df["enter"]

    # compute kpi's
    print(f"Average waiting time = {customer_df['wait'].mean():5.2f} minutes")
    print(f"\nAverage lane queue \n{lane_df.mean()}")
    print(f"\nOverall average lane queue \n{lane_df.mean().mean():5.4f}")

    # plot results
    fig, ax = plt.subplots(3, 1, figsize=(12, 7))
    ax[0].plot(lane_df)
    ax[0].set_xlabel("time / min")
    ax[0].set_title("length of checkout lanes")
    ax[0].legend(lane_df.columns)

    ax[1].bar(customer_df["customer id"], customer_df["wait"])
    ax[1].set_xlabel("customer id")
    ax[1].set_ylabel("minutes")
    ax[1].set_title("customer waiting time")

    ax[2].bar(customer_df["customer id"], customer_df["elapsed"])
    ax[2].set_xlabel("customer id")
    ax[2].set_ylabel("minutes")
    ax[2].set_title("total elapsed time")
    plt.tight_layout()

visualize()
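The lane selection rule used by the customer generator can be examined on its own, without running a simulation. The sketch below is a simplified stand-in rather than the model itself: the Lane class and the choose_lane function are hypothetical, and a plain list takes the place of the items held in a SimPy store. It shows how one agent's decision reads a property (item_limit) and the current state (queue length) of other objects.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for Checkout; queue plays the role of lane.items.
@dataclass
class Lane:
    name: str
    item_limit: int = 25
    queue: list = field(default_factory=list)

lanes = [
    Lane("regular 1", queue=["c1", "c2"]),
    Lane("regular 2", queue=["c3"]),
    Lane("express", item_limit=5),
]

def choose_lane(cart, lanes):
    # keep only the lanes whose item limit admits this cart ...
    eligible = [lane for lane in lanes if cart <= lane.item_limit]
    # ... then pick the one with the shortest queue
    return min(eligible, key=lambda lane: len(lane.queue))

small_cart_lane = choose_lane(3, lanes).name    # express lane is eligible and empty
large_cart_lane = choose_lane(12, lanes).name   # too many items for the express lane
print(small_cart_lane, large_cart_lane)
```

Other prioritizations, such as choosing the lane with the fewest total items waiting, would only require changing the key function.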
from dataclasses import dataclass

# create simulation models
@dataclass
class Checkout():
    env: simpy.Environment
    lane: simpy.Store = None
    t_item: float = 1/10
    item_limit: int = 25
    t_payment: float = 2.0

    def __post_init__(self):
        self.lane = simpy.Store(self.env)
        self.env.process(self.process())

    def process(self):
        while True:
            customer_id, cart, enter_time = yield self.lane.get()
            wait_time = self.env.now - enter_time
            yield self.env.timeout(self.t_payment + cart*self.t_item)
            customer_log.append([customer_id, cart, enter_time, wait_time, self.env.now])

@dataclass
class CustomerGenerator():
    env: simpy.Environment
    rate: float = 1.0
    customer_id: int = 1

    def __post_init__(self):
        self.env.process(self.process())

    def process(self):
        while True:
            yield self.env.timeout(random.expovariate(self.rate))
            # each customer is now an agent of its own
            Customer(self.env, self.customer_id)
            self.customer_id += 1

@dataclass
class Customer():
    env: simpy.Environment
    id: int = 0

    def __post_init__(self):
        self.cart = random.randint(1, 25)
        self.env.process(self.process())

    def process(self):
        # choose the shortest lane among those that accept this cart size
        available_checkouts = [checkout for checkout in checkouts if self.cart <= checkout.item_limit]
        checkout = min(available_checkouts, key=lambda checkout: len(checkout.lane.items))
        yield checkout.lane.put([self.id, self.cart, self.env.now])

def lane_logger(t_sample=0.1):
    while True:
        lane_log.append([env.now] + [len(checkout.lane.items) for checkout in checkouts])
        yield env.timeout(t_sample)

# create simulation environment
env = simpy.Environment()

# create simulation objects (agents)
CustomerGenerator(env)
checkouts = [
    Checkout(env, t_item=1/5, item_limit=25),
    Checkout(env, t_item=1/5, item_limit=25),
    Checkout(env, item_limit=5),
    Checkout(env),
    Checkout(env),
]
env.process(lane_logger())

# run process
customer_log = []
lane_log = []
env.run(until=600)

visualize()
from dataclasses import dataclass, field
import pandas as pd

# create simulation models
# note: the classes in this cell use the global variable env
@dataclass
class Checkout():
    lane: simpy.Store
    t_item: float = 1/10
    item_limit: int = 25
    t_payment: float = 2.0   # added so that process() no longer refers to an undefined name

    def process(self):
        while True:
            customer_id, cart, enter_time = yield self.lane.get()
            wait_time = env.now - enter_time
            yield env.timeout(self.t_payment + cart*self.t_item)
            customer_log.append([customer_id, cart, enter_time, wait_time, env.now])

@dataclass
class CustomerGenerator():
    rate: float = 1.0
    customer_id: int = 1

    def process(self):
        while True:
            yield env.timeout(random.expovariate(self.rate))
            cart = random.randint(1, 25)
            available_checkouts = [checkout for checkout in checkouts if cart <= checkout.item_limit]
            checkout = min(available_checkouts, key=lambda checkout: len(checkout.lane.items))
            yield checkout.lane.put([self.customer_id, cart, env.now])
            self.customer_id += 1

@dataclass
class LaneLogger():
    lane_log: list = field(default_factory=list) # this creates a variable that can be modified
    t_sample: float = 0.1
    lane_df: pd.DataFrame = field(default_factory=pd.DataFrame)

    def process(self):
        while True:
            self.lane_log.append([env.now] + [len(checkout.lane.items) for checkout in checkouts])
            yield env.timeout(self.t_sample)

    def report(self):
        # one column per checkout lane
        self.lane_df = pd.DataFrame(self.lane_log, columns = ["time"] + [f"lane {n}" for n in range(0, len(checkouts))])
        self.lane_df = self.lane_df.set_index("time")
        print(f"\nAverage lane queue \n{self.lane_df.mean()}")
        print(f"\nOverall average lane queue \n{self.lane_df.mean().mean():5.4f}")

    def plot(self):
        self.lane_df = pd.DataFrame(self.lane_log, columns = ["time"] + [f"lane {n}" for n in range(0, len(checkouts))])
        self.lane_df = self.lane_df.set_index("time")
        fig, ax = plt.subplots(1, 1, figsize=(12, 3))
        ax.plot(self.lane_df)
        ax.set_xlabel("time / min")
        ax.set_title("length of checkout lanes")
        ax.legend(self.lane_df.columns)

# create simulation environment
env = simpy.Environment()

# create simulation objects (agents)
customer_generator = CustomerGenerator()
checkouts = [
    Checkout(simpy.Store(env), t_item=1/5),
    Checkout(simpy.Store(env), t_item=1/5),
    Checkout(simpy.Store(env), item_limit=5),
    Checkout(simpy.Store(env)),
    Checkout(simpy.Store(env)),
]
lane_logger = LaneLogger()

# register agents
env.process(customer_generator.process())
for checkout in checkouts:
    env.process(checkout.process())
env.process(lane_logger.process())

# run process
customer_log = []   # start with an empty customer log for this run
env.run(until=600)

# plot results
lane_logger.report()
lane_logger.plot()