Mimesis is a Python fake data generator and an alternative to Faker. It groups a wide range of random attributes into providers:
address, binaryfile, choice, code, cryptographic, datetime, development, file, finance, food, hardware, internet, locale, numeric, path, payment, person, science, text, transport
It also supports many locales, can create structured data, allows custom functions, integrates well with Pandas, and is more performant than Faker.
A field generates a single value from a named provider method.
A fieldset is used to generate a set of values:
from mimesis import Fieldset, Locale
fieldset = Fieldset(locale=Locale.EN)
fieldset("name", i=3)
['Basil', 'Carlee', 'Sheryll']
A fieldset can also be used to populate a fake Pandas DataFrame:
import pandas as pd
from mimesis import Fieldset
from mimesis.locales import Locale
fs = Fieldset(locale=Locale.EN, i=5)
df = pd.DataFrame.from_dict({
    "ID": fs("increment"),
    "Name": fs("person.full_name"),
    "Email": fs("email"),
    "Phone": fs("telephone", mask="+1 (###) #5#-7#9#"),
})
print(df)
will produce:
ID  Name              Email                          Phone
1   Jamal Woodard     ford1925@live.com              +1 (202) 752-7396
2   Loma Farley       seq1926@live.com               +1 (762) 655-7893
3   Kiersten Barrera  relationship1991@duck.com      +1 (588) 956-7099
4   Jesus Frederick   troubleshooting1901@gmail.com  +1 (514) 255-7091
5   Blondell Bolton   strongly2081@example.com       +1 (327) 952-7799
We can use decorators to register handlers, which are custom functions that return data:
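from typing import Any
from mimesis import Field, Fieldset, Locale

field = Field()
fs = Fieldset(locale=Locale.EN, i=10)

# Register the same handler, under the same name, on both the field and the fieldset.
@field.handle("generate_n_word_sentence")
@fs.handle("generate_n_word_sentence")
def generate_n_word_sentence(random, min_words=1, max_words=1, **kwargs) -> Any:
    # Pick a word count in [min_words, max_words] and join that many random words.
    n = random.randint(min_words, max_words)
    return ' '.join(field("text.words", quantity=n))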
The following shows how to generate a complex structure that is regenerated as a whole with a given probability. First, a utility function builds a structure from a mapping of generators. Each entry can be a field generator, a passed-in function, or a field generator whose result is passed through a function:
from typing import Any, Dict

def generate_struct(generators: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    ret = {}
    for key, spec in generators.items():
        v = None  # default when a spec has neither "fs" nor "lambda"
        if "fs" in spec:
            # Generate with the named field, then optionally post-process the value.
            v = field(spec["fs"], **spec.get("kwargs", {}))
            if "lambda" in spec:
                v = spec["lambda"](v)
        elif "lambda" in spec:
            # No field generator: call the function on its own.
            v = spec["lambda"]()
        ret[key] = v
    return ret
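To make the spec shapes that generate_struct accepts concrete, here is a minimal, self-contained sketch. It swaps the real Mimesis field for a stand-in, stub_field (hypothetical, returning fixed values), so the result is predictable. build_struct is a simplified restatement of the function above that takes the field as a parameter.

```python
from typing import Any, Callable, Dict

def stub_field(name: str, **kwargs: Any) -> Any:
    """Hypothetical stand-in for a Mimesis field: returns fixed values by name."""
    fixed = {"price": 12.3456, "address.city": "Springfield"}
    return fixed[name]

def build_struct(generators: Dict[str, Dict[str, Any]],
                 field: Callable[..., Any]) -> Dict[str, Any]:
    result = {}
    for key, spec in generators.items():
        value = None
        if "fs" in spec:
            # Field generator, optionally post-processed by "lambda".
            value = field(spec["fs"], **spec.get("kwargs", {}))
            if "lambda" in spec:
                value = spec["lambda"](value)
        elif "lambda" in spec:
            # Standalone function, called with no arguments.
            value = spec["lambda"]()
        result[key] = value
    return result

numbers = ["555-0100", "555-0101"]
row = build_struct(
    {
        "City": {"fs": "address.city"},                               # field only
        "Amount": {"fs": "price", "lambda": lambda x: round(x, 2)},   # field + function
        "Number": {"lambda": lambda: numbers.pop(0)},                 # function only
    },
    field=stub_field,
)
print(row)  # {'City': 'Springfield', 'Amount': 12.35, 'Number': '555-0100'}
```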
The following handler returns the cached structure for sc_key, creating it on first use and regenerating it with probability p on later calls:
slowly_changing_map = {}

@fs.handle("maybe_change")
def maybe_change(random, sc_key="main", generator="text.word", generators=None, p=0.5, **kwargs) -> Any:
    # Generate on first use; afterwards regenerate only with probability p.
    if sc_key not in slowly_changing_map or random.random() <= p:
        slowly_changing_map[sc_key] = (
            generate_struct(generators, **kwargs) if generators else field(generator, **kwargs)
        )
    return slowly_changing_map[sc_key]
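The caching behaviour can be checked without Mimesis. The sketch below is a standard-library illustration of the same idea, not the handler itself: the counter-based make function and the key names are hypothetical, chosen so that every regeneration is visible.

```python
import random
from itertools import count

_cache = {}

def maybe_change(rng, key, make_value, p=0.5):
    """Return the cached value for key, regenerating it with probability p."""
    if key not in _cache or rng.random() <= p:
        _cache[key] = make_value()
    return _cache[key]

rng = random.Random(42)
counter = count(1)

def make():
    # Every regeneration yields the next integer, so changes are easy to spot.
    return next(counter)

# p=0.0: created once on first use, then never regenerated.
stable = [maybe_change(rng, "stable", make, p=0.0) for _ in range(5)]
# p=1.0: regenerated on every call.
churn = [maybe_change(rng, "churn", make, p=1.0) for _ in range(5)]

print(stable)  # [1, 1, 1, 1, 1]
print(churn)   # [2, 3, 4, 5, 6]
```

Values of p between these extremes give the slowly changing data described above: most calls return the cached structure, and an occasional call replaces it.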
which can be used as:
import datetime
from mimesis.keys import maybe  # assumption: maybe is the Mimesis key function

# convert_to_dict_of_lists, NumberOrder, ship_methods, types, phone_numbers and the
# custom handlers named below are assumed to be defined elsewhere.
orders_data = convert_to_dict_of_lists(
    fs("maybe_change", sc_key='rt_orders', p=0.5,
       generators={
           "ID": {"fs": "generate_n_character_alphanumeric",
                  "kwargs": {"length": 7, "num_digits": 0, "order": NumberOrder.LAST}},
           "OptIn": {"fs": "boolean"},
           "Amount": {"fs": "price", "kwargs": {"minimum": 10.0, "maximum": 300.0}},
           "AdjustmentAmount": {"lambda": lambda x: round(x, 2) if x else None,
                                "fs": "float_number",
                                "kwargs": {"start": 0.0,
                                           "end": 15.0,
                                           "key": maybe(None, probability=.75)}},
           "Date": {"fs": "formatted_date_time",
                    "kwargs": {"start_date": datetime.date.today(),
                               "end_date": datetime.date.today()}},
           "OrderID": {"fs": "maybe_increment",
                       "kwargs": {"incrementor": "orders", "start": 5}},
           "Method": {"fs": "random_category",
                      "kwargs": {"categories": ship_methods}},
           "Address": {"fs": "address.street_name"},
           "Number": {"lambda": lambda: phone_numbers.pop(0)},
           "City": {"fs": "address.city"},
           "Type": {"fs": "random_category",
                    "kwargs": {"categories": types,
                               "key": maybe(None, probability=.95)}},
       })
)
We call the maybe_change fieldset handler with a dictionary of generators. An entry with an "fs" key names a fieldset generator (generate_n_character_alphanumeric, price, address.city, etc.). An entry with a "lambda" key supplies a function. On its own, the function is called with no arguments (lambda: phone_numbers.pop(0)). Combined with "fs", it is applied to the value that generator produced (lambda x: round(x, 2) if x else None).
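convert_to_dict_of_lists is not shown in this post. One plausible implementation, assuming the fieldset call returns a list of dictionaries that all share the same keys, pivots the rows into columns:

```python
from typing import Any, Dict, List

def convert_to_dict_of_lists(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys; rows with missing keys would
    produce columns of unequal length.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

# Hypothetical sample rows standing in for the fieldset output.
rows = [
    {"ID": "ABCDEFG", "City": "Springfield"},
    {"ID": "HIJKLMN", "City": "Riverton"},
]
columns = convert_to_dict_of_lists(rows)
print(columns)  # {'ID': ['ABCDEFG', 'HIJKLMN'], 'City': ['Springfield', 'Riverton']}
```

A dictionary in this shape can be passed straight to pd.DataFrame, with one column per key.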
Mimesis is a powerful and fast fake data generator that allows customisation. I have used it to generate complex fake data for use cases that require consistent, slowly changing random data. We have used it to model our production data in a development environment, ensuring that no data, especially PII, leaks out of Prod.
More information can be found at Mimesis: Fake Data Generator